Category Archives: SQL

Do Update Now;

Just a couple of minutes ago, I was looking through the wp_posts table and thought enough was enough. There’s a thread over at WordPress where users have been complaining about the bloat and where developers have shrug it off with ‘Space is cheap’. Or, ‘You can delete it easily’. Think of the queries, think of the queries!

First off, there’s a setting you should set in your wp-config file:

define('WP_POST_REVISIONS', false);

To check (and compare) how many revision entries your wp_posts table has:

select
count(*),
post_status,
post_type
from wp_posts group
by post_status, post_type;

You can delete the bloat by issuing:

delete from wp_posts where post_type = 'revision';

If you have access to SSH, you should check the main wordpress folder: since moving to version 2.4, I’ve seen files called ‘core’ in that directory. These are (literally) core dumps and they are huge. Obviously, WP crashes the server at times (I feel sorry for my host): so much for ‘light-weight’ or ‘state of the art’. Oh wait, I can hear the punch-line, what is it?

Oh yes: “Code is Poetry”.

Me count(*)

I just finished upgrading to WordPress 2.3: so, curious as any developer would be, I took a look in the WordPress database definitions and noticed that three new tables were added. All of them take care of categories and (the new) tagging system. This means, Wp_Cat is out and has been replaced with wp_terms: the actual distinction (i.e., which terms is a category and which one is a tag) is now made in the the wp_term_taxonomy table.

I have issues with that last table because it has a count column (to track the number of posts). This is the second table that has a count column (earlier): I mentioned before that version 2.0 introduced a comment_count in the wp_posts table. Why not make use of the regular aggregate functions (like the standard COUNT(*))? After all, these aggregates are generally highly optimized functions (written in C) for tables with the same primary key(s). Also, as a good database designer, during database design you should take the use of aggregate functions in account when setting up your tables structure.

Notice that count of posts for a tag/category can be done simply by:

select count(*),
p.term_taxonomy_id,
q.taxonomy,
t.name
from wp_term_relationships p,
wp_term_taxonomy q,
wp_terms t
where p.term_taxonomy_id
= q.term_taxonomy_id
and q.term_id = t.term_id
group by p.term_taxonomy_id,
q.taxonomy, t.name

Sure: a query on that same table with that count column may look as easy as ‘select count from wp_term_taxonomy’, but remember, it (always) takes an extra write to put a number in that table (via a regular UPDATE). If your transaction on that table ever failed, your fancy report will most likely report wrong numbers.

Minor-Finer

I added a minor project to the ‘Current Project’ section (Right here), a program in C# that generates classes based on table meta-data. It’s simple and it works: there are a couple of tricks how to collect metadata from ODBC datasources.

Currently, I used ‘hardcoded’ datatype conversion (SQL_xxx -> Dot.Net type): I was in a rush and decided to (conveniently) forget about using reflection. That said, I keep forgetting about my linefeeds [sorry], but then, your Visual Studio formats everything nicely out anyway.

This reminds me that I’ve downloaded ‘Orcas’ yesterday, Microsoft’s CTP of the new Visual Studio IDE. It comes with a visual ‘class designer’ that allows you to create classes from tables like the generator above does, but obviously (at this stage) that feature only supports the typical MS-like database interfaces. Not much of a help if you work in other environments. The only interesting items are the new C#/.Net language extensions, like LINQ and XAML. And yes, the Dev Team finally decided to add a Compile to Target feature, which allows you to compile and link your executables with other .Net Frameworks.

Meh.

You asked: How to find count between two dates mssql

Oh: that age old problem. It depends on what you’re looking for? Years, days? Milliseconds?

/* Days */
select DATEDIFF(D, getdate(), '10/11/06');
select DATEDIFF(Y, getdate(), '10/01/02');

You may need to be careful with using year calculations: obviously MS SQL does not take in account the actual number of months that elapsed during year calculations. So, if you do the datediff for the dates ’10/11/06′ and ‘/12/01/03′, the result is definitely not ‘3’ .

Calculating dates in PostgreSQL is actually easier, where you can use constructs like ‘date ‘2001-10-01′ – date ‘2001-09-28”.

As an aside: you may have struggled with getting the MSQL Developer Edition to run on your high powered laptop. The DE comes with the free install of any of Microsoft Express editions of C#, C++ or Basic. In my case, a couple of months ago, I decided to turn off the automatic ‘start’ options for both SQLExpress and SQL Server Browser. To test the above code, I decided to run the above services, but found out that activating them did not get me connected to anything. Well: not MS SQL.

  • Open to the SQL Server Configuration Manager
  • Go to SQL Server 2005 Network configuration
  • Click Protocols for SQLEXPRESS
  • Enable TCP/IP
  • Restart both (earlier) mentioned services.
  • Go to ODBC Datasources manager.
  • Create a new datasource, select SQL NATIVE CLIENT (yes, you heard it here first). You should see the SQLEXPRESS (or whatever your computer’s name is) in that Server combobox.
  • Use your Windows Logon user/password combination to complete the rest of the setup.

Yeah. Really.

Who slams who?

This article (“The Web is broken and it’s all your fault”) popped up on the Postgres-general list and then made it into Slashdot as ‘Postgres slammed by PHP creator’.

The Slashdot thread highlights some interesting PHP flaws and the fact that PHP is a bad language to learn programming in (‘Code is Poetry’), by many compared to Visual Basic, the other awful language misused by people who think that programming is designing screens. You haven’t heard about multithreading yet?

That said, it’s interesting to see the programmer of PHP duke it out with the Slashdot crowd: obviously he was misquoted in the earlier linked-to article.

I don’t think this is news to anybody that MySQL is quicker at connecting and issueing simple queries, and I am not sure why me showing some Callgrind profiles and stating that MySQL is particular good at these things is frontpage slashdot material. Slow day?

Exactly that: For small things, MySQL does good. Need scalability? Obviously, you would choose Postgres.

Databases

I updated the plug-in accordingly but want to issue a warning before you download it: I’m not sure which version of MySQL does or does not support ‘left outer joins’. I think versions > 3.23 should be able to do so. If not, you’re out of luck.

(I discussed outer joins here and here)

I am surprised (once again) about the database design in WordPress. Version 2.0 included ‘major changes’, like adding the comment count in the Posts table. If correctly designed (as in true ‘one-to-many’ relationship), you can just fish out the comment count using an outer join without any performance loss (after all, one-to-many relationships depend on identical keys in both tables, which are generally indexed anyways!). Here’s a reminder for people who start to look around the corner, or who are just dipping their toes in SQL: you do not treat a one-to-many relationship as a ‘one-to-one’ relationship. Trust me: you’ll miss out.

W09

After Comments In Progressmany moons of bickering and looking around, I decided to take up the gloves and do it myself. Currently I’m using a slightly altered script that ‘automatically’ folds and closes comments after 10 days. This generally works good and has stopped the flow of sapm directed at the comments. However, I’m lacking precise control, that is, I’d like to have some posts to keep their comments available for a longer time.

You’ll find some of these plugins in the WordPress repository, but most of them make no sense and work rather not really good1.

I also noticed something funny in queries of existing plugins where in the code roundabout attempts are made to make a selection first and then pass the resultset to an ‘update’ statement: hey rookies, the following statement is ‘legal’ SQL:

update wp_posts
set ping_status = 'closed'
where post_date < DATE_ADD( DATE_SUB(CURDATE(), INTERVAL 4 month), INTERVAL 1 DAY);

On the other hand, using subselects in MySQL should generally be avoided. There's no real consensus around which MySQL version supports subselects. 4.xx maybe. 3.23 maybe. Who knows!

On a related note, I see that there's a new WP version out.

1 That is an understatement.

Varchars kill the integers

Iknow several professional applications that use varchar fields to store typically invoice and order numbers, just because it makes it easier to fill this field with non-numerical characters (for example when issuing credit notes and that).

This is pretty irritating on the part of indexing and sorting results: after all, which comes first when using alphanumerical sorting: ‘10001’ or ‘200-1’? Most likely, you’ll lose a lot of speed too: sorting (and comparing) integers goes a lot faster than comparing sets of characters.

Instead of declaring these types of fields varchars, consider making them integer fields: after all, adding dashes (and other non-numerical characters) can be easily made part of a formatting mask in the UI of the application: heck, most graphical UI have masked edit field controls (Win 32/.Net/KDE/Gnome).

Additionally, on typical transactional databases like Postgres and Oracle, (integer) sequences are created atomically [transaction independent], which can save you a lot more headaches than trying to generate a unique combination of characters in a multi-user environment.

Postgres stuff #1

Technical stuff: I have finally some time to break into some specific Postgres stuff and do something for this database as many others have always done for MySQL. Nothing earth-shattering.

Here’s the first finding: my personal Postgres (debian) is a bit more advanced than the ones we use at work:


>select version();
PostgreSQL 7.4.2 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.3 (Debian 20040401)

Note: via the Postgres mailinglist: be careful when trying to update your Debian Postgres version to the latest 8 series. Dump your tables. Dump your tables.

Postgres version 7.4 and up have the ability to query the information_schema tables to find out which extra features are supported (a la Oracle/Sybase/MSSql). This works easy as in 1…2…3:

>select * from information_schema.sql_features;

Other tables in information_schema that can be queried are: sql_implementation_info, sql_languages, sql_packages, sql_sizing and sql_sizing_profiles (and so on). If you know what these tables are for, have fun. If you run an older version of postgres, do the queries on pg_namespace and pg_type to get the table names. The following query basically fetches all tables for a specific database:


SELECT oid, nspname, pg_get_userbyid(nspowner) AS namespaceowner, nspacl
FROM pg_namespace
ORDER BY nspname

This fetches all the namespaces: naturally, we have to look for the ‘public’ namespace. Use its OID to find the real set of tables within the (currently) selected database.


SELECT oid, relname, pg_get_userbyid(relowner) as tableowner, relacl, relhasoids
FROM pg_class
WHERE ((relkind = 'r') OR (relkind = 's'))
AND relnamespace = 2200::oid /* my public has an oid of 2200: yours may or may not */
ORDER BY relname

Naturally you will need to find triggers and typecasts and functions of course. Later more on those.

Datestamps, please.

Here is a tip for those aspiring web developers who think that they know it all after they have managed to create a ‘database-driven’ web application:

Always add a field to important tables that shows when specific items (or rows) were inserted or created. Call it date_created or something. At least it gives you a reference when something was created. Helps if you need to audit and verify data too.

To my surprise I saw that there’s no date/timestamp for WordPress’s *_links table. It has a ‘updated’ timestamp field, that gets updated each time you change the link. Which gets pretty unusable.

Left Outer Joins

More database stuff: the most exciting stuff of databases is actually analyzing the data afterwards (after the regular integrity checks, naturally). It’s a goldmine, except for that you need to know how to actually capture that data. Here’s a tip for the math lovers: if you’re not sure how your query should be built, think unions. Remember those junior years when you were wasting time drawing circles of collections of apples and pears? They’re back!

Continue reading

Databeast and time

Earlier this week, I thought I had made a crucial mistake when I was testing a couple of new and existing reports on my Debian database server (at home). For some kind of reason, firing up a query that returns a selection between two dates didn’t return the results I wanted to see (example):


SELECT * from atable
WHERE date_in >= '01/01/05'
AND date_in < = '01/31/05';

Under all normal circumstances this query should return all the rows that fall in between those days, including the 31st of January. My home server didn’t.

It ended up to be a timezone configuration error on my server: My server was set to EST, while AST would have been more appropriate.