Nav links

Wednesday, 14 December 2005

SQLite delight

If you've been reading the MovableType forums recently then you might have come away with the impression that the venerable Berkeley DB database is old, slow, and easy to corrupt. A similar response can be found if you Google for berkeley corrupt movable type.

Despite the fact that the only trouble I have had in almost three years was when my host upgraded their version of the Berkeley DB software, so my blog stopped working until my database was upgraded too, I was concerned about these aspersions being cast. Fortunately, it doesn't take much effort to discover that the BDB is still highly regarded in technical circles, and is often the best tool for the job. However, there are some areas where alternative databases are better suited.

Perl.com has a great concise technical comparison of BDB with the new contender SQLite. From this programmer's standpoint, and ignoring speed issues, SQLite is a clearly superior solution. The data itself is all neatly contained in a single file, as opposed to BDB's sprawling mass. Even more importantly, the data itself is exceedingly easy to query and manipulate, using standard SQL syntax. It feels the same as a "real" database, such as SQL Server or MySQL, which makes the transition between them all quite straightforward.

For these reasons, when I had the opportunity to move my MovableType database from BDB to SQLite with my transition to a new host, I did so. It immediately paid dividends, because I found that the data needed some coaxing to get the blog set up perfectly. Although a script is provided that theoretically moves the data from BDB to any of the supported SQL databases, I found that it did not actually run for me. Rather than debug the script, I decided to use MT's export and import facilities. This route does not maintain your MT settings and templates, but I had so much detritus from previous versions of MT around that I thought a clean start would be good.

After the initial import of my old blog entries to my new blog, I was disappointed to find that most of the individual entry archive names had changed - they were longer than before. For example, /archives/2005/11/worth_waiting_f_1.html had become /archives/2005/11/worth_waiting_for.html, thus breaking my permalinks. It turns out that the entries do not have their file names stored in the export file. This combined with the fact that the default Basename Length had been increased from 15 to 30 in the latest MT to produce different file names. Luckily this was easy to fix. After deleting my entries, setting the default back to 15, then importing again, I found the archive names back to how they should be.

Longer base names do produce more readable URLs, so I decided to go with the new default of 30. But what if I wanted to export and import my data in future? How could I import some entries with a basename of 15 and some with a length of 30. Luckily, with an SQL-compliant database this is easy. Just import everything with a length of 30, shorten the relevant ones, check for duplicates, and rename as appropriate.

Say, for example everything before 2004 used 15 character basenames. This command will shorten them:
UPDATE mt_entry SET entry_basename = substr(entry_basename,0,15) WHERE entry_created_on < date('2004-01-01')

Then, because some entries might now have identical names, we must seek them out:
select count(*) as num, entry_basename from mt_entry group by entry_basename order by num desc

If you find any with the same name, just use a numeric suffix, as MT itself does.

This fixing is all pleasantly hassle-free, due to the benefits of SQL compliance.