I’m Gonna SCREAM+

(This post isn’t about the new Tommy Heavenly6 single; it’s about migrating from LiveJournal to WordPress.)

I’ve decided to migrate my blog from LiveJournal to my own server, running the open-source WordPress software. I primarily blame Suresh for asking me to set up a blog for APCAUCE, which led me to try it out for myself first. It probably would have been better to set it up on the APCAUCE web site first, but anyways…

One of the things I liked about WordPress was that it had a feature to import blog entries from LiveJournal, which would make transitioning a fairly smooth process. It was only after I got into it that I realized how limited the import process was. Turns out that migration isn’t so easy, unless you have very low standards.

The first problem is with LiveJournal’s export tool. It has two major limitations: 1) You can only export one month at a time and 2) comments are not saved.

There are various LiveJournal clients out there that are not limited like that, and I eventually settled on jbackup.pl, a tool maintained by LiveJournal. This is a nifty tool that will back up your LiveJournal, complete with comments, to a GDBM database on a Unix server.

But there are problems with it as well. The first is that it doesn’t output the XML format that the WordPress import tool expects. For this you need a hacked version of jbackup.pl, which I found at this wiki entry on migrating from LiveJournal to WordPress. This version has been modified to output XML in a format that the import program can understand.

That turns out not to be the end of the problems. The next one is that the import program can’t handle comments, because the export tool it was written for doesn’t support them. So back to that wiki entry for a modified LiveJournal import tool.

There are a couple of plugins that the wiki also recommends, but I encountered some bugs with the threaded comments plugin they wanted. It turns out you can edit the modified import tool to turn off threading, so I did that. Basically, follow their procedure, except skip installing the plugins and edit the import tool to turn off threaded comments.

The next problem I encountered was that a bunch of entries were scrambled because entity-encoded characters weren’t copied over correctly. I had to use a filter to translate those back to the plain characters:

sed -e "s/&lt;/</g" -e 's/&quot;/"/g' -e "s/&gt;/>/g" -e "s/&apos;/'/g"
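
One thing the filter above doesn't touch is &amp;. If your export contains that too, a variant along these lines might be closer to what you want. This is only a sketch, not what I actually ran: the function name decode_entities is made up for illustration, and it assumes the entities were encoded exactly once. The &amp; rule goes last so that text like &amp;lt; comes out as &lt; rather than being decoded twice.

```shell
# Hypothetical helper: decode the five predefined XML entities on stdin.
decode_entities() {
  # &amp; is decoded last so "&amp;lt;" becomes "&lt;", not "<".
  # In the last rule the replacement is written \& because a bare &
  # in a sed replacement means "the whole matched text".
  sed -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e "s/&apos;/'/g" \
      -e 's/&amp;/\&/g'
}
```

It works as a filter, so you would pipe the exported XML through it the same way as the one-liner above.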

Then it turned out that lj-user tags weren’t getting translated, so I had to filter those:

sed -e 's/<lj user="*\([^">]*\)"*>*/<a href="http:\/\/www.livejournal.com\/users\/\1\/"><img width="17" height="17" alt="" src="http:\/\/www.livejournal.com\/stc\/fck\/editor\/plugins\/livejournal\/userinfo.gif" style="vertical-align: bottom;" \/>\1<\/a>/g'

Actually that only applies to the plain-text editor tags. If you use the rich text editor on LiveJournal, you’ll have to translate those separately. I only had one, so I did it by hand.

The final problem is with bare URLs. In LiveJournal, if you type or paste a bare URL such as http://jameslick.com/, it is automatically turned into a link. However, the exporter leaves it as a raw URL. I probably should have built a filter to convert these too, but I was so close to done at this point that I just fixed them by hand.
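
For what it's worth, a filter for the common case could look something like this. It's a sketch I did not use in this migration: the function name linkify_bare_urls is made up, and it assumes a bare URL always sits at the start of a line or right after whitespace. That assumption is what keeps it from re-wrapping URLs that are already inside href="..." attributes or link text.

```shell
# Hypothetical helper: wrap bare http/https URLs on stdin in <a> tags.
linkify_bare_urls() {
  # \1 is the start of the line or the whitespace before the URL,
  # \2 is the URL itself, which runs until whitespace, <, > or ".
  sed -E 's#(^|[[:space:]])(https?://[^[:space:]<>"]+)#\1<a href="\2">\2</a>#g'
}
```

It has obvious gaps: a URL directly after a tag (no whitespace in between) is missed, and punctuation typed right after a URL, like a trailing period or comma, gets swallowed into the link. So you'd still want to eyeball the results.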

So after a few hours of work, 361 LiveJournal entries were imported into WordPress, complete with comments and various other corrections.

Wish List:

A WordPress import tool that can handle comments, the default jbackup.pl format, entity-encoded characters, and bare URLs. This should have been MUCH easier than it was.
