I’m Gonna SCREAM+

(This post isn’t about the new Tommy Heavenly6 single, it’s about Migrating from LiveJournal to WordPress.)

I’ve decided to migrate my blog from being hosted on LiveJournal to hosting it on my own server using the open-source WordPress software. I primarily blame Suresh for asking me to set up a blog for APCAUCE, which led me to decide to try it on for myself first. It probably would have been better to have set it up on the APCAUCE web site first, but anyways…

One of the things I liked about WordPress was that it had a feature to import blog entries from LiveJournal, which would make transitioning a fairly smooth process. It was only after I got into it that I realized how limited the import process was. Turns out that migration isn’t so easy, unless you have very low standards.

The first problem is with LiveJournal’s export tool. It has two major limitations: 1) You can only export one month at a time and 2) comments are not saved.

There are various LiveJournal clients out there that are not limited like that, and I eventually settled on using jbackup.pl, a tool maintained by LiveJournal. This is a nifty tool that will back up your LiveJournal, complete with comments, to a GDBM database on a Unix server.

But there are problems with it as well. The first problem is that it doesn’t output the XML format that the WordPress import tool expects. For this you need to use a hacked version of jbackup.pl which I found at this wiki entry on migrating from LiveJournal to WordPress. This version has been modified to output XML in a format that the import program can understand.

That turns out not to be the end of the problems. The next problem is that the import program can’t handle comments, because the export tool it works with doesn’t support them. So back to that wiki entry for a modified livejournal import tool.

There are a couple of plugins that the wiki also recommends, but I encountered some bugs with the threaded comments plugin they wanted. It turns out you can edit the modified import tool to turn off threading, so I did that. Basically, follow their procedure, except skip the plugin installs and edit the import tool to turn off threaded comments.

The next problem I encountered is that a bunch of entries were scrambled because entity encoded characters weren’t copied over correctly. I had to use a filter to translate those back to straight ASCII:

sed -e "s/&lt;/</g" -e 's/&quot;/"/g' -e "s/&gt;/>/g" -e "s/&apos;/\'/g"
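For anyone who would rather not chain sed expressions, here is a minimal Python sketch of the same kind of filter. It is an alternative, not the filter used above: it relies on the standard library's html.unescape, which decodes every named and numeric entity (including &amp;amp;) rather than only the four listed. The function name decode_entities is made up for illustration.

```python
import html


def decode_entities(text):
    """Turn entity-encoded characters back into plain characters.

    Assumption: decoding *all* entities is acceptable. An entry that
    deliberately displays a literal "&lt;" would be altered as well.
    html.unescape works in a single pass, so "&amp;lt;" becomes "&lt;"
    and is not decoded a second time into "<".
    """
    return html.unescape(text)


sample = "&lt;b&gt;&quot;hello&quot; &apos;world&apos;&lt;/b&gt;"
print(decode_entities(sample))
```

The single pass matters: with chained substitutions, decoding &amp;amp; before the others would double-decode text that was meant to stay escaped.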

Then it turned out that lj-user tags weren’t getting translated, so I had to filter those:

sed -e 's/<lj user="*\([^">]*\)"*>*/<a href="http:\/\/www.livejournal.com\/users\/\1\/"><img width="17" height="17" alt="" src="http:\/\/www.livejournal.com\/stc\/fck\/editor\/plugins\/livejournal\/userinfo.gif" style="vertical-align: bottom;" \/>\1<\/a>/g'

Actually that only applies to the plain-text editor tags. If you use the rich text editor on LiveJournal, you’ll have to translate those separately. I only had one, so I did it by hand.
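The same plain-text lj-user substitution can be sketched in Python, which makes the pattern a little easier to read. This is only an illustration under a few assumptions: the names LJ_USER, LINK_TEMPLATE and expand_lj_user are hypothetical, usernames are assumed to consist of letters, digits, underscores and hyphens, and the rich text editor's markup is not handled.

```python
import re

# Matches <lj user="name">, <lj user=name> and <lj user="name" />.
# Assumption: usernames contain only letters, digits, "_" and "-".
LJ_USER = re.compile(r'<lj\s+user="?([\w-]+)"?\s*/?>')

# Same link and userinfo icon that the sed expression produces.
LINK_TEMPLATE = (
    '<a href="http://www.livejournal.com/users/{name}/">'
    '<img width="17" height="17" alt="" '
    'src="http://www.livejournal.com/stc/fck/editor/plugins/livejournal/userinfo.gif" '
    'style="vertical-align: bottom;" />{name}</a>'
)


def expand_lj_user(text):
    """Replace each plain-text lj-user tag with an ordinary HTML link."""
    return LJ_USER.sub(lambda m: LINK_TEMPLATE.format(name=m.group(1)), text)


print(expand_lj_user('thanks to <lj user="akibare"> for the tip'))
```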

The final problem is with bare URLs. In LiveJournal if you type or paste a bare URL such as: http://jameslick.com/ it’ll automatically turn it into a link. However, the exporter leaves it just as a raw URL. I probably should have built a filter to convert these as well, but it was so close at this point that I just did these by hand as well.
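A filter for this could look something like the following Python sketch. It is not something I ran during the migration; the names BARE_URL and linkify are made up, and the heuristic is deliberately simple: a URL counts as "bare" if it is not immediately preceded by a quote, an equals sign or a closing angle bracket, which is how it skips URLs already sitting inside an href attribute or used as link text.

```python
import re

# A bare URL: http:// or https:// not directly preceded by a quote,
# "=" or ">" (those usually mean it is already part of a tag or link).
BARE_URL = re.compile(r'(?<![">=\'])\bhttps?://[^\s<>"\']+')


def linkify(text):
    """Wrap bare URLs in <a> tags, leaving existing links alone."""
    def repl(match):
        url = match.group(0)
        trailing = ""
        # Sentence punctuation right after a URL is probably not part of it.
        while url and url[-1] in ".,;:!?)":
            trailing = url[-1] + trailing
            url = url[:-1]
        return '<a href="{0}">{0}</a>{1}'.format(url, trailing)

    return BARE_URL.sub(repl, text)


print(linkify("My site is http://jameslick.com/ and it is now a link."))
```

The lookbehind is a heuristic rather than real HTML parsing, so an entry with unusual markup would still need a check by hand.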

So after a few hours of work, 361 LiveJournal entries were imported into WordPress with comments and various other corrections.

Wish List:

A WordPress import tool that can handle comments, the default jbackup.pl format, entity-encoded characters, and bare URLs. It should have been MUCH easier than this.

Business Bug Bites Again

A friend of mine has been bitten by the business bug too and today launched the Asian Parent online store, selling Chinese-language children’s books and videos. If you are an overseas Chinese parent or just want to introduce your children to the Chinese language, this is a good site to check out.

Birthday

Sunday was my birthday, so Maggie took me to Wulai to stay one night at the Pause Landis Resort. We went to Wulai village first to have some wild boar and meatball soup at the usual places I like there. Then a short walk down to the next village where Pause Landis is to check in. In the afternoon we went (separately) to the group hot springs. Our room included dinner and breakfast. The dinner was good but a bit sparse, so we went over to 7-11 to get some snacks after. It was a nice quiet and relaxing weekend.

Understanding the numbers thing better now

In a previous entry I had wondered why “301600 is written 叄拾零萬壹仟陸佰 when the form has preprinted units, but is written 叄拾萬零壹仟陸佰 when you have to write the whole thing yourself?” The issue is where you put the character for zero, 零. In Chinese, the character for zero is inserted explicitly to make it clearer that there is a break in the units. It’s useful in spoken Chinese because it’s easy to lose track of the units, but this is less of a problem when writing Chinese.

However, the explicit zero is only used once per break, and the units for any zero are also omitted, so that something like 3005 would only be 叄仟零伍, though in regular speech you could say 三〇〇五, leaving out the units (and using the normal characters since it’s not a formal dollar amount on a check/form). What wasn’t clear to me is where to put the zero character when it occurs above 10,000 (萬). I had assumed that it’d be written the same way as it is on a preprinted form. (An interesting note: akibare tells me that putting in an explicit zero character is not usually done in Japanese.)

I think I understand it a bit better now after seeing a new type of preprinted form. I’d previously seen two kinds of forms where the units were preprinted. The first type has the units with spaces in between them to write the numbers, e.g. “ 仟 佰 拾 萬 仟 佰 拾 元”. In that one you fill in the blanks with the numbers so that 301600 looks like “ 仟 佰叄拾零萬壹仟陸佰 拾 元”. You must put in a zero character for each place in between other characters, but before and after you can just run a line through it as a shortcut. Normally a “-” would be confused with the character for one, but since we have to use the special numeric characters, it is not ambiguous in this case.

The other type of form I’ve seen has a grid of boxes with “仟佰拾萬仟佰拾元” and you would fill in the numbers in the box below the unit, e.g.:

仟 佰 拾 萬 仟 佰 拾 元
　 　 叄 零 壹 陸 　 　

But yesterday I saw a new form that suddenly made it make sense where to put the zero in writing out the number without a form. This form looked like this:

仟 佰 拾 萬 仟 佰 拾 元
萬   萬   萬            

(The units in this case are read vertically.) So in our 301600 example, it would look like:

  仟   佰 叄 拾 零 萬 壹 仟 陸 佰   拾   元
萬   萬   萬            

This format makes it clear where to put the zeros when writing the number without a form, because the 3 goes in the 100,000s unit (拾萬), and not the 10s unit (拾) above the 10,000s unit (萬). So in the 301600 example you are writing (three 100,000s) (zero) (one 1,000) (six 100s).
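To check my understanding, the rule as I've described it can be written down as a small Python sketch. It encodes only the convention from this entry (one 零 per run of skipped places, no units written for zeros, and the 零 landing after 萬 when writing freehand); conventions differ elsewhere, so treat it as an illustration and not a reference. The function name to_formal_chinese is made up, it only covers amounts below 10^8 (the largest the forms above have room for), and it uses the formal characters for every digit.

```python
DIGITS = "零壹貳叄肆伍陸柒捌玖"   # formal (check-writing) numerals 0-9
UNITS = ["仟", "佰", "拾", ""]    # places within each four-digit group


def to_formal_chinese(n):
    """Write n (0 <= n < 10**8) out freehand in formal numerals."""
    if not 0 <= n < 10**8:
        raise ValueError("only amounts below 10**8 are handled")
    if n == 0:
        return DIGITS[0]
    out = []
    pending_zero = False  # a zero place was skipped since the last digit
    # Two groups of four digits: the 萬 group, then the ones group.
    for group, suffix in ((n // 10000, "萬"), (n % 10000, "")):
        for digit_char, unit in zip("%04d" % group, UNITS):
            d = int(digit_char)
            if d == 0:
                # Leading zeros (nothing written yet) never count.
                pending_zero = bool(out)
                continue
            if pending_zero:
                out.append(DIGITS[0])  # a single 零 for the whole gap
                pending_zero = False
            out.append(DIGITS[d] + unit)
        if group and suffix:
            out.append(suffix)  # 萬 comes before any pending 零
    return "".join(out)


print(to_formal_chinese(301600))  # 叄拾萬零壹仟陸佰
print(to_formal_chinese(3005))    # 叄仟零伍
```

Because 萬 is appended before the pending zero is written, the 301600 case comes out with the zero after 萬, matching the freehand form and not the preprinted one.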

Well at least it makes more sense to me now.

(It was a pain in the ass to get those vertical units to line up right.)

Touching

There’s a very emotional description of snooze’s last moments posted on his blog.

………………………………………………………………………

One of the cliches about dying is that one will live on in the memories of others. One of the odd things about snooze’s passing is that this cliche has even more meaning. His blog is still online, and updates are still being posted about him, if not by him. On DruidMUCK his virtual character snooze is even still connected, and his ‘bot’ features are still working away at converting URLs to shortcuts.

He also left behind several of his DJ mixes, some posted on his web site, others being collected together again by his friends. I played some of his mixes in the store on Tuesday in his honor. He was also the one who was constantly recommending interesting new things. I know that I learned about new music, new anime and new video blogs (like Tiki Bar TV and ze frank) from him.

Yeah, I’m still here

Got a few complaints about a lack of updates, so here goes.

Things have been slow this week mainly due to a steady rain most of the week and also Wednesday was a holiday here (Dragon Boat Festival). I’m told other Subways are also slow this week so I’m not too worried. It has also been a good opportunity for everyone to get up to speed without so much pressure. And it’s not so bad that we’re losing money.

Meanwhile, things are going smoothly enough that I’m pulling back from working shifts myself and letting the staff take care of most things other than the weekly paperwork and banking.