Still problems

The busted-ass D-Link switch replacement is still having problems:

On the main Solaris box, the link dropped 4 times since last night.

On the secondary Solaris box, the link dropped 4 times since last night.

On the Windows PC I don’t have link stats, but it dropped 31 packets to the router in that amount of time. 31 may not seem like much, but it’s about 31 higher than it should be.

Anyone have any ideas where to look for a non-D-Link gigabit switch in Taipei?

D-Link DGS-1008D Repair

For those who missed the previous episode in this saga here’s a summary: 1) Bought D-Link DGS-1008D 2) It dies 3) Return to shop for replacement 4) It dies.

I sent the second one in for repairs and got back a replacement (different serial number) for it yesterday. It’s still alive so far.

I also got the gigabit card in my Windows box working reliably. Previously it would cause noise on the sound card and give erratic performance. After poking around I found that the Realtek gigabit card and my M-Audio Revolution 7.1 sound card were both insisting on latching on to IRQ 21. After moving PCI cards around and disabling some unused devices in the BIOS, the Realtek decided to try out IRQ 16, and now no longer causes my sound card to make rude noises.

Performance between my XP box and main Solaris server is pretty ripping now. However, my older Solaris server barely breaks 200 Mbps, just about double what a plain old Fast Ethernet board can do. It’s an old Pentium II 450 MHz system, so I guess that’s about all one can expect from such ancient technology. I guess I’ll have to get a new motherboard/CPU to build the RAID-Z file server now. (The main Solaris box is a small-form-factor PC which only has room for 2 hard drives, so it can at best do RAID-1.)

Resolution Changer

We’ve probably all come across annoying programs that insist on taking over your monitor to run in ‘full screen’ mode, but are designed only for a particular monitor resolution, leaving nice big black borders when running at a higher resolution.

I’ve been using just such a program a lot lately that is designed to only run at 800×600 resolution. Even on my laptop this ends up with huge amounts of wasted space, plus all the text and images are a bit too small to see comfortably. I’ve been manually switching my display to 800×600, but that gets old fast. I figured there must be a better way to do this.

It turns out that there is, and it’s a tiny program called Resolution Changer. This little gem weighs in at only 84k (when’s the last time you installed anything smaller than a megabyte?) and can switch to arbitrary resolutions, run a program, and then when the program finishes switch back to the original resolution.

I downloaded this program, and copied it into “C:\Program Files\reschange\”. Then I went to the shortcut which starts the annoying program and edited the properties so that this:

“C:\Program Files\reschange\reschange.exe” -width=800 -height=600

was added to the start of the “Target:” in front of the program executable that was previously there.

Voila! Now when I click to start that program, my display changes to 800×600. When I exit the program it switches right back to the original resolution.
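As a rough sketch of the shortcut edit described above, the new “Target:” is just the Resolution Changer invocation placed in front of whatever was there before. The helper name `wrap_target` and the example program path are hypothetical, made up purely for illustration. The `-width=` and `-height=` options and the install path are the ones used above.

```python
def wrap_target(original_target, width=800, height=600,
                reschange=r"C:\Program Files\reschange\reschange.exe"):
    """Prefix an existing shortcut Target with the Resolution Changer call.

    original_target is the old Target text, already quoted if it needs to be.
    """
    return f'"{reschange}" -width={width} -height={height} {original_target}'


# Hypothetical program path, used only to show the resulting Target string.
print(wrap_target(r'"C:\Games\annoying\annoying.exe"'))
```

The result is what gets pasted into the shortcut’s “Target:” field. Nothing here touches the shortcut file itself.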

Sometimes it’s best NOT to follow directions…

Had a problem with my main Solaris server in Taipei today which runs Solaris 10 6/06. It hung and when I rebooted it came up with this error:

WARNING – The following files in / differ from the boot archive:
cannot find: /etc/devices/mdi_ib_cache: No such file or directory
/kernel/drv/md.conf
The recommended action is to reboot and select “Solaris failsafe”
option from the boot menu. Then follow prompts to update the
boot archive.

I rebooted into failsafe mode and blindly followed the directions given:

/dev/dsk/c0d0s0 is under md control, skipping.
To manually recover the boot archive on a root mirror,mount the first
side (the one that the system boots from) and run:

bootadm update-archive -R <mount_point>

In summary, I did the following:

# mount /dev/dsk/c0d0s0 /mnt
# bootadm update-archive -R /mnt
Creating ram disk on /mnt
updating /mnt/platform/i86pc/boot_archive…this may take a minute
# reboot

WRONG! WRONG! WRONG!

This ended up updating the boot archive on only one side of the mirror. The other side of the mirror was not modified. Hence, the mirrors became out of sync. However, the configuration is still set to boot from a mirrored device. This quickly gets you to this point on reboot:

NOTICE: /: unexpected free inode 9864, run fsck(1M)
The / file system (/dev/md/rdsk/d1) is being checked.

WARNING – Unable to repair the / filesystem. Run fsck
manually (fsck -F ufs /dev/md/rdsk/d1).

Yes, you really only need to update the boot archive on the first device in the mirror as that’s the one the system boots from. However, once it’s bootstrapped the kernel, it’s going to mount the full mirror, and the full mirror now has different contents on each side. Depending on what’s read from which side of the mirror, you’re likely to end up with some inconsistency detected.

The correct way to do things would be:

# mount /dev/dsk/c0d0s0 /mnt
# vi /mnt/etc/vfstab
(Change root mount device to the unmirrored device /dev/dsk/c0d0s0.)
# bootadm update-archive -R /mnt
Creating ram disk on /mnt
updating /mnt/platform/i86pc/boot_archive…this may take a minute
# reboot

You would then need to rebuild the mirrored root after you get the system back up.
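The vfstab edit in the procedure above could be sketched like this. It is only an illustration of the one-line change, not a replacement for doing it by hand in vi. The function name `unmirror_root` is hypothetical, and the device names `/dev/md/dsk/d1` and `/dev/rdsk/c0d0s0` are assumptions inferred from the messages quoted above (only `/dev/md/rdsk/d1` and `/dev/dsk/c0d0s0` actually appear in them).

```python
def unmirror_root(vfstab_text, block_dev="/dev/dsk/c0d0s0",
                  raw_dev="/dev/rdsk/c0d0s0"):
    """Return vfstab text with the root (/) entry pointed at a plain slice."""
    out = []
    for line in vfstab_text.splitlines():
        fields = line.split()
        # vfstab columns: device to mount, device to fsck, mount point, ...
        is_comment = line.lstrip().startswith("#")
        if len(fields) >= 3 and not is_comment and fields[2] == "/":
            fields[0], fields[1] = block_dev, raw_dev
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed shape of a mirrored root entry; the real file may differ.
sample = "/dev/md/dsk/d1\t/dev/md/rdsk/d1\t/\tufs\t1\tno\t-\n"
print(unmirror_root(sample), end="")
```

Only the root entry is rewritten. Comment lines and every other mount are passed through unchanged.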

However, that is not the end of the story. It turns out that all this boot archive rebuilding won’t fix this particular problem. This error message is normally generated when the files in the bootstrap image don’t match those in the actual filesystem. This is your heads up that the state during bootstrap doesn’t match what’s on the actual root filesystem. Going through the exercise of rebuilding the boot archive is supposed to get things back to a point where the bootstrap image and the filesystem match.

However, in this case the file /etc/devices/mdi_ib_cache is missing on the actual root filesystem. So the error message is actually wrong. If you rebuild the boot archive it’ll fail to add the file, because it doesn’t exist. And the next time you boot it’ll give you the same error again. The error is that a file is missing on the actual root filesystem, not that the boot archive doesn’t match the root filesystem.

And it turns out this file is completely unimportant. If it’s screwed up or missing, the system will replace it automatically and merrily go on its way. In other words, it’s absolutely no big deal if it’s missing on reboot.
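To make the distinction concrete, here is a small sketch that sorts the file list in the warning into files that are missing from the root filesystem and files that merely differ from the boot archive. The function name is hypothetical, and the line formats are assumed from the single warning quoted above, so other Solaris releases may word things differently.

```python
def classify_boot_archive_warning(lines):
    """Split the warning's file list into (missing, differing) paths."""
    missing, differing = [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith("cannot find:"):
            # Format seen above: "cannot find: <path>: No such file or directory"
            rest = line[len("cannot find:"):]
            missing.append(rest.split(":", 1)[0].strip())
        elif line.startswith("/"):
            # A bare path means the file exists but differs from the archive.
            differing.append(line)
    return missing, differing


warning = [
    "cannot find: /etc/devices/mdi_ib_cache: No such file or directory",
    "/kernel/drv/md.conf",
]
missing, differing = classify_boot_archive_warning(warning)
print("missing:", missing)
print("differing:", differing)
```

Anything that lands in the missing list cannot be fixed by rebuilding the boot archive, since there is no file to add.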

The original error message I saw also had this advice:

To continue booting at your own risk, clear the service:
# svcadm clear system/boot-archive

This ‘at your own risk’ option actually turns out in this case to be the correct remedy for this problem.

So to summarize:

  1. The original error message misstates the problem as files being different instead of one being missing
  2. The recommended fix does not solve the problem
  3. The instructions for the recommended fix give specific advice for mirrored filesystems that will damage your filesystem and waste lots of time undoing the damage
  4. The missing file is actually completely unimportant
  5. The ‘at your own risk’ option is the correct way to solve the problem

It looks like everything but point 3 is covered by BugID 6256649, although the public description is not useful. I didn’t find a bug report covering the incorrect instructions for rebuilding the boot archive on mirrored filesystems.

I also don’t know why the thing hung in the first place. Nothing in /var/adm/messages.

An Anniversary

Excerpts from my e-mail five years ago.

Date: Tue, 11 Sep 2001 12:20:03 -0700
From: Wilbert Lick
To: James Lick

Hi Jim,

You’ve probably heard the news by now about the plane crashes, the world trade center, etc. Well, I was in Vermont and Boston this weekend and Monday. I had reservations on United 175 from Boston to Los Angeles at 8:00 on Tuesday morning. On Monday afternoon, I changed reservations and flew out Monday evening. If I had kept my reservations, I would now be underneath the world trade center. I’m in somewhat of a shock right now.

Anyway, hope to see you soon.

Dad

Date: Wed, 12 Sep 2001 23:20:25 GMT
From: United Airlines
To: James Lick

UA 0844 TPE-SFO on Sep 13 is cancelled

Date: Thu, 13 Sep 2001 12:34:42 +0800 (CST)
From: James Lick
To: Rick Lilly

Currently I’ve been rebooked on the Monday flight, which means I will miss your special celebration. If flights resume, I may be able to get out on standby, but given that flights have been shut down so long, getting on is very doubtful. If I am unable to make it, please accept my regrets and congratulations.

Date: Thu, 13 Sep 2001 21:56:17 GMT
From: United Airlines
To: James Lick

UA 0844 TPE-SFO on Sep 14 is cancelled

Date: Sat, 15 Sep 2001 08:56:59 +0800 (CST)
From: James Lick
To: Rick Lilly

I’m trying to fly standby today. Will know in a couple of hours.

Date: Sun, 16 Sep 2001 03:18:02 +0800 (CST)
From: James Lick
To: Wilbert Lick

I made it!

Date: Sun, 16 Sep 2001 04:11:24 +0800 (CST)
From: James Lick
To: Zorch Offtopic List

Within minutes of the first attack, all the news channels in Taiwan had live coverage of the wtc tower one on fire. I was just getting home and had glanced over at the TV when the second plane crashed into tower two.

Everyone was in shock. We couldn’t believe it was a plane that crashed until they showed it again. The next two hours we watched in horror as one disaster after another unfolded.

I tried calling home but all the international lines were busy.

I found out later that my dad had a reservation on United 175, the second plane to crash. Fortunately his meeting had ended early and he had flown out the night before instead, and he was able to get an email out to me saying he was fine, but in shock.

Even though he was fine, this was more than I could take. I sat sobbing for about half an hour just from the intense emotion of it all. I lost my mother in July after a battle with Pick’s Disease and to come so close to losing both parents in such a short time was an unbearable thought. This entire year has been extremely stressful for me for a variety of other reasons as well.

I had a ticket back to SFO on Thursday. I finally got out Saturday morning and just got home in Santa Clara a couple of hours ago. They had 3 pages of stand-by passengers by the time I got there. Even so, it looks like most people stayed away and all the standby passengers made it on. I even got an upgrade.

Security was extremely tight in Taipei. Checked in bags were hand searched and x-rayed during check-in. At the normal security checkpoint, x-rays of carry-ons were more thorough, and there was a brand new hi-tech x-ray machine being set up. At the gate, carry-on bags were searched, and each person swept with a wand-style metal detector. At boarding time, passports and tickets were checked to ensure matches.

Coming in at SFO, it didn’t look like a whole lot of flights were arriving yet, with very few people in the terminals. At check-in there were mobs of people in line but it looked like most weren’t going anywhere. At the rental center, there was hardly anyone around. Hertz has a big board listing the names and car locations for Hertz Gold members which probably has space for ~300 names and is usually 2/3 full. Today there were only 10 names on the board. On the good side, they upgraded me from economy to full size, presumably due to the lack of business.

At least I made it back for my friend’s wedding on Sunday. Some of their guests from Phoenix aren’t able to get a flight though.

It’s been a rough week.

Date: Wed, 26 Sep 2001 15:57:07 -0700
From: Wilbert Lick
To: James Lick

Hi Jim,

Since I haven’t heard anything, I assume your trip back to Taipei was ok. In this morning’s paper, I saw that Taiwan got hit by another typhoon. Was there much damage? How is Maggie’s shop, etc.?

Dad

Date: Thu, 27 Sep 2001 10:28:57 +0800 (CST)
From: James Lick
To: Wilbert Lick

Typhoon Lekima struck southern Taiwan. In the north we have had some heavy rain, but fortunately not much in the way of flooding. The plane ride in was a bit bumpy, but not too bad.

Lekima is currently making its way in a northwest direction across the southern part of Taiwan and is expected to start out across the Taiwan Strait tonight. The rim of the storm is still expected to give us some heavy rain throughout the island through Friday night or Saturday morning.

The ground floor of Maggie’s shop is all cleaned out now from Typhoon Nari. The water was about 1.5′ deep on the ground level. The basement is still drying out. It has some wood laminate flooring, and we are wondering whether it will survive. On the other side of the building’s basement, an entire wall collapsed along a 40 foot stretch, and one of the doors was blown off its hinges by the force of the flooding. Fortunately the wall was not a load bearing one. Except for the flooring and some trim, most of the basement on the side of Maggie’s shop is masonry and needs at most a good drying out and a repainting.

Maggie’s home is slightly downslope, and had water about 5 to 6 feet deep. Fortunately it didn’t rise enough to inundate the second floor.

First WordPress plugin

For some reason WordPress hasn’t fixed a bug in how non-ASCII emails are sent, even two years after the bug was submitted. As a result, emails from blogs using non-ASCII characters are sent with unencoded Subject headers. This not only violates email standards, but also tends to cause such mails to be tagged as spam, not to mention that the resulting subject line is unreadable.

Since the wp_mail() routine is one that is plugin-replaceable, I whipped up a plugin based on kpumuk’s solution posted in the bug report. It was surprisingly simple to make the plugin and now the notification emails I get aren’t all scrambled.
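For anyone curious what a properly encoded Subject header looks like, here is a minimal sketch in Python using the standard library’s `email.header` module. This is not the plugin’s code (the plugin is PHP and follows kpumuk’s approach); it only illustrates the RFC 2047 encoded-word form. The helper name `encode_subject` and the sample subject are made up for the example.

```python
from email.header import Header, decode_header


def encode_subject(subject, charset="utf-8"):
    """Return an RFC 2047 encoded-word form of a Subject value."""
    return Header(subject, charset).encode()


# Sample subject chosen only for illustration.
encoded = encode_subject("新文章: 颱風")
print(encoded)  # pure ASCII, in the form =?charset?encoding?data?=

# Round trip: a mail client decodes the encoded words back to the original text.
decoded = "".join(data.decode(cs) for data, cs in decode_header(encoded))
print(decoded)
```

The encoded value contains only ASCII characters, which is what the mail standards require of header fields. The receiving mail client does the decoding for display.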

Plugin: WP RFC2047

Upgrades

Finally got around to upgrading to WordPress 2.0.4, which has been out for a while. I was kinda reluctant to do it because I know from upgrading other PHP software that once you’ve made lots of modifications it gets tricky to upgrade. Fortunately WordPress is a bit better designed in this area and I had managed to keep most of my mods limited to the Ocadia theme files. One place where I’d hacked the main files was in the get_archives function, which I modded to show months in Chinese and English. Fortunately I had backed everything up and it was a simple matter to pull out those mods and stick them in the Ocadia theme directory instead.

I’ve also enabled the ‘Blogroll’ feature, so if you look over in the sidebar you can see some of the blogs I read.