Resolution Changer

We’ve probably all come across annoying programs that insist on taking over the monitor to run in ‘full screen’ mode but are designed for only one particular resolution, leaving nice big black borders when the display is set to anything higher.

Lately I’ve been using just such a program a lot, one designed to run only at 800×600. Even on my laptop this ends up with huge amounts of wasted space, plus all the text and images are a bit too small to see comfortably. I’ve been manually switching my display to 800×600, but that gets old fast. I figured there must be a better way to do this.

It turns out that there is, and it’s a tiny program called Resolution Changer. This little gem weighs in at only 84k (when’s the last time you installed anything smaller than a megabyte?) and can switch to arbitrary resolutions, run a program, and then when the program finishes switch back to the original resolution.

I downloaded this program, and copied it into “C:\Program Files\reschange\”. Then I went to the shortcut which starts the annoying program and edited the properties so that this:

"C:\Program Files\reschange\reschange.exe" -width=800 -height=600

was added to the start of the “Target:” field, in front of the program executable that was already there.

Voila! Now when I click to start that program, my display changes to 800×600. When I exit the program it switches right back to the original resolution.

Sometimes it’s best NOT to follow directions…

Had a problem with my main Solaris server in Taipei today which runs Solaris 10 6/06. It hung and when I rebooted it came up with this error:

WARNING – The following files in / differ from the boot archive:
cannot find: /etc/devices/mdi_ib_cache: No such file or directory
/kernel/drv/md.conf
The recommended action is to reboot and select “Solaris failsafe”
option from the boot menu. Then follow prompts to update the
boot archive.

I rebooted into failsafe mode and blindly followed the directions given:

/dev/dsk/c0d0s0 is under md control, skipping.
To manually recover the boot archive on a root mirror,mount the first
side (the one that the system boots from) and run:

bootadm update-archive -R <mount_point>

In summary, I did the following:

# mount /dev/dsk/c0d0s0 /mnt
# bootadm update-archive -R /mnt
Creating ram disk on /mnt
updating /mnt/platform/i86pc/boot_archive…this may take a minute
# reboot

WRONG! WRONG! WRONG!

This ended up updating the boot archive on only one side of the mirror. The other side of the mirror was not modified. Hence, the mirrors became out of sync. However, the configuration is still set to boot from a mirrored device. This quickly gets you to this point on reboot:

NOTICE: /: unexpected free inode 9864, run fsck(1M)
The / file system (/dev/md/rdsk/d1) is being checked.

WARNING – Unable to repair the / filesystem. Run fsck
manually (fsck -F ufs /dev/md/rdsk/d1).

Yes, you really only need to update the boot archive on the first device in the mirror as that’s the one the system boots from. However, once it’s bootstrapped the kernel, it’s going to mount the full mirror, and the full mirror now has different contents on each side. Depending on what’s read from which side of the mirror, you’re likely to end up with some inconsistency detected.

The correct way to do things would be:

# mount /dev/dsk/c0d0s0 /mnt
# vi /mnt/etc/vfstab
(Change root mount device to the unmirrored device /dev/dsk/c0d0s0.)
# bootadm update-archive -R /mnt
Creating ram disk on /mnt
updating /mnt/platform/i86pc/boot_archive…this may take a minute
# reboot

You would then need to rebuild the mirrored root after you get the system back up.

However, that is not the end of the story. It turns out that all this boot archive rebuilding won’t fix this particular problem. This error message is normally generated when the files in the bootstrap image don’t match those in the actual filesystem. This is your heads up that the state during bootstrap doesn’t match what’s on the actual root filesystem. Going through the exercise of rebuilding the boot archive is supposed to get things back to a point where the bootstrap image and the filesystem match.

However, in this case the file /etc/devices/mdi_ib_cache is missing on the actual root filesystem. So the error message is actually wrong. If you rebuild the boot archive it’ll fail to add the file, because it doesn’t exist. And the next time you boot it’ll give you the same error again. The error is that a file is missing on the actual root filesystem, not that the boot archive doesn’t match the root filesystem.

And it turns out this file is completely unimportant. If it’s screwed up or missing, the system will replace it automatically and merrily go on its way. In other words, it’s absolutely no big deal if it’s missing on reboot.

The original error message I saw also had this advice:

To continue booting at your own risk, clear the service:
# svcadm clear system/boot-archive

This ‘at your own risk’ option actually turns out in this case to be the correct remedy for this problem.
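One way to tell this case apart from a genuinely stale boot archive is to pull out the files the warning reports as missing rather than different. Here is a minimal sketch, assuming the warning text is fed in on stdin and uses the same ‘cannot find: <path>: No such file or directory’ wording as the message quoted above. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the paths that a boot-archive warning reports as missing
# ("cannot find: ...") rather than changed.
# Assumption: the warning text arrives on stdin in the format quoted in this post.
missing_in_root() {
    sed -n 's/^cannot find: \(.*\): No such file or directory$/\1/p'
}

# Example using the warning from this post:
missing_in_root <<'EOF'
WARNING - The following files in / differ from the boot archive:
cannot find: /etc/devices/mdi_ib_cache: No such file or directory
/kernel/drv/md.conf
EOF
```

If a path printed here really is absent from the root filesystem, rebuilding the archive cannot make the two match, which is the situation described above.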

So to summarize:

  1. The original error message misstates the problem as files being different instead of one being missing
  2. The recommended fix does not solve the problem
  3. The instructions for the recommended fix give specific advice for mirrored filesystems that will damage your filesystem and waste lots of time undoing the damage
  4. The missing file is actually completely unimportant
  5. The ‘at your own risk’ option is the correct way to solve the problem

It looks like everything but point 3 is covered by BugID 6256649, although the public description is not useful. I didn’t find a bug report covering the problem with the instructions for rebuilding the boot archive on mirrored filesystems being wrong.

I also don’t know why the thing hung in the first place. Nothing in /var/adm/messages.

Updates

So, my Dad was supposed to come this week but he got a pacemaker installed last Friday so he’s postponing it a couple of weeks.

Monday afternoon I went down to Gaoxiong for APNIC22’s spam tutorial session. I arrived in the evening and hit the Liuhe Night Market for two bowls of Fried Fish Soup before heading over to the Grand Hi-Lai Hotel to check in. Tuesday was spent all day in the spam session where I gave an update on the spam status in Taiwan at the end. Also met several people from Taiwan working on the spam problem, so that’s nice. Then back to Liuhe Night Market for two more bowls of Fried Fish Soup and also ordered two bowls to take home with me. Got back home Tuesday night around 11:30.

Usually as a speaker the conference provides free registration. This time TWNIC and APNIC also paid for my hotel room, bus fare, breakfast, and lunch. I’d be a lot happier going around giving these spam talks if my expenses were covered like this all the time.

I’ve been neglecting to read my friends’ blogs lately. I’d been trying to catch up, but I was almost 300 posts in the hole, so I’ve just binned all the old articles and will start from scratch now. I’ve decided to use NewsGator for my RSS reader now. It’s an online service read through a web browser so I don’t need to worry about keeping my desktop and laptop in sync. I understand Bloglines does much the same thing and is more popular, but I’ve already gotten accustomed to NewsGator.

Avoid: DGS-1008D

The gigabit ethernet switch which I recently bought appeared to work fine for the first week. Then it started locking up randomly. At this point I looked through the logs on my servers and found that it had also sporadically been dropping link a few times a day for a couple of seconds at a time. That’s not that bad, but not that great either. Locking up is unforgivable.

When it locked up it would flash all 8 bottom LEDs and then flash the port 8 top LED, alternating between the two. Power cycling it didn’t work. Leaving it unplugged for several hours seemed to work. I had bought it at one of the mid-sized computer shops, so last Friday my wife and I went to exchange it. The shop grumbled a bit that it was past 7 days but swapped it out anyways.

The new one still drops link occasionally, and tonight it locked up again. Checking the all-knowing oracle (otherwise known as Google), one quickly finds numerous bad reviews for this switch at e.g. Newegg and Amazon mentioning the same things, link dropping and the switch locking up. The switch locking up was blamed on overheating. Wish I’d read these reviews beforehand, but I thought gigabit switches were pretty much all the same now.

So looks like D-Link blew it on this one. I’ll have to get Maggie to ring up the local D-Link support and see what they are going to do about it since it seems to be a common problem. I’ve generally had pretty good experience with D-Link in the past though. I have a travel router, wifi pc-card, 24-port 10/100 switch and an 8-port 10/100 switch from D-Link which haven’t had any notable problems. I’m hoping they will replace it with a different hardware version that doesn’t have this problem (both mine were version C3) or a different model. It looks like this model is being discontinued now, but its little brother, the DGS-1005D has reviews noting the same problem and it’s still out on the market.

I’ll also have to look around for a different brand of gigabit switch, but they aren’t very widely available here.

In other related updates from that entry, I have upgraded my primary server to use ZFS on the home directory partition using mirrored zfs. I decided zfs root was still too dodgy. When they get it in a supported version and upgradeable I’ll check it out. (Briefly they need official zfs mountroot support, grub zfs support, failsafe boot zfs support and support upgrading before I’ll touch it.) The SATA drive in the test server is also ZFS’d though it’s not RAIDed at all. The important part is that ZFS and the SATA controller both seem to be stable, just my damn network isn’t.

Modernizing

Last week I decided to start modernizing my computing environment here. I’ve been looking at the new Solaris ZFS and RAID-Z features and thinking of setting up a network fileserver using those technologies instead of the pile of drives in my Windows XP box as I currently have it.

However, since I do a lot of work with video, my current 100mbps network wouldn’t cut it. Fortunately gigabit ethernet hardware is quite affordable now. I picked up a D-Link DGS-1008D 8-Port Gigabit Ethernet switch, and two Corega CG-LAPCIGT2 gigabit ethernet cards. I had intended to put one card in my main Solaris box and the other in my Windows XP desktop, and leave my second Solaris box at 100mbps. The Corega card was chosen because it uses the Realtek chipset which is well supported on Solaris.

Installing the ethernet card in my main Solaris box was a breeze. It showed up as rge0 right away and I just had to move /etc/hostname.iprb0 to /etc/hostname.rge0 and it was all set. Installing on XP was a bit different. XP didn’t recognize the card, so I had to load the driver and reboot a couple of times to get it going. After getting it up, I tested a file copy and got about a 6X speed increase.

Unfortunately the XP box ran into stability problems running the new ethernet card. It would occasionally lose connection to the switch, sometimes the sound card would make modem-like noises when the network was heavily used, and network performance would slow down dramatically until rebooted. I’m not sure what the problem is quite yet, but I took it out of the XP box and plopped it into my other Solaris server where it is humming along nicely.

Next up, if I want to set up a new fileserver, I’d like to use modern SATA drives. Up until now I’ve used the older ATA drives on PCs. These days there’s no longer much of a price penalty for going with SATA, and performance is now significantly better than ATA. It’s probably not going to be long before ATA starts getting phased out.

Solaris has traditionally been quite picky about which add-in disk controllers it supports. Currently though, Solaris offers good support for SATA boards using the Silicon Image chipsets. When I went out to look at what’s available here in Taiwan I found most boards were based on Initio or Silicon Image chipsets, so availability wasn’t a problem. I picked up a cheap Upmost Uptech SR150-2 board which is a 2 port SATA board based on the Sil3112 chipset. I had trouble finding a SATA-II PCI board, as the Silicon Image SATA-II boards were all PCI-Express, but I’ve since spotted a vendor with a Silicon Image SATA-II PCI board. I also picked up a Seagate 320gb SATA-II drive.

Installing the SATA board in my second Solaris server, I rebooted and found that the board was not recognized by the OS. After some poking around I found that Solaris supports SATA boards only using the non-RAID version of the board BIOS while most boards are sold with the RAID BIOS installed. No problem, I’ll just download the non-RAID BIOS and flash utility from Silicon Image’s web site.

After downloading the files, making a boot floppy and then rebooting the server with it, I found out that the flash utility didn’t support the flash chip on my board. My board uses a PMC Pm39LV010-70JCE flash chip, which wasn’t one of the chips recognized. Fortunately the flash utility will ask you which chip yours is compatible with, but of course I had no idea. After some googling, I found a page in Japanese by someone encountering the same problem. Extrapolating from the English words on that page, I figured he was saying that the PMC chip is the same as an SST 39VF010. I tried that and sure enough the BIOS flashed OK, and Solaris immediately recognized the controller.

Now I get to play around a bit with ZFS and see if it is feasible. I also need to find a gigabit card that’ll work reliably in my XP box. For now I’ll use my laptop for testing as it has a built-in gigabit interface. If things look good, I’ll get that SATA-II 4-port board I saw and some more drives and set up a nice fileserver.

USB Disks on Solaris 10 x86

USB support in Solaris is greatly improved these days. One thing I’ve had as a goal for a while now is to more fully migrate my server backups to using disk instead of tape. (Disk for backups? Isn’t that backwards? Actually, if you look at media prices, IDE disks are WAY cheaper than tape media, and that doesn’t even cover the exorbitant cost of tape drives. And rsync *rocks* for this type of application.) One thing that’s still keeping me on tape in addition to disk is that I don’t have any way to do easy offsite storage with disk. But USB drives provide a perfect opportunity to do this.

I’ve been playing a lot with Solaris 10 x86, so I decided to see how hard it is to get a USB disk installed. Unfortunately there’s not a whole lot of documentation out there, and what I could find was pretty vague, along the lines of ‘run rmformat’ which tells you approximately 5% of what you need to know. There’s a couple of pitfalls if you just poke around, so here’s the complete procedure I worked out.

Keep in mind that this is for Solaris 10 x86. The sparc version would be different, but would probably just skip the fdisk stuff and the s2 workarounds. This procedure also assumes you want to use ufs as the filesystem instead of something else like FAT (aka pcfs). Also remember that ufs drives cannot be shared between sparc and x86 servers because of the different methods for layout on the disk. This assumes that your USB chipset is supported and that your USB disk doesn’t require some proprietary driver. Most USB chipsets are supported in Solaris 10 x86, and most current USB disks on the market are generic. If the disk is plug-and-play on XP and Mac, it’ll probably work. The setup requires root access.

Disable Volume Management: /etc/init.d/volmgt stop

(If you don’t, vold will get in the way of what you are doing.)

Plug in your USB drive.

Look at the end of /var/adm/messages and run prtconf -v to verify that the drive is recognized.

/var/adm/messages:

Apr 24 01:16:27 yemaozi.tcp.com usba: [ID 912658 kern.info] USB 2.0 device (usb4b4,6830) operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@6, scsa2usb0 at bus address 2
Apr 24 01:16:27 yemaozi.tcp.com usba: [ID 349649 kern.info] Cypress Semiconductor USB2.0 Storage Device DEF1097DC60E

prtconf -v:

            storage, instance #0
...
                Hardware properties:
...
                    name='usb-product-name' type=string items=1
                        value='USB2.0 Storage Device'
                    name='usb-vendor-name' type=string items=1
                        value='Cypress Semiconductor'
                    name='usb-serialno' type=string items=1
                        value='DEF1097DC60E'
...

Run rmformat -l to list removable drives:

Looking for devices...
     1. Logical Node: /dev/rdsk/c1t1d0p0
        Physical Node: /pci@0,0/pci-ide@1f,1/ide@1/sd@1,0
        Connected Device: ASUS     CD-S520/A4       1.21
        Device Type: CD Reader
     2. Logical Node: /dev/rdsk/c2t0d0p0
        Physical Node: /pci@0,0/pci1297,fb62@1d,7/storage@6/disk@0,0
        Connected Device: SAMSUNG  SP1604N
        Device Type: Removable

The first device is the CD-ROM drive. The USB disk is device 2, which is at /dev/rdsk/c2t0d0p0
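If you’d rather not pick the device out of the listing by eye, the logical node can be extracted with a little awk. This is only a sketch, assuming the listing has the same layout as the output shown above. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the logical node of every device that rmformat -l lists as "Removable".
# Assumption: the listing arrives on stdin in the layout shown in this post.
removable_nodes() {
    awk '/Logical Node:/ { node = $NF }
         /Device Type: Removable/ { print node }'
}

# On the Solaris box itself this would be: rmformat -l | removable_nodes
# Here the sample listing from this post is used instead:
removable_nodes <<'EOF'
Looking for devices...
     1. Logical Node: /dev/rdsk/c1t1d0p0
        Physical Node: /pci@0,0/pci-ide@1f,1/ide@1/sd@1,0
        Connected Device: ASUS     CD-S520/A4       1.21
        Device Type: CD Reader
     2. Logical Node: /dev/rdsk/c2t0d0p0
        Physical Node: /pci@0,0/pci1297,fb62@1d,7/storage@6/disk@0,0
        Connected Device: SAMSUNG  SP1604N
        Device Type: Removable
EOF
```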

First create fdisk partitions: run fdisk -B /dev/rdsk/c2t0d0p0, which assigns the whole disk to Solaris. If the disk already had PC-style partitioning on it, you may need to use fdisk interactively to remove all other partitions before proceeding.

Add a label: rmformat -b usb-sam /dev/rdsk/c2t0d0p0

This label can be up to 8 characters. You don’t have to add one, but if you don’t add a label, it will show up as “unnamed_rmdisk” under volume management which looks icky. Here we used the label ‘usb-sam’.

Now here’s where it gets a little tricky. If you’re used to working with sparc stuff, you know that you can use partition s2 to make one big partition using the whole disk. On Solaris x86, there’s always a one cylinder boot partition (s8) at the beginning of the disk. This applies even if you’ll never boot off of that disk. So if you want to use the whole disk, you will have to start any data partitions at cylinder 1 instead of 0. Another catch is that Volume Management by default looks for s2 on removable disks, so if you use a different partition such as s0, it won’t automatically mount. But format won’t let you change the size of s2 if you keep the partition id as ‘backup’. So putting all of this together, the easiest way to resolve this is to change the partition id of s2 to ‘root’, set the permissions to ‘wm’ and then resize it to start at cylinder 1 and use the rest of the disk.

Partition disk: run format -e. Without the -e flag, Solaris won’t show removable disks. Select your USB disk and then enter the following commands to make the changes described in the previous paragraph:

partition
2
root
wm
1
(press Enter for default)
label
(press Enter for default)
y
quit
quit

Run prtvtoc /dev/rdsk/c2t0d0p0 to verify partitioning, e.g.:

* /dev/rdsk/c2t0d0p0 (volume "usb-sam") partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*   19456 cylinders
*   19454 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       2      2    00      16065 312512445 312528509
       8      1    01          0     16065     16064

Verify that partition s2 does NOT start at sector 0.
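The numbers in the s2 row can be checked against the disk geometry with a bit of shell arithmetic. This sketch uses the values from the listing above and assumes s2 starts at cylinder 1 and runs to the end of the accessible cylinders, as described earlier.

```shell
#!/bin/sh
# Recompute the s2 row of the prtvtoc listing from the disk geometry.
# All input numbers are copied from the listing in this post.
sectors_per_track=63
tracks_per_cylinder=255
accessible_cylinders=19454

sectors_per_cylinder=$(( sectors_per_track * tracks_per_cylinder ))
# s2 starts at cylinder 1, leaving cylinder 0 for the boot partition (s8).
s2_first=$(( 1 * sectors_per_cylinder ))
# s2 covers every accessible cylinder except cylinder 0.
s2_count=$(( (accessible_cylinders - 1) * sectors_per_cylinder ))
s2_last=$(( s2_first + s2_count - 1 ))

echo "sectors/cylinder: $sectors_per_cylinder"
echo "s2 first sector:  $s2_first"
echo "s2 sector count:  $s2_count"
echo "s2 last sector:   $s2_last"
```

The results (16065, 16065, 312512445 and 312528509) match the listing, with s8 occupying sectors 0 through 16064.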

Now create a filesystem: newfs /dev/rdsk/c2t0d0s2

Re-enable Volume Management: /etc/init.d/volmgt start

It should show up after a second or two:

df -k:

Filesystem           1K-blocks      Used Available Use% Mounted on
...
/vol/dev/dsk/c2t0d0/usb-sam/s2
                     153893759     65553 152289269   1% /rmdisk/usb-sam/s2
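With the disk mounted, the rsync backup mentioned at the start could look something like the following. This is a sketch, not my actual backup script: it assumes rsync is installed, the source directory is a made-up example, and the destination is simply a directory under the mount point from the df output above. By default it only prints the command.

```shell
#!/bin/sh
# Sketch of a backup run onto the mounted USB disk.
# Assumptions: rsync is installed; SRC is a hypothetical directory to back up;
# DEST sits under the mount point shown in the df output.
SRC=${SRC:-/export/home/}
DEST=${DEST:-/rmdisk/usb-sam/s2/home/}
RUN=${RUN-echo}   # prints the command by default; run with RUN= to execute it

backup_to_usb() {
    # -a recurses and preserves permissions, ownership and timestamps;
    # --delete removes files from the copy that are gone from the source.
    $RUN rsync -a --delete "$SRC" "$DEST"
}

backup_to_usb
```

Be careful with --delete: with the wrong destination it will happily remove files you wanted to keep, so it is worth looking at the printed command before running it for real.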

When you want to remove the disk, don’t just unplug it from the system. You need to use eject to have Volume Management unmount it first: eject usb-sam

Here we use the disk’s label to identify the device to eject. If you didn’t use a label you can do this instead, though it will be ambiguous if you have multiple removable disks in use: eject rmdisk

Then before you actually unplug the drive, you will need to stop Volume Management because it cannot deal with devices being unplugged. This has to be the stupidest thing ever. Here’s what the manual says:

     A disk storage device can not be removed or  inserted  while
     vold  is  active.  To  remove  or  insert  a removeable mass
     storage device such as a USB memory stick,  first  stop  the
     daemon by issuing the command /etc/init.d/volmgt stop. After
     the device has been removed or inserted, restart the  daemon
     by issuing the command /etc/init.d/volmgt start.

So much for Volume Management being automatic. Alternatively, you can either remove rmdisk support in vold.conf or disable vold completely and mount/unmount the drive manually. That would probably be easier.
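Mounting by hand, with vold out of the picture, could be sketched like this. The slice name comes from the steps above, the mount point /mnt/usb is just an example, and by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of mounting the USB disk by hand when vold is not managing it.
# Assumptions: DEV is the slice created in the steps above (block device,
# not the raw /dev/rdsk one); MNT is a hypothetical mount point.
DEV=${DEV:-/dev/dsk/c2t0d0s2}
MNT=${MNT:-/mnt/usb}
RUN=${RUN-echo}   # prints the commands by default; run with RUN= to execute them

usb_mount() {
    $RUN mkdir -p "$MNT" && $RUN mount -F ufs "$DEV" "$MNT"
}

usb_umount() {
    $RUN umount "$MNT"
}

usb_mount
usb_umount
```

Unmount before unplugging the drive, for the same reason you would use eject under Volume Management.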

Santa Barbara

I was planning on finishing up a few things on Saturday morning and then heading to Santa Barbara, but a few things turned into more things and then it was already early afternoon and I was still not ready, so I had to push SB off until Sunday. That was compounded by Taiwan having a series of strong quakes, so I had to console my wife over that.

One of the new toys that arrived while I was at school was my new iPod. When the iPod first came out, I thought it was nice, but a little too big. When the iPod mini came out, I thought it was just the right size, but 4gb for $249 was too much for too little capacity. So when Apple rolled out the new iPod mini 6gb for $249 and lowered the 4gb model to $199, it was finally just right. I ordered a blue iPod mini 6gb during the brief time I was in Taipei between Kyoto and US. All in all I’m quite impressed with it.

I also got an iTrip mini for it, a small FM transmitter that fits on top so you can play your iPod over a car stereo. Reviews of it have been pretty evenly split between “blows goats” and “totally awesome.” So I’m going to buck the trend and say that it is merely adequate. The main problem is that the instructions tell you to find a free channel, and better yet, one with free channels to either side of it. Problem is, if you live anywhere reasonably populated, it’s pretty damn difficult to find a free channel at all, much less one that has free channels to either side. In Silicon Valley, there’s only a couple of free channels at all. Plus on the ride down to SB, I had to retune twice as my free channel was suddenly in use a couple of hours down the road. That said, it does work reasonably well, though it is somewhat annoying to be fiddling with channels every so often. If you live somewhere more remote, it’ll probably work just great.

So anyways, Sunday morning I was pretty much ready, but wanted to make one hardware change on neko.tcp.com. I got that all done OK, but then after moving things back around on the computer rack, I knocked the master power switch on the remote power management box, and all the computers went down at once. Everything rebooted cleanly a few minutes later, so not too bad, but tcp.com had an uptime of 1 year and 1 month, so it was kinda disappointing to interrupt that uptime.

Finally I was able to make it onto the road and down to Santa Barbara where I’ll be until Wednesday morning.