Had a problem today with my main Solaris server in Taipei, which runs Solaris 10 6/06. It hung, and when I rebooted, it came up with this error:
WARNING – The following files in / differ from the boot archive:
cannot find: /etc/devices/mdi_ib_cache: No such file or directory
/kernel/drv/md.conf
The recommended action is to reboot and select “Solaris failsafe”
option from the boot menu. Then follow prompts to update the
boot archive.
I rebooted into failsafe mode and blindly followed the directions given:
/dev/dsk/c0d0s0 is under md control, skipping.
To manually recover the boot archive on a root mirror, mount the first
side (the one that the system boots from) and run:
bootadm update-archive -R <mount_point>
In summary, I did the following:
# mount /dev/dsk/c0d0s0 /mnt
# bootadm update-archive -R /mnt
Creating ram disk on /mnt
updating /mnt/platform/i86pc/boot_archive…this may take a minute
# reboot
WRONG! WRONG! WRONG!
This updated the boot archive on only one side of the mirror, writing to the slice directly rather than through the mirror device. The other side was not modified, so the two sides are now out of sync. The configuration, however, is still set to mount root from the mirrored device, which quickly gets you to this point on reboot:
NOTICE: /: unexpected free inode 9864, run fsck(1M)
The / file system (/dev/md/rdsk/d1) is being checked.
WARNING – Unable to repair the / filesystem. Run fsck
manually (fsck -F ufs /dev/md/rdsk/d1).
Yes, you really only need to update the boot archive on the first device in the mirror as that’s the one the system boots from. However, once it’s bootstrapped the kernel, it’s going to mount the full mirror, and the full mirror now has different contents on each side. Depending on what’s read from which side of the mirror, you’re likely to end up with some inconsistency detected.
The correct way to do things would be:
# mount /dev/dsk/c0d0s0 /mnt
# vi /mnt/etc/vfstab
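(Change the root mount device to the unmirrored device /dev/dsk/c0d0s0, and the fsck device on the same line to /dev/rdsk/c0d0s0. If metaroot added a rootdev line to /mnt/etc/system, that probably needs commenting out as well.)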
# bootadm update-archive -R /mnt
Creating ram disk on /mnt
updating /mnt/platform/i86pc/boot_archive…this may take a minute
# reboot
You would then need to rebuild the mirrored root after you get the system back up.
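The vfstab edit can also be scripted instead of done by hand in vi. Here is a minimal sketch, not run on the affected server. It assumes the mirrored root shows up in vfstab as /dev/md/dsk/d1 and /dev/md/rdsk/d1 (d1 is taken from the fsck message above, so check your own file) and that the side you boot from is c0d0s0. The function name unmirror_vfstab is made up for illustration. It goes through a temporary file rather than relying on an in-place sed option, and keeps a copy of the original.

```shell
# Hypothetical helper: point vfstab entries at the plain slice instead of
# the metadevice. Keeps the untouched file as <vfstab>.orig.
# usage: unmirror_vfstab <vfstab> <metadevice> <slice>
unmirror_vfstab() {
    vfstab=$1
    md=$2
    slice=$3
    tab=`printf '\t'`

    cp "$vfstab" "$vfstab.orig" || return 1

    # Require a space or tab after the device name so d1 does not match d10.
    sed -e "s|/dev/md/dsk/$md\([ $tab]\)|/dev/dsk/$slice\1|" \
        -e "s|/dev/md/rdsk/$md\([ $tab]\)|/dev/rdsk/$slice\1|" \
        "$vfstab.orig" > "$vfstab.new" || return 1

    # Copy over the original so its ownership and mode are preserved.
    cp "$vfstab.new" "$vfstab" && rm -f "$vfstab.new"
}
```

From failsafe mode that would be something like unmirror_vfstab /mnt/etc/vfstab d1 c0d0s0, followed by a look at the result before running bootadm.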
However, that is not the end of the story. It turns out that all this boot archive rebuilding won’t fix this particular problem. This warning is normally generated when the files in the bootstrap image (the boot archive) don’t match those on the actual root filesystem, and rebuilding the boot archive is supposed to bring the two back in line.
However, in this case the file /etc/devices/mdi_ib_cache is missing on the actual root filesystem. So the error message is actually wrong. If you rebuild the boot archive it’ll fail to add the file, because it doesn’t exist. And the next time you boot it’ll give you the same error again. The error is that a file is missing on the actual root filesystem, not that the boot archive doesn’t match the root filesystem.
And it turns out this file is completely unimportant. If it’s screwed up or missing, the system will replace it automatically and merrily go on its way. In other words, it’s absolutely no big deal if it’s missing on reboot.
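A quick way to tell the two cases apart is to check whether the files named in the warning exist on the real root filesystem at all. The sketch below is only an illustration: list_missing is a made-up name, and it assumes the file list looks like the one quoted at the top of this post, with either a bare path or a "cannot find: <path>: ..." line per file. It reads the warning text on stdin and prints the paths that are absent under the given root (for example /mnt in failsafe mode).

```shell
# Hypothetical helper: report which files named in the boot-archive warning
# do not exist under <root>.
# usage: list_missing <root> < warning.txt
list_missing() {
    root=$1
    # Strip leading spaces, reduce "cannot find: /path: reason" to "/path",
    # then keep only lines that are a single absolute path.
    sed -n -e 's/^ *//' \
           -e 's|^cannot find: \(/[^:]*\):.*|\1|' \
           -e '/^\/[^ ]*$/p' |
    while read path; do
        [ -f "$root$path" ] || echo "$path"
    done
}
```

If every path in the warning comes back as missing, rebuilding the archive has nothing to pick up, and the same warning can be expected on the next boot.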
The original error message I saw also had this advice:
To continue booting at your own risk, clear the service:
# svcadm clear system/boot-archive
This ‘at your own risk’ option actually turns out in this case to be the correct remedy for this problem.
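If you want to be a little more careful than typing the command blind, you can check the service state first. This is a sketch under the assumption that the service is sitting in the maintenance state, which is when svcadm clear applies. The wrapper name clear_boot_archive is made up; svcs -H -o state prints just the state column with no header.

```shell
# Sketch: clear system/boot-archive only if SMF reports it in maintenance.
clear_boot_archive() {
    state=`svcs -H -o state system/boot-archive` || return 1
    if [ "$state" = "maintenance" ]; then
        svcadm clear system/boot-archive
    else
        echo "system/boot-archive is $state, nothing to clear"
    fi
}
```

Running svcs -x system/boot-archive beforehand also shows why SMF thinks the service failed.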
So to summarize:
- The original error message misstates the problem as files being different instead of one being missing
- The recommended fix does not solve the problem
- The instructions for the recommended fix give specific advice for mirrored filesystems that will damage your filesystem and waste lots of time undoing the damage
- The missing file is actually completely unimportant
- The ‘at your own risk’ option is the correct way to solve the problem
It looks like everything but the third point is covered by BugID 6256649, although the public description is not useful. I didn’t find a bug report covering the wrong instructions for rebuilding the boot archive on mirrored filesystems.
I also don’t know why the thing hung in the first place. Nothing in /var/adm/messages.