[OpenIndiana-discuss] OpenSolaris on SPARC

Jim Klimov jimklimov at cos.ru
Sun Jul 29 14:51:46 UTC 2012


Hello all,

   There are some discussions about whether illumos on SPARC is
a dead end or not (i.e. whether it is stupid to buy HW systems
from the one vendor and not buy their software support, or if
there is more than one vendor, or if anyone would pick up the
open-sourced processor designs for the Niagaras and build some
cool appliances or servers). So, just for the sake of an anecdote,
I wanted to share this weekend's experience with OpenSolaris
on SPARC - and how it saved the day.

   While it may seem unlikely at this moment that new SPARC
systems would be rolled out for OI to get installed on them,
there are many already-deployed reliable boxes which would
run obsolete (or our new) software "until they fscking die".

   I was asked to look at a T2000 with Sol 10u8 which did just
that: it died during what could have been fsck - if ZFS had
one. Apparently, the system's users did nothing formally
invalid: they were just zfs-sending and zfs-receiving some
datasets within the pool in order to recompress older data
with gzip-9, and then they tried to destroy the older dataset
tree and rename the compressed copy to take its place.
Something went wrong, and the pool locked up with no I/O
taking place (according to iostat). The "zfs" commands all
hung; however, "zpool status" and friends did not. Filesystem
operations also kept working, so the running zones were
properly stopped and the box was ultimately rebooted. It did
not come back up.
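
   For readers unfamiliar with this kind of recompression, the
sequence described above might look roughly like the sketch
below. The dataset names (pool/data, pool/gz) and the snapshot
name are hypothetical, not taken from the affected system, and
the sketch handles a single dataset rather than a whole tree.
It only prints the commands instead of running them. It assumes
a plain "zfs send" stream (without -R), which carries no
properties, so the received copy inherits gzip-9 from the
staging parent.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# All dataset and snapshot names are hypothetical placeholders.
plan_recompress() {
    src=$1              # existing uncompressed dataset, e.g. pool/data
    stage=$2            # staging parent that will carry gzip-9, e.g. pool/gz
    leaf=${src##*/}     # last path component of the source dataset

    # Staging parent with the target compression; children inherit it.
    echo "zfs create -o compression=gzip-9 $stage"
    # A snapshot is required as the source of a send stream.
    echo "zfs snapshot $src@recompress"
    # Plain send (no -R): no properties travel with the stream,
    # so the blocks are rewritten with the inherited gzip-9.
    echo "zfs send $src@recompress | zfs receive $stage/$leaf"
    # Pin the property locally so it survives the rename below.
    echo "zfs set compression=gzip-9 $stage/$leaf"
    # The two steps during which the pool in this story locked up.
    echo "zfs destroy -r $src"
    echo "zfs rename $stage/$leaf $src"
}

plan_recompress pool/data pool/gz
```

Printing the plan first makes it easy to review the destroy and
rename steps before anything irreversible is run by hand.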

   Luckily, there was a Solaris installation server in
that network, so it took a few minutes to prepare a LAN
installation resource from a stashed SXCE snv_129_sparc
image, and boot the T2000 from the network, into single
user mode. OpenSolaris found nothing suspicious about
the data pool and the rpool, imported and exported them
without complaints. While we had the rpool imported, we deleted
the /etc/zfs/zpool.cache file to allow the system to boot
its Solaris 10. It booted, but also hung at the subsequent
"zpool import -R / pool" request - in the same way:
no I/O activity reported by iostat, and no errors in the
logs...
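
   The rescue steps from the network-booted environment could be
sketched as below. This is an illustration under assumptions,
not a record of the exact commands we typed: the alternate root
/a, the boot environment name "s10u8_be" and the use of -f (to
import a pool last used by another OS instance) are all
hypothetical choices. The script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Pool name, boot environment name and altroot are hypothetical.
plan_rescue() {
    datapool=$1     # name of the data pool, e.g. pool
    be=$2           # boot environment under rpool/ROOT, e.g. s10u8_be
    altroot=$3      # alternate root for the imports, e.g. /a

    # Check that the rescue OS can import and export the data pool.
    echo "zpool import -f -R $altroot $datapool"
    echo "zpool export $datapool"
    # Import the root pool; the root dataset is not mounted automatically.
    echo "zpool import -f -R $altroot rpool"
    echo "zfs mount rpool/ROOT/$be"
    # Without the cache file, the installed OS does not try to
    # open the data pool during boot.
    echo "rm $altroot/etc/zfs/zpool.cache"
    echo "zpool export rpool"
}

plan_rescue pool s10u8_be /a
```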

   Back to the networked boot of OpenSolaris, where we
imported the data pool, destroyed the remaining old
uncompressed datasets and completed the renaming of the
compressed datasets to take their place,
transparently to the zones and other consumers.
This did unclog something, so the Solaris 10 image
afterwards quickly imported the pool and happily
uses it today.

   Yesterday the old OpenSolaris SXCE for SPARC did
save the day. I can easily imagine hitting some bugs
in ZFS that were fixed after the last SXCE release,
where a hypothetical "OpenIndiana for SPARC" image
would be able to save us - even if it is not (yet)
used as the everyday OS for the box.

Hope this story entertains someone and helps others,
//Jim Klimov




