[OpenIndiana-discuss] Split-root installations

Jim Klimov jimklimov at cos.ru
Wed Nov 27 23:12:39 UTC 2013


Hello James,

There are so many well-phrased sentences that I just can't snip out
the few I'd respond to ;) And thanks for the historical insights and
rationales, that is much appreciated too :)

> You can't buy disk drives small enough to make sense out of a
> split /usr.

Yes you can =)

Regarding "small disks" - yes, you can get, say, a 20GB Intel SLC
drive as nice, reliable media for your rpool. Why waste 3GB of it
on a default installation (and close to that on each upgrade) when
with gzip-9 it takes just 1GB?

Even with larger SSDs, I'd rather spend the 2GB that compression saves
on each BE on something more useful than bloated, compressible
files - be it a fast scratch area for coding, compilations, uploading,
or just a bigger L2ARC partition. So far SSDs do cost a considerable
amount of money ($/GB), especially the good ones, and I'd like the
expensive investment to work off every penny overpaid for it and not
slack around storing files 3 times larger than they need be ;)
Suffice it to say that the financial crisis is not over, and only a
couple of months ago I got (remote) access to my first SSD drives ever.
So yes, I care very much about using them efficiently, and yes they
are not very big.
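
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It uses only the figures quoted above (a 20GB drive, about 3GB
per uncompressed BE, about 1GB with gzip-9). The function names are
made up for illustration, and it assumes the worst case where every BE
costs its full size, ignoring block sharing between cloned BEs as well
as swap and dump space.

```python
# Sizes in GB, taken from the figures quoted in this message.
POOL_GB = 20       # e.g. a small SLC SSD used for rpool
BE_PLAIN_GB = 3    # default installation, uncompressed
BE_GZIP9_GB = 1    # the same installation stored with gzip-9


def max_boot_environments(pool_gb, be_gb):
    """Number of full-size BEs that fit in the pool.

    Assumption: every BE costs its full size (no sharing between clones,
    no swap or dump volumes), so this is a pessimistic estimate.
    """
    return pool_gb // be_gb


def space_saved(be_count):
    """GB freed by compression across the given number of BEs."""
    return be_count * (BE_PLAIN_GB - BE_GZIP9_GB)


if __name__ == "__main__":
    print("uncompressed BEs that fit:", max_boot_environments(POOL_GB, BE_PLAIN_GB))
    print("gzip-9 BEs that fit:", max_boot_environments(POOL_GB, BE_GZIP9_GB))
    print("GB saved across 3 BEs:", space_saved(3))
```

Real pools will do better than the uncompressed figure suggests, since
a fresh BE is a clone that shares blocks with its origin until an
upgrade rewrites them; the point is only the ratio between the two cases.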

Also, fewer bytes on disk means fewer I/Os and less bandwidth to access
them, and when on-disk compression remains in place for ARC and L2ARC
cached blocks (I am not sure if this was integrated in illumos yet), this
also gives substantial savings in other expensive fast storage types.
(Okay, rpool is not used with L2ARC, I know - but the idea for benefit
of compression in general stands...)

Maybe I chimed in early enough that the system has not decayed very far
from the state where it supported separate /usr filesystems: there
are only a few exceptional cases, and there is still support for it in
the "trunk" codebase.

> having a supported split /usr is a huge tax on every single
> project that integrates

There is really not much code, to my knowledge, that *should* work
before all the local filesystems have been mounted. This is a class
of pretty low-level routines which must initialize the operating
environment. If we could put the network core-filesystem support
aside, networking could certainly start after the local filesystem
mounting.

This brings up a question: are NFS/CacheFS based remote mounts of the
root filesystem components (like the root, or /usr) *actually* used
by anyone nowadays?

Indeed, NFS-mounted pieces of the OS are not my interest at the moment
(though there was recently some interest from other community members
about things like these); and in case of a locally stored /usr as a
sub-dataset of the local root, things just work (although not quite
out of the default box - as demonstrated by /sbin/sh being a symlink,
and now this adventure with networking).

> So, we started mostly ignoring it.  I believe the only support left in
> there was a separate /usr mounted *locally* through normal disk
> drivers so that / and /usr could have different mount options, but
> even that wasn't really tested or thought too carefully about, so I
> would not be shocked at all to find that an arbitrary collection of
> system services are damaged if you attempt it.

Over many years of such setups across Solaris 10+ OSes with ZFS roots,
I did not encounter any problems that weren't solved (as described in
the Wiki summary of my experience). At the moment only the newest one
found - network/physical - is an unsolved problem, and I've only had
a day or two to think about it :)

> I'm not trying to rain on someone's parade here, but I do suggest
> considering the cost/benefit ratio when looking at restoring the bad
> old days.  Reliability is in many ways driven by simplicity.

In fact, my patches to the filesystem/* SMF method scripts build on the
code which was there, including support for mounting child datasets of
the chosen bootfs - and including /usr among these as a large dedicated
chunk of script code. I just made this procedure more reliable to work
around some field-discovered problems, namely the non-empty mountpoints
(enforce "mount -O") and botched mountpoint attributes like "/a/usr".
Things were indeed "fragile" before such fixes, but I haven't had any
failures or mishaps in this area after integrating them on our systems.
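
To illustrate what a "botched" mountpoint means here, the following
Python sketch checks a child dataset's stored mountpoint against the
path it should have relative to the boot filesystem. This is not the
logic of the actual SMF method scripts (those are shell); the function
names and the dataset names in the example are hypothetical, and the
rule that a child dataset mounts at its name relative to the bootfs is
an assumption made for the illustration.

```python
def expected_mountpoint(bootfs, dataset):
    """Path a child dataset of the boot filesystem should mount at.

    Assumption: children mount at their name relative to the bootfs,
    e.g. <bootfs>/usr should mount at /usr.
    """
    if dataset == bootfs:
        return "/"
    prefix = bootfs + "/"
    if not dataset.startswith(prefix):
        raise ValueError(f"{dataset} is not a child of {bootfs}")
    return "/" + dataset[len(prefix):]


def is_botched(bootfs, dataset, mountpoint):
    """True when the stored mountpoint differs from the expected path,
    for example '/a/usr' left over from an alternate-root mount."""
    return mountpoint != expected_mountpoint(bootfs, dataset)


if __name__ == "__main__":
    bootfs = "rpool/ROOT/example-be"      # hypothetical BE name
    usr = bootfs + "/usr"
    print(expected_mountpoint(bootfs, usr))
    print(is_botched(bootfs, usr, "/a/usr"))
    print(is_botched(bootfs, usr, "/usr"))
```

A check of this kind only detects the problem; the non-empty mountpoint
case is a separate issue, handled by the overlay mount ("mount -O")
mentioned above.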

>> My worry here is that you're adding complexity to an already
>> complex system; instead, we need significant simplification.
>
> Another big +1 on that.

I am sorry to hear this... I guess these would be two major nails in
the coffin for an attempt to RTI into the illumos-gate and distros?
Well then, the procedure is now documented and patches are published.
If anyone wants to follow my ways, the maps are out there anyhow...

Alternatively, much (not all) of this hassle could become irrelevant
if gzip-9 were simply supported for the root filesystem :)

Unlike the part of my split-root project which supports filesystems
shared between BEs (like /var/cores or /var/adm) that might benefit
from being stored on other pools and/or from quotas/reservations,
the split /usr support is indeed mostly about great compression.

//Jim



