[OpenIndiana-discuss] Split-root installations

Jim Klimov jimklimov at cos.ru
Mon Dec 2 01:06:40 UTC 2013


I've pursued the idea of a separate SMF service which takes care
of local ZFS-based split-root systems, instead of hacking into the
existing scripts (network or filesystem). All the logic that I had
earlier added to fs-root and fs-minimal has now moved into the
new script and service, fs-root-zfs. This service declares the
networking services as well as filesystem/root as its dependents,
to ensure that it runs before any consumers of /usr and of the
other filesystems that make up the root hierarchy. In fact, it
mounts all of the ZFS filesystems which are children of the current
bootfs or of the $rpool/SHARED dataset, and /var/run in particular,
fulfilling much of the job of filesystem/minimal in one early blow.
This should be appreciated by NWAM in particular :)
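
To illustrate the selection described above, here is a minimal
sketch in POSIX shell. It is not the code from fs-root-zfs: the
function name select_root_datasets and its arguments are made up
for this example, and it assumes that the input is the tab-separated
output of "zfs list -H -o name,mountpoint,canmount".

```shell
# Illustrative sketch only; select_root_datasets is a hypothetical
# helper, not a function taken from the fs-root-zfs script.
#
# Usage: select_root_datasets BOOTFS SHARED
# Reads "name<TAB>mountpoint<TAB>canmount" records on stdin and
# prints "name<TAB>mountpoint" for datasets that are children of
# BOOTFS or SHARED, have canmount=on and an absolute mountpoint.
select_root_datasets() {
    srd_bootfs=$1                 # e.g. rpool/ROOT/test
    srd_shared=$2                 # e.g. rpool/SHARED
    srd_tab=$(printf '\t')
    while IFS=$srd_tab read -r srd_name srd_mp srd_canmount; do
        # Keep only children of the two hierarchies of interest.
        case "$srd_name" in
            "$srd_bootfs"/*|"$srd_shared"/*) ;;
            *) continue ;;
        esac
        # Datasets with canmount=off or noauto are skipped here.
        [ "$srd_canmount" = "on" ] || continue
        # "legacy" and "none" mountpoints are not handled by this sketch.
        case "$srd_mp" in
            /*) printf '%s\t%s\n' "$srd_name" "$srd_mp" ;;
        esac
    done
}
```

On a live system this could be fed from something like
"zfs list -H -o name,mountpoint,canmount -r rpool", with the bootfs
and SHARED dataset names passed as the two arguments.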

The script does check /etc/vfstab for filesystems provided by
non-ZFS technologies: it should skip ZFS-mounting onto those
mountpoints and anything under them. Also, it only runs for a
ZFS root.
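
A rough sketch of such a vfstab check follows. The helper names
nonzfs_mountpoints and is_under_nonzfs are hypothetical (they are
not taken from fs-root-zfs), and the sketch assumes the usual
vfstab column order, with the mount point in field 3 and the
filesystem type in field 4.

```shell
# Illustrative sketch only; both helpers are hypothetical names.
#
# nonzfs_mountpoints FILE
# Prints the mount points of all vfstab entries whose FS type is
# not "zfs". Comment lines and entries without a real mount point
# (such as swap devices, which use "-") are ignored.
nonzfs_mountpoints() {
    awk '$1 !~ /^#/ && NF >= 4 && $3 ~ /^\// && $4 != "zfs" { print $3 }' "$1"
}

# is_under_nonzfs MOUNTPOINT FILE
# Succeeds if MOUNTPOINT is equal to, or located under, one of the
# non-ZFS mount points listed in FILE.
is_under_nonzfs() {
    for nz_mp in $(nonzfs_mountpoints "$2"); do
        case "$1" in
            "$nz_mp"|"${nz_mp%/}"/*) return 0 ;;
        esac
    done
    return 1
}
```

A caller could then do "is_under_nonzfs /var/adm /etc/vfstab" and
skip the ZFS mount if the call succeeds.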

The large changes I proposed earlier for the older scripts are no
longer needed for local-ZFS split-root setups. Smaller changes
(provided in new patches) are still needed, though: they make the
older scripts skip zfs-mounting a filesystem that has already
been mounted.
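
The "already mounted" test can be sketched roughly as below. This
is an assumption about one possible way to do it, not a copy of the
patch: the helpers is_mounted and mount_once are hypothetical, and
the sketch relies on /etc/mnttab listing the mount point in its
second field.

```shell
# Illustrative sketch only; is_mounted and mount_once are
# hypothetical helpers, not code from the patched scripts.
#
# is_mounted MOUNTPOINT [MNTTAB]
# Succeeds if something is already mounted at MOUNTPOINT.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/etc/mnttab}"
}

# mount_once DATASET MOUNTPOINT [MNTTAB]
# Mounts DATASET unless MOUNTPOINT is already occupied, so that a
# repeated attempt does not fail with a "zfs mount" error.
mount_once() {
    if is_mounted "$2" "${3:-/etc/mnttab}"; then
        echo "Not mounting: '$2' from '$1': something already mounted"
        return 0
    fi
    zfs mount "$1"
}
```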

My local tests were quite successful, so I posted the update at
the Wiki:

http://wiki.openindiana.org/oi/Advanced+-+Split-root+installation

http://wiki.openindiana.org/download/attachments/27230229/fs-root-zfs
http://wiki.openindiana.org/download/attachments/27230229/fs-root-zfs.xml

Slight but important fixes to the older fs-root and fs-minimal
scripts (as well as the whole of the new fs-root-zfs script) can
be reviewed in this patch:

http://wiki.openindiana.org/download/attachments/27230229/fs-root-zfs.patch

An example of console output for booting a BE whose mountpoints
were left wrong by an untimely reset while this BE was mounted for
administrative tasks from another running BE (console debugging
was enabled with "touch /$BEMOUNT/.debug_mnt"):

OpenIndiana Build oi_151a8 64-bit (illumos 7256a34efe)
SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
All rights reserved. Use is subject to license terms.
Rootfs mountpoint not '/' but '/tmp/tmp.a5aGOb', trying to fix.
Fixing 'rpool/ROOT/test/opt' to use '/opt' mountpoint instead of '/tmp/tmp.a5aGOb/opt': shifted in same root hierarchy
Fixing 'rpool/ROOT/test/usr' to use '/usr' mountpoint instead of '/tmp/tmp.a5aGOb/usr': shifted in same root hierarchy
Fixing 'rpool/ROOT/test/usr/local' to use '/usr/local' mountpoint instead of '/tmp/tmp.a5aGOb/usr/local': shifted in same root hierarchy
Fixing 'rpool/ROOT/test/var' to use '/var' mountpoint instead of '/tmp/tmp.a5aGOb/var': shifted in same root hierarchy
Mounting '/usr': use 'rpool/ROOT/test/usr': in same root hierarchy
Mounting '/var': use 'rpool/ROOT/test/var': in same root hierarchy
Mounting '/var/adm': use 'rpool/SHARED/var/adm': the only option
Not ZFS-mounting '/tmp': equal or under a non-ZFS mountpoint '/tmp'
Mounting '/opt': use 'rpool/ROOT/test/opt': in same root hierarchy
Not mounting: '/usr' from 'rpool/ROOT/test/usr': something already mounted
Mounting '/usr/local': use 'rpool/ROOT/test/usr/local': in same root hierarchy
Not mounting: '/var' from 'rpool/ROOT/test/var': something already mounted
Not mounting: '/var' from 'rpool/SHARED/var': canmount!=on
Not mounting: '/var/adm' from 'rpool/SHARED/var/adm': something already mounted
Mounting '/var/cores': use 'rpool/SHARED/var/cores': in shared root hierarchy
Mounting '/var/crash': use 'rpool/SHARED/var/crash': in shared root hierarchy
Mounting '/var/log': use 'rpool/SHARED/var/log': in shared root hierarchy
Mounting '/var/mail': use 'rpool/SHARED/var/mail': in shared root hierarchy
Not mounting: '/var/spool' from 'rpool/SHARED/var/spool': canmount!=on
Mounting '/var/spool/clientmqueue': use 'rpool/SHARED/var/spool/clientmqueue': in shared root hierarchy
Mounting '/var/spool/mqueue': use 'rpool/SHARED/var/spool/mqueue': in shared root hierarchy
Mounting '/var/tmp': use 'rpool/SHARED/var/tmp': in shared root hierarchy
fs-root-zfs: completed without fatal errors
Hostname: openindiana
...
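
The "Fixing ..." lines in this output boil down to stripping the
stale temporary prefix from each child mountpoint. A minimal sketch
of that string manipulation is below; fixed_mountpoint is a
hypothetical helper written for this message and is not the
function used in fs-root-zfs.

```shell
# Illustrative sketch only; fixed_mountpoint is a hypothetical name.
#
# fixed_mountpoint ROOT_MOUNTPOINT DATASET_MOUNTPOINT
# Prints the mountpoint the dataset should have once the root
# filesystem is back at "/". Mountpoints outside of the stale
# prefix are printed unchanged.
fixed_mountpoint() {
    fm_prefix=${1%/}              # e.g. /tmp/tmp.a5aGOb
    case "$2" in
        "$fm_prefix")   printf '/\n' ;;
        "$fm_prefix"/*) printf '%s\n' "${2#"$fm_prefix"}" ;;
        *)              printf '%s\n' "$2" ;;
    esac
}
```

The result could then be applied with something like
"zfs set mountpoint=/usr rpool/ROOT/test/usr".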



The filesystem tree on this box is:

# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
rpool/ROOT/test        2065321    314150   1751171  16% /
swap                   1347972      1112   1346860   1% /etc/svc/volatile
rpool/ROOT/test/usr    2576515    825344   1751171  33% /usr
rpool/ROOT/test/var    1809675     58504   1751171   4% /var
rpool/SHARED/var/adm   1751283       112   1751171   1% /var/adm
swap                   1346912        52   1346860   1% /var/run
rpool/ROOT/test/opt    1752589      1418   1751171   1% /opt
rpool/ROOT/test/usr/local
                        1751202        31   1751171   1% /usr/local
rpool/SHARED/var/cores
                        1762117     10946   1751171   1% /var/cores
rpool/SHARED/var/crash
                        1751202        31   1751171   1% /var/crash
rpool/SHARED/var/log   1751229        58   1751171   1% /var/log
rpool/SHARED/var/mail
                        1751203        32   1751171   1% /var/mail
rpool/SHARED/var/spool/clientmqueue
                        1751203        32   1751171   1% /var/spool/clientmqueue
rpool/SHARED/var/spool/mqueue
                        1751202        31   1751171   1% /var/spool/mqueue
rpool/SHARED/var/tmp   1751223        52   1751171   1% /var/tmp
/usr/lib/libc/libc_hwcap2.so.1
                        2576515    825344   1751171  33% /lib/libc.so.1
swap                   1346868         8   1346860   1% /tmp
rpool/export           1751203        32   1751171   1% /export
rpool/export/home      1751203        32   1751171   1% /export/home
rpool/export/home/admin
                        1751846       675   1751171   1% /export/home/admin
rpool                  1751217        46   1751171   1% /rpool
/export/home/admin     1751846       675   1751171   1% /home/admin


Hope for comments,
//Jim Klimov

On 2013-11-30 10:25, Jim Klimov wrote:
> 2) I think one more valid approach to unroll these dependencies
> via SMF in a packageable manner has emerged, and a rather apparent
> one: to move (or duplicate, or invoke) the code from fs-root which
> mounts a zfs-based /usr filesystem into a service of its own, on
> which consumers of the /usr namespace would depend (optional_all).
>
> At start this service would check if current root is zfs, and
> if a child dataset or legacy-mounted ZFS /usr are known and
> available - it would mount the dataset if yes. Otherwise it
> would exit without an error. As a result, the networking
> scripts in my split-zfs-based-root case would be guaranteed
> to have a /usr before they run.
>
> It would (should) have no impact on systems that use monoroots
> on ZFS, or that use other roots (networked, metadevice, etc.) -
> these would work or fail the same as they do today.
>
> 3) Similarly, such a service can mount ZFS-based datasets of
> the rest of the root hierarchy if available (/var, children
> of bootfs, SHARED/*) and as a result of this, even the NWAM
> method on systems with local storage would have a complete
> environment to work in (for its LDAP/NIS interaction), all
> without a major overhaul of SMF dependencies and method code.
>
> But in this extended case there is a possible though improbable
> loophole: if some parts of the operating environment including
> the rootfs are mounted from ZFS, but some major components
> like /var work from nfs/cachefs/ufs/... and then some datasets
> like /var/adm would be mounted on top of that. A script that
> only mounts a ZFS hierarchy in order to avoid dependencies
> on networking and metadevices would apparently ignore these
> other options; at most it can detect them in /etc/vfstab and
> stop mounting stuff under the involved mountpoint (this would
> come in later via filesystem service chains that exist today).
>
> And the current filesystem service methods would need to check
> that they don't mount the same (zfs) filesystem twice, so as
> to not bail out on "zfs mount" errors due to this.


