[OpenIndiana-discuss] Swap during install

Tue Sep 25 11:58:19 UTC 2012

2012-09-24 22:31, Reginald Beardsley wrote:
> However, if ZFS is really smart about paging space in a pool, the current practice of putting swap in rpool may be the best choice. Growfs should easily  adjust the size.  Given the number of lies being told about disk geometry at various stages, it's hard to feel confident one knows what's actually going on.  It becomes particularly opaque when one considers the differences among SCSI, SAS & SATA disks.
> Some questions I have in case someone knows the answers:

Those are some interesting questions; I can only guess at the
answers based on my general knowledge of ZFS internals (and would
love to be proven wrong in this case): ZFS provides its zvols as
block devices to upper layers, including the swap subsystem, and
makes few exceptions for them, if any. It might treat swap
differently if there were a special dataset type (alongside the
filesystem and zvol types of today) with its own allocation
policy (basically, no COW and no snapshots) intended
specifically for swap.

So, generally, I *think* ZFS zvols used for swap behave just like
any other ZFS dataset, and ZFS is not "smart" about swap.
Performance-wise it might still be better to slice out some
fixed area on your disks to use as a swap location (perhaps even
mirrored with SVM/DiskSuite - though without ZFS checksums).

> Does ZFS allocate contiguous space for swap?

IMHO no, due to the COW algorithm.

It might be contiguous on the first run (even that is not
very likely, due to possible interleaving of userdata for swap
with the metadata that addresses that userdata), but things
should get mixed up once you rewrite some swap data.
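
To illustrate the point about rewrites, here is a toy model in
Python. It is not ZFS code: the allocator, its "next free slot"
policy and all names are my assumptions for illustration. It only
shows that if blocks are never overwritten in place, a layout that
starts out contiguous stops being so after a few rewrites.

```python
# Toy copy-on-write allocator (an assumption for illustration, not
# how ZFS actually picks blocks): every write goes to the next free
# physical slot, and the old copy is never overwritten in place.
def cow_write(mapping, next_free, logical_block):
    """Point logical_block at a fresh physical slot; return new next_free."""
    mapping[logical_block] = next_free
    return next_free + 1


def is_contiguous(mapping):
    """True if the logical blocks, in order, sit at consecutive slots."""
    physical = [mapping[b] for b in sorted(mapping)]
    return all(b - a == 1 for a, b in zip(physical, physical[1:]))


mapping, next_free = {}, 0

for block in range(8):        # first sequential fill of the swap volume
    next_free = cow_write(mapping, next_free, block)
contiguous_after_fill = is_contiguous(mapping)

for block in (2, 5, 3):       # later, a few swap pages get rewritten
    next_free = cow_write(mapping, next_free, block)
contiguous_after_rewrite = is_contiguous(mapping)

# Physical slot of each logical block, in logical order.
layout = [mapping[b] for b in sorted(mapping)]
```

In this model the first fill is contiguous only because nothing
else is written in between; real metadata writes would already
break that up.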

I'd argue that swap space is not even pre-allocated; instead,
it is just reserved and only allocated upon incoming writes.
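
The difference between "reserved" and "allocated" can be sketched
with a small accounting model. This is only my guess at the
bookkeeping, written as a hypothetical ToyPool class in Python;
none of these names come from ZFS.

```python
class ToyPool:
    """Hypothetical space accounting: reserving is not allocating."""

    def __init__(self, size):
        self.size = size
        self.reserved = 0    # blocks promised to a volume, nothing written
        self.allocated = 0   # blocks that actually hold written data

    def free(self):
        """Blocks that are neither promised nor written."""
        return self.size - self.reserved - self.allocated

    def reserve(self, n):
        """Promise n blocks; no on-disk location is chosen yet."""
        if n > self.free():
            raise ValueError("not enough free space to reserve")
        self.reserved += n

    def write_reserved(self, n):
        """A write turns up to n reserved blocks into allocated ones."""
        n = min(n, self.reserved)
        self.reserved -= n
        self.allocated += n


pool = ToyPool(100)
pool.reserve(40)                 # e.g. creating a 40-block swap volume
after_reserve = (pool.allocated, pool.free())

pool.write_reserved(10)          # the first 10 pages get swapped out
after_write = (pool.allocated, pool.reserved, pool.free())
```

Free space drops as soon as the reservation is made, but block
locations are only chosen when the writes arrive.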

> If swap is grown, does ZFS keep the space contiguous?

No, due to the considerations above. Also, if there is some other
data in the pool, it might not be possible to allocate a large
enough contiguous range (though a special dataset type might
try to preallocate as few contiguous ranges as possible).

Also note that there is no ZFS defragmentation as of yet, and
even the discussions about it did not settle on a single good
policy. You could order blocks in TXG birth order, which could
speed up zfs-sends and maybe scrubs, *OR* you could make the
current "live" version of the dataset objects contiguous.
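
The two candidate orderings conflict, which a tiny Python sketch
makes visible. The Block tuple, its field names and the sample
TXG numbers are made up for illustration; this is not a ZFS data
structure.

```python
from collections import namedtuple

# Hypothetical block record: owning object, offset within that
# object, and the transaction group (TXG) in which it was written.
Block = namedtuple("Block", "obj offset birth_txg")

blocks = [
    Block("a", 0, 10), Block("b", 0, 11), Block("a", 1, 12),
    Block("b", 1, 13), Block("a", 2, 14),
]

# Policy 1: lay blocks out in TXG birth order.
by_birth = sorted(blocks, key=lambda b: b.birth_txg)

# Policy 2: make each live object contiguous.
by_object = sorted(blocks, key=lambda b: (b.obj, b.offset))

birth_layout = "".join(b.obj for b in by_birth)
object_layout = "".join(b.obj for b in by_object)
```

With two objects written in alternation, birth order interleaves
them while per-object order separates them, so one layout cannot
satisfy both policies.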

>
> How is swap handled if rpool is a mirrored pair?

Like other ZFS access: reads are likely distributed across the
mirror components, perhaps based on estimated head travel time
on a particular component disk, while synced writes must go to
all components of the mirror and so are a bit slower than
single-disk writes.

It is likely that copies of the block would reside at the
same logical offsets on different components of the mirror
and yield similar performance (unless reallocated from bad
sectors by HDD firmware) if you use the same geometry for
the pool slices - which you are not strictly required to do.

> Would striping swap improve performance?

If by this you mean using several swap devices (zvols or
otherwise) - I think they do fill up in parallel and thus
should increase performance if located on different media.

I did campaign for also adding priority levels for swap
devices (like in Linux) and perhaps tiering, so that one
could use small fast swap on SSDs or even DDR SSDs, and
maybe even push out long-unused blocks to HDDs (libraries
that are only formally needed to satisfy dependencies,
reservations of swap that are needed for some software
like VirtualBox but are never used, etc.). While it is
relatively cheap to add infinite amounts of swap on HDD
and not care if that is never used, it is more expensive
to dedicate SSD/DDR swap areas - you do want these to be
used more intensively if the need ever arises.
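
What I have in mind for priority levels is roughly the following,
sketched in Python. This is a toy model and not any kernel's code:
the device names, priority numbers and the pick_device helper are
all hypothetical. It only shows the policy of filling the
highest-priority device that still has room before falling back
to slower ones.

```python
def pick_device(devices):
    """Take one page from the highest-priority device with free pages.

    Returns the chosen device name, or None if every device is full.
    """
    candidates = [d for d in devices if d["free"] > 0]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d["priority"])
    best["free"] -= 1
    return best["name"]


# Hypothetical setup: a small fast SSD swap area and a larger HDD one.
devices = [
    {"name": "hdd-swap", "priority": 1, "free": 4},
    {"name": "ssd-swap", "priority": 10, "free": 2},
]

# Swap out five pages and record where each one lands.
placements = [pick_device(devices) for _ in range(5)]
```

The SSD area is used up first and the HDD only takes the overflow.
Tiering (pushing long-unused pages down to the HDD later) would
need an extra migration step that this sketch does not model.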

HTH,
//Jim Klimov