[OpenIndiana-discuss] What happens when a ZIL drive dies?
Dan Swartzendruber
dswartz at druber.com
Mon Jun 4 17:06:28 UTC 2012
On 6/4/2012 11:56 AM, Richard Elling wrote:
> On Jun 4, 2012, at 8:24 AM, Nick Hall wrote:
> For NFS workloads, the ZIL implements the synchronous semantics between
> the NFS server and client. The best way to get better performance is to have the
> client run in async mode when possible (Solaris clients do this automatically, and
> have for a very long time, Linux... not so much).
>
> The risk is that the server unexpectedly reboots and the synchronous writes from
> the client are lost. In that case, the client thinks data is written, but it is not. The
> server is happy either way... it is the client that is sad.
>
The most annoying client in this respect is ESXi, which insists on doing
sync operations. I understand the logic there. Unlike, say, an
application, which can decide for itself to do async operations, ESXi is
using NFS as the backing store for virtual disks. When a guest (Windows,
Linux, whatever) writes to a virtualized SCSI controller (for example),
it may be doing so on behalf of a journaling filesystem that issues
writes in a specific order, possibly even with write barriers. In that
case, cheating and forcing the writes to be asynchronous (say, with
'sync=disabled') can in fact cause guest filesystem corruption if the
server goes down before the acknowledged writes reach disk. I can't
afford a high-quality SSD to reduce latency, so I made an informed
decision to disable sync mode. The mitigation for me is that I take ZFS
snapshots of the ESXi datastore every night, so the worst case is losing
a day's work. Given that this is for a home/SOHO setup, and given that
the OpenIndiana SAN is on a hefty UPS, I'm willing to take the chance.
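
For anyone wanting to try the same trade-off, here is a minimal sketch of
the two pieces (disabling sync and taking the nightly snapshot). The
dataset name tank/esxi, the "nightly-" snapshot prefix, and the helper
function names are all made up for illustration; only 'zfs set sync=...'
and 'zfs snapshot' are the real commands. Setting ZFS=echo prints the
commands instead of running them.

```shell
#!/bin/sh
# Assumption: tank/esxi is a placeholder for the dataset backing the
# ESXi NFS datastore. Substitute your own.
DATASET="${DATASET:-tank/esxi}"

# Set ZFS=echo to preview the commands without touching any pool.
ZFS="${ZFS:-zfs}"

# Build a dated snapshot name, e.g. tank/esxi@nightly-2012-06-04.
snapshot_name() {
    printf '%s@nightly-%s\n' "$1" "$(date +%Y-%m-%d)"
}

# One-time step: acknowledge sync writes before they reach stable storage.
# This is the risky part; it trades crash safety for lower latency.
disable_sync() {
    "$ZFS" set sync=disabled "$1"
}

# Nightly step: take a snapshot to fall back on if a guest gets corrupted.
nightly_snapshot() {
    "$ZFS" snapshot "$(snapshot_name "$1")"
}
```

You would call disable_sync "$DATASET" once, and run nightly_snapshot
"$DATASET" from cron or similar. 'zfs set sync=standard' on the same
dataset puts things back to the default behavior.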