[OpenIndiana-discuss] What happens when a ZIL drive dies?

Mike La Spina mike.laspina at laspina.ca
Mon Jun 4 18:23:11 UTC 2012


Everything you asked seems to be fully covered by our community.

Just wanted to add the following:

Not all SSDs are suitable for slog use. Be aware that low-end SSDs (and
even some high-end ones) may fail to complete an in-flight write to the
flash cells if power is lost. At a minimum, use a UPS, or choose SSDs
that guarantee complete write operations in the event of a power loss.

-----Original Message-----
From: Nick Hall [mailto:darknovanick at gmail.com] 
Sent: Monday, June 04, 2012 10:25 AM
To: OpenIndiana-discuss at openindiana.org
Subject: [OpenIndiana-discuss] What happens when a ZIL drive dies?

I'm considering buying a separate SSD drive for my ZIL as I do quite a
bit over NFS and would like the latency to improve. But first I'm trying
to understand exactly how the ZIL works and what happens in case of a
problem.
I'll list my understanding here, and I'm hoping someone can correct me
if I'm understanding this incorrectly:

- In normal operation, the ZIL drive would just be written to but never
read from.

- In the case of a power failure, the ZIL will probably contain 5-10
seconds (maybe up to 30 seconds) worth of writes that didn't make it
onto the main hard drives. The next time the system boots, ZFS will use
what's in the ZIL to bring the main hard drives up to date.

- I'm running ZFS version 28 -- in this version, if the ZIL drive were
to die while the system were running, the system would switch back to
using the main pool hard drives to store the ZIL, just as it currently
does since I have no separate ZIL drive right now.
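
One way to picture the three points above is a small toy model. This is
only a sketch of the behaviour as described in this message: the class
ToyPool and all of its methods and fields are made up for illustration
and are not ZFS internals. It assumes that pending writes are kept in
RAM as well as in the log, and that the log is read only at boot after
a crash.

```python
# Toy model of the behaviour described above. Every name here is
# hypothetical; this is not how ZFS is implemented internally.


class ToyPool:
    def __init__(self, has_slog=True):
        self.disk = {}          # data committed to the main pool drives
        self.pending = {}       # RAM copy of writes awaiting the next commit
        self.slog = [] if has_slog else None  # log on the separate device
        self.pool_log = []      # log kept on the main pool drives
        self.log_reads = 0      # how many log records were ever read back

    def sync_write(self, key, value):
        """Keep the write in RAM and record it in whichever log is in use."""
        self.pending[key] = value
        log = self.slog if self.slog is not None else self.pool_log
        log.append((key, value))

    def txg_commit(self):
        """Periodic flush: data goes from RAM to disk; the log is not read."""
        self.disk.update(self.pending)
        self.pending.clear()
        if self.slog is not None:
            self.slog.clear()
        self.pool_log.clear()

    def slog_dies(self):
        """The log device fails; later log records go to the main pool."""
        self.slog = None

    def crash_and_reboot(self):
        """Power loss: RAM is gone, surviving log records are replayed."""
        self.pending.clear()
        surviving = (self.slog or []) + self.pool_log
        for key, value in surviving:
            self.log_reads += 1
            self.disk[key] = value
        if self.slog is not None:
            self.slog.clear()
        self.pool_log.clear()


if __name__ == "__main__":
    pool = ToyPool()
    pool.sync_write("a", 1)
    pool.txg_commit()         # normal operation: log written, never read
    pool.sync_write("b", 2)
    pool.crash_and_reboot()   # power failure: log replayed at boot
    pool.sync_write("c", 3)
    pool.slog_dies()          # log device dies while the system is running
    pool.txg_commit()         # the RAM copy still reaches the pool
    print(pool.disk, pool.log_reads)
```

In this model the only way to lose a write is to lose both the RAM copy
and the log record before the next commit, for example by calling
slog_dies() and then crash_and_reboot() with no txg_commit() in between.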


So, now I have a couple of questions:

- If the ZIL drive were to die while the system were running, I'm
assuming no data would be lost? In order for this to work, the system
would need to cache everything in the ZIL in RAM, so if the ZIL were to
die, it would write the transactions that were on the ZIL from RAM to
the main pool drives. Applications would not notice anything from their
perspective. Is this what happens?

- So far, assuming I'm understanding this correctly, none of the above
scenarios involve any data loss. The scenario I can think of that would
involve data loss is if there's a power failure and the ZIL drive dies
at the same time. It seems likely that this scenario would be caused by a
catastrophic hardware failure, and the main system drives would also
die, but let's pretend that only the ZIL drive is affected. So any
transactions stored in the ZIL are lost. I'm thinking that the system
would boot up, note that the ZIL drive is dead and switch the ZIL back
to the main pool drives, and the last 5-30 seconds of writes would be
lost forever. But would the system be in a consistent state, that is,
things would be the same as if you went back in time 30 seconds before
the system died and just pulled the plug? So there's no corruption, just
the loss of those seconds of data?

- Are there any other scenarios I'm not thinking of, specifically any
other scenarios that would cause corruption or loss of data? My use is
for a home server -- I would like higher NFS write performance, but not
at the cost of making corruption or major data loss more likely. For my
use, if I only lost the last few seconds of writes and things were in a
consistent state, it would be of little consequence. I understand that
for a commercial server that would be a huge issue, though, as losing
banking transactions or the like would be a major problem. Thanks.


