[OpenIndiana-discuss] What happens when a ZIL drive dies?

Mark Creamer whitetr6 at gmail.com
Mon Jun 4 15:57:12 UTC 2012


You might supplement the advice you get here with this post from
Constantin Gonzalez on his blog. I found it very helpful when I was
setting up my Solaris storage server.
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained

-Mark

On Mon, Jun 4, 2012 at 11:24 AM, Nick Hall <darknovanick at gmail.com> wrote:
> I'm considering buying a separate SSD drive for my ZIL as I do quite a bit
> over NFS and would like the latency to improve. But first I'm trying to
> understand exactly how the ZIL works and what happens in case of a problem.
> I'll list my understanding here, and I'm hoping someone can correct me if
> I'm understanding this incorrectly:
>
> - In normal operation, the ZIL drive would just be written to but never
> read from.
>
> - In the case of a power failure, the ZIL will probably contain 5-10
> seconds (maybe up to 30 seconds) worth of writes that didn't make it onto
> the main hard drives. The next time the system boots, ZFS will use what's
> in the ZIL to bring the main hard drives up to date.
>
> - I'm running ZFS version 28 -- in this version, if the ZIL drive were to
> die while the system was running, the system would switch back to using
> the main pool hard drives to store the ZIL, just as it currently does since
> I have no separate ZIL drive right now.
>
>
> So, now I have a couple of questions:
>
> - If the ZIL drive were to die while the system was running, I'm assuming
> no data would be lost? In order for this to work, the system would need to
> cache everything in the ZIL in RAM, so if the ZIL were to die, it would
> write the transactions that were on the ZIL from RAM to the main pool
> drives. Applications would not notice anything from their perspective. Is
> this what happens?
>
> - So far, assuming I'm understanding this correctly, none of the above
> scenarios involve any data loss. The scenario I can think of that would
> involve data loss is if there's a power failure and the ZIL drive dies at
> the same time. It seems likely that this scenario would be caused by
> a catastrophic hardware failure, and the main system drives would also die,
> but let's pretend that only the ZIL drive is affected. So any transactions
> stored in the ZIL are lost. I'm thinking that the system would boot up,
> note that the ZIL drive is dead and switch the ZIL back to the main pool
> drives, and the last 5-30 seconds of writes would be lost forever. But
> would the system be in a consistent state, that is, things would be the
> same as if you went back in time 30 seconds before the system died and just
> pulled the plug? So there's no corruption, just the loss of those seconds
> of data?
>
> - Are there any other scenarios I'm not thinking of, specifically any other
> scenarios that would cause corruption or loss of data? My use is for a home
> server. I would like higher NFS write performance, but not at the cost of
> making corrupted or heavily lost data more likely. For my use, if I only
> lost the last few seconds of writes and things were in a consistent
> state, it would be of little consequence. I understand that for a
> commercial server that would be a huge issue, though, as lost banking
> transactions or the like would be a major problem. Thanks.
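
For what it's worth, the model you describe in your bullets can be written
down as a toy Python sketch. This is not how ZFS is implemented: the class
ToyPool and all of its method names are made up for illustration, and
handling a dead log device by committing the pending writes from RAM is a
simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Toy model of a pool with an optional separate log device (hypothetical)."""
    has_slog: bool = True
    disk: dict = field(default_factory=dict)  # data committed to the main pool
    ram: dict = field(default_factory=dict)   # pending writes still in memory
    log: list = field(default_factory=list)   # intent-log records (slog or in-pool)

    def sync_write(self, key, value):
        # A synchronous write is logged before it is acknowledged and also
        # stays in RAM until the next periodic commit.
        self.log.append((key, value))
        self.ram[key] = value

    def commit_txg(self):
        # Periodic commit: RAM contents go to the main pool. The log is
        # never read here, only discarded once the data is safe.
        self.disk.update(self.ram)
        self.ram.clear()
        self.log.clear()

    def slog_dies_while_running(self):
        # Assumption: the pending writes are still in RAM, so they are
        # committed from there and later logging uses the main pool.
        self.commit_txg()
        self.has_slog = False

    def crash(self, slog_survives=True):
        # Power failure: RAM is gone. This is the only time the log is read.
        self.ram.clear()
        if self.has_slog and not slog_survives:
            self.log.clear()       # records on the dead device are lost
            self.has_slog = False
        for key, value in self.log:
            self.disk[key] = value  # replay whatever log records survived
        self.log.clear()
```

In this sketch a crash with a healthy log device replays every logged
write, while a crash that also kills the log device leaves the pool at its
last commit: older data is intact and only the writes logged since then are
missing.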



-- 
Mark


