[OpenIndiana-discuss] What happens when a ZIL drive dies?

Jan Owoc jsowoc at gmail.com
Mon Jun 4 15:48:39 UTC 2012


On Mon, Jun 4, 2012 at 9:24 AM, Nick Hall <darknovanick at gmail.com> wrote:
> I'm considering buying a separate SSD drive for my ZIL as I do quite a bit
> over NFS and would like the latency to improve. But first I'm trying to
> understand exactly how the ZIL works and what happens in case of a problem.
> I'll list my understanding here, and I'm hoping someone can correct me if
> I'm understanding this incorrectly:

A separate ZIL device only reduces latency for synchronous writes. Do
you have a workload that you can benchmark with the ZIL disabled to
determine whether it's indeed the ZIL that's slowing you down?
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29

(just remember to re-enable the ZIL after you are done benchmarking,
if you care about data integrity)
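
If you don't have a handy workload, a rough first measurement could look
like the Python sketch below. It is only an illustration: the function
name time_writes and the record count/size are made up for this example,
and the target directory is an assumption you would replace with a
directory on the pool (or NFS mount) you want to test. It times small
writes with and without an fsync after each one.

```python
import os
import tempfile
import time


def time_writes(directory, count=200, size=4096, sync=True):
    """Write `count` records of `size` bytes; return average seconds per write."""
    payload = b"x" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            if sync:
                # Block until the OS reports the data is on stable storage.
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / count


if __name__ == "__main__":
    # Assumption: replace with a directory on the pool under test.
    target = tempfile.gettempdir()
    print("no fsync: %.6f s/write" % time_writes(target, sync=False))
    print("fsync:    %.6f s/write" % time_writes(target, sync=True))
```

If the two numbers are close, synchronous writes are probably not your
bottleneck and a separate log device is unlikely to help much.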


> - If the ZIL drive were to die while the system were running, I'm assuming
> no data would be lost? In order for this to work, the system would need to
> cache everything in the ZIL in RAM, so if the ZIL were to die, it would
> write the transactions that were on the ZIL from RAM to the main pool
> drives. Applications would not notice anything from their perspective. Is
> this what happens?

The ZIL is sort of like a journal. Your application issues a "sync"
and ZFS isn't supposed to return from the sync until the data actually
makes it onto disk. With platters rotating etc., this can take tens of
milliseconds. A ZIL on NVRAM (or an SSD) allows this sync'ed data
to hit the fast-write device, and the system call to return almost
immediately. The data will also, as you've read, make it to the main
pool disks within 5-30 seconds. Yes, a copy stays in RAM, and it's this
copy that is normally written (and not a copy re-read from the ZIL).
The ZIL itself is only read back after a crash or power loss.



> - So far, assuming I'm understanding this correctly, none of the above
> scenarios involve any data loss. The scenario I can think of that would
> involve data loss is if there's a power failure and the ZIL drive dies at
> the same time. It seems likely that this scenario would be caused by
> a catastrophic hardware failure, and the main system drives would also die,
> but let's pretend that only the ZIL drive is affected. So any transactions
> stored in the ZIL are lost. I'm thinking that the system would boot up,
> note that the ZIL drive is dead and switch the ZIL back to the main pool
> drives, and the last 5-30 seconds of writes would be lost forever. But
> would the system be in a consistent state, that is, things would be the
> same as if you went back in time 30 seconds before the system died and just
> pulled the plug? So there's no corruption, just the loss of those seconds
> of data?

I don't have first-hand experience with this case, so maybe someone
can correct me if I'm wrong.

The data on the main pool is always consistent in that a certain
operation either made it to the disk or it didn't. However, if your
application depends on the fact that writes make it out to disk in a
specific order (that's why it's sync'ing, right?), then it's the ZIL
that would contain a log/journal of what should have been written to
the disk and in what order. If you lose this, your file system remains
consistent, but some writes may have made it out to the disk before
others.
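
To make the scenario concrete, here is a toy model in Python. It is a
sketch under heavy assumptions, not how ZFS is implemented: the class
ToyPool and its methods are invented for this example, and it ignores
write ordering between applications entirely. It only shows the three
places a sync'ed write can live (RAM, the log device, the main pool) and
what is left after a power loss with and without a surviving log.

```python
class ToyPool:
    """Toy model only; names and behaviour are simplified assumptions."""

    def __init__(self):
        self.disk = []     # writes committed to the main pool
        self.pending = []  # RAM copy of writes not yet committed
        self.log = []      # sync writes recorded on the log device

    def sync_write(self, record):
        # The write is acknowledged once it is on the log device.
        self.pending.append(record)
        self.log.append(record)

    def commit(self):
        # Periodic flush: the main pool is updated from the RAM copy,
        # after which the logged records are no longer needed.
        self.disk.extend(self.pending)
        self.pending.clear()
        self.log.clear()

    def crash_and_import(self, log_survives):
        # Power loss drops the RAM copy; replay the log only if it survived.
        self.pending.clear()
        if log_survives:
            self.disk.extend(self.log)
        self.log.clear()
        return list(self.disk)


def run(log_survives):
    pool = ToyPool()
    pool.sync_write("a")
    pool.commit()
    pool.sync_write("b")
    pool.sync_write("c")
    return pool.crash_and_import(log_survives)


if __name__ == "__main__":
    print(run(log_survives=True))   # ['a', 'b', 'c']
    print(run(log_survives=False))  # ['a']
```

In the second case the pool is still in a state it was really in at some
point, but "b" and "c" are gone even though the application was told
they were safe.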


> My use is for a home
> server -- I would like higher NFS write performance, but not by making it
> more likely I have corrupted or majorly lost data, but for my use, if I
> only lost the last few seconds of writes and things were in a consistent
> state, it would be of little consequence.

You need to first find out if your writes are synchronous or not,
otherwise you are wasting your time (and money) getting a separate log
device. It's mostly databases that require that file operations happen
in a specific order - for a home file server, you might not see any
benefit to a separate log device. Next, make sure you get an SSD with
fast sequential writes - many SSDs focus on random read speed (that's
what a desktop user wants to see).


Jan


