[OpenIndiana-discuss] What happens when a ZIL drive dies?

Tue Jun 5 17:53:53 UTC 2012

On Jun 5, 2012, at 10:32 AM, Nick Hall wrote:
> On Mon, Jun 4, 2012 at 10:48 AM, Jan Owoc <jsowoc at gmail.com> wrote:
> 
>> 
>> The data on the main pool is always consistent in that a certain
>> operation either made it to the disk or it didn't. However, if your
>> application depends on the fact that writes make it out to disk in a
>> specific order (that's why it's sync'ing, right?), then it's the ZIL
>> that would contain a log/journal of what should have been written to
>> the disk and in what order. If you lose this, your file system remains
>> consistent, but some writes may have made it out to the disk before
>> others.
>> 
>> 
>> 
> Thanks everyone for all the responses. They were very helpful. My main
> application cases are ESXi, which as stated below does a lot of syncs, and
> MySQL. I had previously used the zilstat tool, but this is the first time
> I've heard of nfssrvtop, and I really appreciate that, as it works really
> well for analyzing these usage patterns. After doing more analysis, it
> seems as though most of my writes are actually async, so it probably
> wouldn't speed things up too dramatically to add an SLOG, so for now I'm
> going to stick with what I have, so thank you for that advice.
> 
> I'm just wondering, for my own personal knowledge and for anyone else who
> finds this thread later, for some clarification on the above quote. So, if
> I'm understanding this correctly, are you saying that, say I have an
> application and it writes to file A, then it writes to file B, then it
> writes to file C, then finally calls fsync, that there could be a case
> where if the computer crashed and at the same time the SLOG got fried
> (after files A B and C were written to, but before the sync was finished),
> then upon restart, the write to file B may have taken effect on the pool
> but the write to file A wouldn't be on there? Or am I misunderstanding?

This is no different than any other file system.

To clarify the events here:
	+ if the slog fails to write, then it will be marked offline and no longer used
	+ if the slog fails to read the data, then it might not be detected until the next
	   import
	+ if the slog is healthy when the system crashes, then it is expected to be
	   healthy when the pool is imported; if it is not, then the import will fail
		+ depending on the OS release, you might be able to manually
		   import the pool by ignoring the slog and accepting the risk of data 
		   loss -- see the zpool import -m option
	+ if your application is sensitive to consistency, then it needs to manage
	    its own consistency (many do, many do not)
	+ not buffering disk I/O sux
	+ disks that do not honor the SYNCHRONIZE_CACHE command suck
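
As a rough sketch of what "managing its own consistency" could look like for
the file A / file B / file C case above: instead of one fsync at the end, the
application syncs each file before it starts the next, so a later write is
never issued until the earlier one has been reported stable. This assumes
POSIX fsync semantics (fsync covers the one file descriptor it is called on).
The function name write_in_order and the file names are hypothetical and only
for illustration. It does not sync the parent directory, which a careful
application creating new files may also want to do.

```python
import os

def write_in_order(directory, items):
    """Write each (name, data) pair and fsync it before starting the next.

    Illustrative helper only: the ordering guarantee comes from not issuing
    write N+1 until fsync has returned for write N.
    """
    paths = []
    for name, data in items:
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()             # push Python's userspace buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit this file to stable storage
        paths.append(path)
    return paths
```

For example, write_in_order("/some/dir", [("A", b"1"), ("B", b"2"), ("C", b"3")])
will not begin writing B until the fsync on A has returned. The cost is one
sync per file rather than one per batch, which is exactly the kind of load a
slog is meant to absorb.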

> Usually when I think of journals I would think it would roll back the
> change to file B because it doesn't have a record in the journal to
> indicate that the sync was successful. I understand the possibility of
> losing the last few seconds of writes in this scenario -- I'm just trying
> to wrap my head around the possibility of losing *part* of the last few
> seconds of data, and the much worse implications this has. Thanks,

Some journaling file systems only log metadata changes, so they can
avoid the pain of fsck. This is not the same as the ZIL.
 -- richard

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422