[OpenIndiana-discuss] General ZFS questions (Michelle Knight)

Sat Jan 22 15:45:19 UTC 2011

Hi Calum,

That looks complicated, but I'm up for trying it.

At the moment, however, I've got to wait until I get another controller. That 
way, I can take a backup of my data set.

Once I've got a backup I can trust, I'll be more confident about trying this 
out. One reason Bernd noticed so many scrubs was that I was trying to check 
the set after each crash took the system out.

I won't bore you with what's happening to public sector wages here these last 
few years; you've probably got something comparable in your own part of the 
world. I'm going to try and buy a card with next month's wages.

Michelle.

On Saturday 22 January 2011 14:53:09 Calum Mackay wrote:
> On 13/01/11 14:03, Michelle Knight wrote:
> > At the moment, I'm having more problems.
> > 
> > In beginning the copy to the backup section, even with one device on the
> > motherboard e-sata and the other on a USB port (a combination which
> > worked under OpenSolaris 134), OpenIndiana is freezing after copying
> > about 200 GB of data.
> 
> Hi Michelle, I know you resolved your issue by moving the disks, but if
> you're still seeing it, you might try this, before the system hangs,
> e.g. after it's been copying for a while:
> 
> 	echo "::memstat" | sudo mdb -k
> 
> If you see Kernel memory rising steadily, but the ZFS cache falling,
> over time, with the Kernel memory eventually gobbling up all of your
> memory (and then hanging), then a kernel memory leak would explain the
> hang.
> 
> If so, turn on kmem debugging in /etc/system:
> 
> 	set kmem_flags = 0xf
> 
> and reboot.
> 
> Then try the test again, monitoring memstat; when it gets to around 90%
> of memory, force a crash dump:
> 
> 	sudo reboot -d
> 
> [don't wait until it's hung; on some systems you may find it impossible
> to force a crash dump then]
> 
> [if you don't have savecore enabled with dumpadm, you will need to run
> savecore manually after reboot]
> 
> Uncompress the dump (assuming compression is on in dumpadm):
> 
> 	sudo savecore -vf /var/crash/host/vmdump.n
> 
> 
> Then run mdb's findleaks on the dump:
> 
> 	echo "::findleaks -v" | mdb unix.n vmcore.n > findleaks.out
> 
> It may take quite a while.
> 
> Finally, assuming that findleaks finds lots of leaks, pick a few
> representative lines, note the BUFCTL address of each, then use the
> bufctl dcmd to get the stack trace that allocated that memory.
> 
> e.g., from findleaks.out:
> 
> CACHE             LEAKED           BUFCTL CALLER
> 	...
> ffffff02ce827688   40670 ffffff03065a48d8 bp_mapin_common+0x2cd
> 	...
> 
> then:
> 
> 	mdb unix.n vmcore.n
> 
> 	> ffffff03065a48d8::bufctl -v
> 
> Armed with the stack trace(s), we should be able to find the bug that's
> leaking memory...
> 
> 
> cheers,
> calum.
> 
> Calum Mackay
> Oracle.
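
For the "monitoring memstat" step in the quoted message, something like the
following might save retyping the command by hand. It is only a rough sketch:
the function names kernel_pct and log_memstat are made up for illustration,
and it assumes that the ::memstat output has a line starting with "Kernel"
whose last column is the percentage of total memory (e.g. "24%"). Check that
against the output on your own system before relying on it.

```shell
# Hypothetical helper: read ::memstat output on stdin and print the
# Kernel row's percentage of total memory, without the % sign.
# Assumes the percentage is the last whitespace-separated field.
kernel_pct() {
    awk '$1 == "Kernel" { sub(/%/, "", $NF); print $NF; exit }'
}

# Hypothetical helper: sample ::memstat every N seconds (default 60)
# and print a timestamped Kernel percentage until interrupted.
log_memstat() {
    interval="${1:-60}"
    while :; do
        out=$(echo "::memstat" | sudo mdb -k) || return 1
        printf '%s kernel=%s%%\n' \
            "$(date '+%H:%M:%S')" \
            "$(printf '%s\n' "$out" | kernel_pct)"
        sleep "$interval"
    done
}
```

Running "log_memstat 30" in a second terminal while the copy is in progress
would print one line every 30 seconds. A Kernel figure that keeps climbing
towards the 90% mark is the cue to force the crash dump before the box hangs.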