[OpenIndiana-discuss] ZFS stalls with oi_151?

Tommy Eriksen te at rackhosting.com
Wed Nov 23 08:59:03 UTC 2011


Hi George,

Sure, I'll try that.
I take it that nobody has any real idea what's actually happening?
I ask because we were actually trying to move customers to these boxes after having them run without problems for about a month, but this has kind of put a stop to that, due to orders from above :/

Thanks,
Tommy




On 21/11/2011, at 03.28, George Wilson wrote:

> Tommy,
> 
> If you get this again, can you generate a crash dump? The easiest way would be to do the following from the console:
> 
> # mdb -K
>> $<systemdump
> 
> I'm afraid that if you try to do a 'reboot -d', it will hang too.
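> 
> (And once the box comes back up, assuming dumpadm is still at its defaults so savecore runs at boot, something along these lines should recover and open the dump; the paths are only illustrative:)
> 
> # dumpadm                                      # confirm the dump device and savecore directory
> # savecore -vd /var/crash/`hostname`           # only needed if savecore didn't run at boot
> # savecore -vf /var/crash/`hostname`/vmdump.0  # expand the compressed dump into unix.0/vmcore.0
> # mdb unix.0 vmcore.0                          # then e.g. ::status and ::stacks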
> 
> - George
> 
> On Nov 20, 2011, at 1:14 PM, Tommy Eriksen wrote:
> 
>> Hi,
>> 
>> Just saw this exact behavior again in oi_148.
>> I downgraded the boxes to 148 after seeing the "all I/O stalls" behavior in 151 and discussing it on-list about a month ago, but today one of them suddenly did it again.
>> 
>> This time, I've captured a threads.txt (as per the mail below), available for download at http://fotoarkiv.com/threads.txt
>> Also, from another suggestion:
>> root@zfsnas2:~# echo "::walk spa | ::print spa_t spa_name spa_suspended" | mdb -k
>> spa_name = [ "datastore0" ]
>> spa_suspended = 0
>> spa_name = [ "rpool" ]
>> spa_suspended = 0
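>> 
>> (Since spa_suspended is 0 on both pools, next time it hangs it's probably also worth dumping the full pool/vdev state from the debugger; something like this should do it, if I'm reading the dcmd right:)
>> 
>> root@zfsnas2:~# echo "::spa -v" | mdb -k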
>> 
>> I've just given the box a hard reset and hope it'll keep going for a while before we get this one again :/
>> 
>> Best regards,
>> Tommy
>> 
>> 
>> 
>> On 21/10/2011, at 15.31, Steve Gonczi wrote:
>> 
>> Are you running with dedup enabled?
>> 
>> If the box is still responsive, try to generate a thread stack listing, e.g.:
>> echo "::threadlist -v" | mdb -k > /tmp/threads.txt
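>> 
>> (Or, to keep the output manageable, just a grouped summary of the ZFS stacks; I believe ::stacks takes a module filter, but double-check on your build:)
>> echo "::stacks -m zfs" | mdb -k > /tmp/zfs-stacks.txt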
>> 
>> Steve
>> 
>> On Oct 21, 2011, at 4:16, Tommy Eriksen <te at rackhosting.com> wrote:
>> 
>> Hi guys,
>> 
>> I've got a bit of a ZFS problem:
>> All of a sudden, and it doesn't seem related to load or anything, the system will stop writing to the disks in my storage pool. No error messages are logged (that I can find, anyway): nothing in dmesg, messages or the like.
>> 
>> ZFS stalls: a simple snapshot command (or the like) just hangs indefinitely and can't be stopped with Ctrl+C or kill -9.
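>> 
>> (A way to at least see where the hung command is blocked in the kernel; <PID> below is just a placeholder for the stuck zfs process:)
>> 
>> root@zfsnas3:~# pgrep -fl "zfs snapshot"
>> root@zfsnas3:~# echo "0t<PID>::pid2proc | ::walk thread | ::findstack -v" | mdb -k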
>> 
>> Today, the stall happened after I had been running 2 VMs on each box (on vSphere 5, connected via iSCSI) running iozone -s 200G, just to generate a bunch of load. Happily, this morning I saw that they were still running without problems and stopped them. Then, when I asked vSphere to delete the VMs, all write I/O stalled. A bit too much irony for me :)
>> 
>> However, and this puzzled me, everything else seems to run perfectly, even up to ZFS writing new data to the L2ARC devices while data is read.
>> 
>> Boxes (2 of the same) are:
>> Supermicro based, 24 bay chassis
>> 2× Xeon X5645
>> 48 GB of RAM
>> 3× LSI 2008 controllers connected to
>> 20× Seagate Constellation ES 3 TB SATA
>> 2× Intel 600 GB SSD
>> 2× Intel 311 20 GB SSD
>> 
>> 18 of the 3 TB drives are set up as mirrored vdevs; the last 2 are spares.
>> 
>> Running oi_151a (trying a downgrade to 148 today, I think, since I have 5 or so boxes running without problems on 148, but both my 151a boxes are playing up).
>> 
>> /etc/system variables:
>> set zfs:zfs_vdev_max_pending = 4
>> set zfs:l2arc_noprefetch = 0
>> set zfs:zfs_vdev_cache_size = 0
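>> 
>> (As a sanity check, the live values can be read back with mdb; e.g. for the first tunable, assuming it's a 32-bit int on this build:)
>> 
>> root@zfsnas3:~# echo "zfs_vdev_max_pending/D" | mdb -k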
>> 
>> 
>> I can write to a (spare) disk on the same controller without errors, so I take it it's not a general I/O stall on the controller:
>> root@zfsnas3:/var/adm# dd if=/dev/zero of=/dev/rdsk/c8t5000C50035DE14FAd0s0 bs=1M
>> ^C1640+0 records in
>> 1640+0 records out
>> 1719664640 bytes (1.7 GB) copied, 11.131 s, 154 MB/s
>> 
>> iostat reported the following; note that there are no writes to any of the other drives. All writes just stall.
>> 
>>                        extended device statistics       ---- errors ---
>> r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
>> 3631.6  167.2 14505.5 152337.1  0.0  2.2    0.0    0.6   0 157   0   0   0   0 c8
>> 109.0    0.8  472.9    0.0  0.0  0.0    0.0    0.5   0   3   0   0   0   0 c8t5000C50035B922CCd0
>> 143.0    0.8  567.1    0.0  0.0  0.1    0.0    0.5   0   3   0   0   0   0 c8t5000C50035CA8A5Cd0
>> 89.6    0.8  414.1    0.0  0.0  0.1    0.0    0.6   0   2   0   0   0   0 c8t5000C50035CAB258d0
>> 95.8    0.8  443.3    0.0  0.0  0.0    0.0    0.5   0   2   0   0   0   0 c8t5000C50035DE3DEBd0
>> 144.8    0.8  626.4    0.0  0.0  0.1    0.0    0.6   0   4   0   0   0   0 c8t5000C50035BE1945d0
>> 134.0    0.8  505.7    0.0  0.0  0.0    0.0    0.4   0   3   0   0   0   0 c8t5000C50035DDB02Ed0
>> 1.0    0.4    3.4    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c8t5000C50035DE0414d0
>> 107.8    0.8  461.6    0.0  0.0  0.0    0.0    0.3   0   2   0   0   0   0 c8t5000C50035D40D15d0
>> 117.2    0.8  516.5    0.0  0.0  0.1    0.0    0.5   0   3   0   0   0   0 c8t5000C50035DE0C86d0
>> 64.2    0.8  261.2    0.0  0.0  0.0    0.0    0.6   0   2   0   0   0   0 c8t5000C50035DD6044d0
>> 2.0    0.8    6.8    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c8t5001517959582943d0
>> 2.0    0.8    6.8    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c8t5001517959582691d0
>> 109.8    0.8  423.5    0.0  0.0  0.0    0.0    0.3   0   2   0   0   0   0 c8t5000C50035C13A6Bd0
>> 765.0    0.8 3070.9    0.0  0.0  0.2    0.0    0.2   0   7   0   0   0   0 c8t5001517959699FE0d0
>> 1.0  149.2    3.4 152337.1  0.0  1.0    0.0    6.5   0  97   0   0   0   0 c8t5000C50035DE14FAd0
>> 210.4    0.8  775.4    0.0  0.0  0.1    0.0    0.4   0   3   0   0   0   0 c8t5000C50035CA1E58d0
>> 689.4    0.8 2776.6    0.0  0.0  0.1    0.0    0.2   0   7   0   0   0   0 c8t50015179596A8717d0
>> 108.6    0.8  430.5    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035CBD12Ad0
>> 165.6    0.8  561.5    0.0  0.0  0.1    0.0    0.4   0   3   0   0   0   0 c8t5000C50035CA90DDd0
>> 164.4    0.8  578.5    0.0  0.0  0.1    0.0    0.4   0   4   0   0   0   0 c8t5000C50035DDFC34d0
>> 125.6    0.8  477.7    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035DE2AD3d0
>> 93.2    0.8  371.3    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035B94C40d0
>> 113.2    0.8  445.3    0.0  0.0  0.1    0.0    0.5   0   3   0   0   0   0 c8t5000C50035BA02AEd0
>> 75.4    0.8  304.8    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035DDA579d0
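>> 
>> (The per-vdev view of the pool can be watched alongside this with something like the following; datastore0 is the data pool from the mdb output above:)
>> 
>> root@zfsnas3:~# zpool iostat -v datastore0 5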
>> 
>> 
>> …Is anyone else seeing anything similar?
>> 
>> Thanks a lot,
>> Tommy
>> 
>> _______________________________________________
>> OpenIndiana-discuss mailing list
>> OpenIndiana-discuss at openindiana.org
>> http://openindiana.org/mailman/listinfo/openindiana-discuss
> 
> 



