[OpenIndiana-discuss] ZFS stalls with oi_151?

Tommy Eriksen te at rackhosting.com
Sun Nov 20 18:14:35 UTC 2011


Hi,

Just saw this exact behavior again in oi_148.
I downgraded the boxes to 148 after seeing the "all I/O stalls" behavior in 151 (and discussing it on-list about a month ago), but today one of them suddenly did it again.

This time, I've captured a threads.txt (as per the mail below), available for download at http://fotoarkiv.com/threads.txt
Also, from another suggestion:
root at zfsnas2:~# echo "::walk spa | ::print spa_t spa_name spa_suspended" | mdb -k
spa_name = [ "datastore0" ]
spa_suspended = 0
spa_name = [ "rpool" ]
spa_suspended = 0
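Both pools report spa_suspended = 0, so ZFS itself doesn't consider I/O suspended. For completeness, the per-pool vdev state can be dumped from the same debugger (a sketch; I believe mdb's ::spa -v dcmd prints each pool's vdev tree and state):
echo "::spa -v" | mdb -k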

I've just given the box a hard reset and hope it'll keep going for a while before this hits again :/

Best regards,
Tommy



On 21/10/2011 at 15.31, Steve Gonczi wrote:

Are you running with dedup enabled?

If the box is still responsive, try to generate a thread stack listing, e.g.:
echo "::threadlist -v" | mdb -k > /tmp/threads.txt

Steve

On Oct 21, 2011, at 4:16, Tommy Eriksen <te at rackhosting.com> wrote:

Hi guys,

I've got a bit of a ZFS problem:
All of a sudden, and it doesn't seem related to load or anything else, the system will stop writing to the disks in my storage pool. No error messages are logged (none that I can find, anyway): nothing in dmesg, /var/adm/messages or the like.

ZFS stalls: a simple snapshot command (or similar) just hangs indefinitely and can't be stopped with Ctrl+C or kill -9.
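One thing that still works: the hung process can be inspected from userland. As a sketch (assuming the stuck command is a zfs snapshot, and with <pid> standing in for whatever pgrep prints):
pgrep -fl 'zfs snapshot'   # find the hung zfs process
pstack <pid>               # print its user-level stack
I'd expect that to show the process blocked in the zfs ioctl, i.e. stuck somewhere below the ioctl layer in the kernel.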

Today, the stall happened after I had been running 2 VMs on each box (on vSphere 5, connected via iSCSI), each running iozone -s 200G just to generate a bunch of load. This morning I was happy to see they were still running without problems, so I stopped them. Then, when I asked vSphere to delete the VMs, all write I/O stalled. A bit too much irony for me :)

However, and this puzzles me, everything else seems to run perfectly, right up to ZFS writing new data to the L2ARC devices while data is being read.
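If you want to confirm the L2ARC feed is still alive, something like this should do it (a sketch, assuming the usual arcstats kstat name):
kstat -p zfs:0:arcstats:l2_write_bytes   # should keep increasing while the feed thread runs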

Boxes (2 of the same) are:
Supermicro-based, 24-bay chassis
2× Xeon X5645
48 GB RAM
3× LSI 2008 controllers, attached to:
20× Seagate Constellation ES 3TB SATA
2× Intel 600GB SSD
2× Intel 311 20GB SSD

18 of the 3TB drives are set up as mirrored vdevs; the last 2 are spares.

Running oi_151a (trying a downgrade to 148 today, I think, since I have 5 or so boxes running without problems on 148, but both of my 151a boxes are playing up).

/etc/system variables:
set zfs:zfs_vdev_max_pending = 4
set zfs:l2arc_noprefetch = 0
set zfs:zfs_vdev_cache_size = 0
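For what it's worth, the live values can be double-checked after boot; a sketch that prints a kernel variable as a decimal (the same pattern works for the other two tunables):
echo "zfs_vdev_max_pending/D" | mdb -k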


I can write to a (spare) disk on the same controller without errors, so I take it it's not a general I/O stall on the controller:
root at zfsnas3:/var/adm# dd if=/dev/zero of=/dev/rdsk/c8t5000C50035DE14FAd0s0 bs=1M
^C1640+0 records in
1640+0 records out
1719664640 bytes (1.7 GB) copied, 11.131 s, 154 MB/s

iostat reported the following. Note that there are no writes to any of the other drives; all writes just stall.

                         extended device statistics       ---- errors ---
 r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
3631.6  167.2 14505.5 152337.1  0.0  2.2    0.0    0.6   0 157   0   0   0   0 c8
109.0    0.8  472.9    0.0  0.0  0.0    0.0    0.5   0   3   0   0   0   0 c8t5000C50035B922CCd0
143.0    0.8  567.1    0.0  0.0  0.1    0.0    0.5   0   3   0   0   0   0 c8t5000C50035CA8A5Cd0
89.6    0.8  414.1    0.0  0.0  0.1    0.0    0.6   0   2   0   0   0   0 c8t5000C50035CAB258d0
95.8    0.8  443.3    0.0  0.0  0.0    0.0    0.5   0   2   0   0   0   0 c8t5000C50035DE3DEBd0
144.8    0.8  626.4    0.0  0.0  0.1    0.0    0.6   0   4   0   0   0   0 c8t5000C50035BE1945d0
134.0    0.8  505.7    0.0  0.0  0.0    0.0    0.4   0   3   0   0   0   0 c8t5000C50035DDB02Ed0
 1.0    0.4    3.4    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c8t5000C50035DE0414d0
107.8    0.8  461.6    0.0  0.0  0.0    0.0    0.3   0   2   0   0   0   0 c8t5000C50035D40D15d0
117.2    0.8  516.5    0.0  0.0  0.1    0.0    0.5   0   3   0   0   0   0 c8t5000C50035DE0C86d0
64.2    0.8  261.2    0.0  0.0  0.0    0.0    0.6   0   2   0   0   0   0 c8t5000C50035DD6044d0
 2.0    0.8    6.8    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c8t5001517959582943d0
 2.0    0.8    6.8    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c8t5001517959582691d0
109.8    0.8  423.5    0.0  0.0  0.0    0.0    0.3   0   2   0   0   0   0 c8t5000C50035C13A6Bd0
765.0    0.8 3070.9    0.0  0.0  0.2    0.0    0.2   0   7   0   0   0   0 c8t5001517959699FE0d0
 1.0  149.2    3.4 152337.1  0.0  1.0    0.0    6.5   0  97   0   0   0   0 c8t5000C50035DE14FAd0
210.4    0.8  775.4    0.0  0.0  0.1    0.0    0.4   0   3   0   0   0   0 c8t5000C50035CA1E58d0
689.4    0.8 2776.6    0.0  0.0  0.1    0.0    0.2   0   7   0   0   0   0 c8t50015179596A8717d0
108.6    0.8  430.5    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035CBD12Ad0
165.6    0.8  561.5    0.0  0.0  0.1    0.0    0.4   0   3   0   0   0   0 c8t5000C50035CA90DDd0
164.4    0.8  578.5    0.0  0.0  0.1    0.0    0.4   0   4   0   0   0   0 c8t5000C50035DDFC34d0
125.6    0.8  477.7    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035DE2AD3d0
93.2    0.8  371.3    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035B94C40d0
113.2    0.8  445.3    0.0  0.0  0.1    0.0    0.5   0   3   0   0   0   0 c8t5000C50035BA02AEd0
75.4    0.8  304.8    0.0  0.0  0.0    0.0    0.4   0   2   0   0   0   0 c8t5000C50035DDA579d0


…Is anyone else seeing anything similar?

Thanks a lot,
Tommy
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss at openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss



