[illumos] [OpenIndiana Distribution - Bug #4064] oi_151a7 ZFS deduplication problem

Thu Aug 22 00:24:15 UTC 2013

Issue #4064 has been updated by Ken Mays.

Assignee set to OI illumos

----------------------------------------
Bug #4064: oi_151a7 ZFS deduplication problem
https://www.illumos.org/issues/4064

Author: Stephen Rondeau
Status: New
Priority: Normal
Assignee: OI illumos
Category: OS/Net (Kernel and Userland)
Target version: oi_151_stable
Difficulty: Medium
Tags: needs-triage

My storage server running oi_151a7 has 16GB of RAM.

> *zpool status data1*
<pre>

  pool: data1
 state: ONLINE
  scan: scrub repaired 0 in 1h37m with 0 errors on Tue Aug 20 07:09:55 2013
config:

        NAME        STATE     READ WRITE CKSUM
        data1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0

errors: No known data errors
</pre>

I enabled deduplication on my zpool for a volume called data1/itfiles/home, which is defined to be 900GB. Via iSCSI, this volume is attached to a Windows Server 2008 file server.

From the file server I attempted to copy about 100GB of data onto that remote volume. Most of it deduped fine, with a dedupratio of 2.30x. However, copying three fairly deep directories caused the volume to detach from the file server. I was able to reproduce the detaching behavior by attempting to copy those directories again. If dedup is turned off for the volume, the same directories copy without the volume detaching. I tried copying the directories to a different ZFS volume with dedup on and off, with the same effect. Hence, I think there is a problem with deduping.

I tried to trim the test case down, but it seems to take several hundred MB before the problem occurs, and smaller test cases don't fail consistently.

I don't appear to have run out of RAM for the DDTs. I have no way of clearing the DDTs, so I can't start over with a clean system without destroying and re-creating my pool. I have turned off dedup on all volumes for now.
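
The RAM claim above can be sanity-checked with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes the commonly quoted rule of thumb of roughly 320 bytes of RAM per in-core DDT entry and an 8K volume block size, neither of which was measured on this system. The function name and parameters are hypothetical. The actual entry count is what `zpool status -D data1` or `zdb -DD data1` reports.

```python
GIB = 1024 ** 3


def estimate_ddt_ram_bytes(logical_bytes, dedup_ratio,
                           block_size=8 * 1024, bytes_per_entry=320):
    """Rough in-core DDT size for a given amount of written data.

    block_size and bytes_per_entry are assumptions (8K volume blocks,
    ~320 bytes per entry), not values read from the pool.
    """
    # Only unique blocks get their own DDT entry.
    unique_bytes = logical_bytes / dedup_ratio
    entries = unique_bytes / block_size
    return entries * bytes_per_entry


# Figures from the report: about 100GB copied, dedupratio 2.30x.
estimate = estimate_ddt_ram_bytes(100 * GIB, 2.30)
print(f"estimated DDT size: {estimate / GIB:.2f} GiB")
```

A larger block size shrinks the estimate proportionally, so the result is only as good as the assumed block size.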

The only unusual thing I could find with that RAIDZ1 pool was that the I/O counts seemed uneven -- I thought they would be about the same across all RAID disks:

> *zpool iostat -v data1*
<pre>

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data1        321G  3.31T     35    298   188K  1.45M
  raidz1     321G  3.31T     35    298   188K  1.45M
    c4t2d0      -      -      9      7  77.6K   738K
    c4t3d0      -      -      6      6  46.0K   610K
    c4t4d0      -      -      9      7  76.8K   738K
    c4t5d0      -      -      6      5  46.8K   601K
----------  -----  -----  -----  -----  -----  -----
</pre>

So, I am wondering if I have some kind of disk error on c4t3d0 and c4t5d0. I don't know how to diagnose that. I tried smartmontools, but it did not yield any useful info.
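
To put a number on the imbalance, each disk's write bandwidth can be compared against the mean of the four. The sketch below simply re-uses the per-disk write figures from the `zpool iostat -v data1` output above; the variable and function names are hypothetical. Note that `zpool iostat` without an interval reports averages since boot, so an interval run such as `zpool iostat -v data1 5` would show whether the skew persists under current load.

```python
# Per-disk write bandwidth in KB/s, copied from the iostat output above.
write_kbps = {
    "c4t2d0": 738,
    "c4t3d0": 610,
    "c4t4d0": 738,
    "c4t5d0": 601,
}


def relative_to_mean(values):
    """Return each value divided by the mean of all values."""
    mean = sum(values.values()) / len(values)
    return {name: v / mean for name, v in values.items()}


ratios = relative_to_mean(write_kbps)
for disk in sorted(ratios):
    print(f"{disk}: {ratios[disk]:.2f}x of mean write bandwidth")
```

This only quantifies the skew; it does not by itself indicate whether a disk is faulty.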
