[OpenIndiana-discuss] Pool I/O

jason matthews jason at broken.net
Fri May 8 20:40:25 UTC 2015




I have 64-core boxes with 576GB of RAM that used to be very, very 
squirrelly. We used 8k writes for postgres on the original zpools for 
years, and migrated that data from spinning rust to DC S3700s on these 
new 64-core boxes. On the new hardware they would periodically implode 
due to CPU lock contention, metaslab/spacemap fragmentation issues on 
the pool, and/or, perhaps as a contributing factor, a lack of free cells 
on the DC S3700s from our continuous, non-stop write load. These 
problems are difficult to diagnose, as is figuring out how much each 
issue contributes to the symptoms.
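
If you want a rough look at how chewed up the metaslabs and space maps 
have gotten, something like this is a starting point (just a sketch; 
'tank' is a placeholder pool name, and zdb output varies a bit between 
builds):

  # dump metaslab / space map usage for every vdev in the pool
  zdb -mm tank

  # newer builds also expose a pool-wide fragmentation property
  zpool get fragmentation tank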

What helped was offloading the data from the pools and migrating it 
back. We haven't had a pool brown-out since. That said, adjusting the 
pools to write 8k sectors and 128k records improved latency a lot in 
one test case versus the legacy 512-byte-sector, 8k-record 
configuration. Moving everything to 8k sectors and 128k records looks 
like a big win for us. In the absence of TRIM, it may lower the load on 
the garbage collection inside the 3700.
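
For reference, a minimal sketch of that layout (dataset and pool names 
are placeholders; on illumos the sector size/ashift is fixed when the 
vdev is created, usually from the drive's reported physical block size 
or an sd.conf override, so recordsize is the only knob you can turn 
afterwards, and it only affects newly written blocks):

  # 128k records on the postgres dataset; applies to new writes only
  zfs set recordsize=128k tank/pgdata
  zfs get recordsize tank/pgdata

  # confirm what ashift the vdevs actually got (13 = 8k sectors)
  zdb -C tank | grep ashift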

Maybe some of this will give you some ideas of what to look at.
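
On the dtrace question: a rough sketch of where I would start poking 
(untested as written here, and fbt probes can vary by build) is timing 
the ZIL commit and txg sync paths to see which one is eating the 
latency:

  # how long zil_commit() calls take (the sync/NFS write path)
  dtrace -n '
    fbt::zil_commit:entry  { self->ts = timestamp; }
    fbt::zil_commit:return /self->ts/ {
      @["zil_commit ns"] = quantize(timestamp - self->ts); self->ts = 0;
    }'

  # txg sync times, to see whether spa_sync is the choke point
  dtrace -n '
    fbt::spa_sync:entry  { self->ts = timestamp; }
    fbt::spa_sync:return /self->ts/ {
      @["spa_sync ns"] = quantize(timestamp - self->ts); self->ts = 0;
    }'

If zil_commit latency looks ugly while the spindles sit idle, that would 
point back at the sync/log path rather than the data vdevs.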

j.

On 5/8/15 1:16 PM, Joe Hetrick wrote:
> Tracked it down to about 3 gvfsd-metadata processes, maybe...can't decide if they were victims or root causes.
>
> Shooting those in the head brought things back; I didn't see how our DC S3700s were buried, though; it appeared to me that pool I/O was effectively blocked, so I don't know whether the DDRdrives would have had any effect.
>
> I would still like to be edumacated on a way to acquire a bit more insight into what the pool was busy waiting for when the spindles were so idle.  I have no doubt NFS was suffering, but my number of threads was not at max, and the system was relatively idle; I just couldn't get anything written to disk in a timely fashion.
>
> J
>
> On 08 May 13:10, jason matthews wrote:
>>
>>
>> sounds like it is blocking on NFS :-)
>>
>> Ask Chris for a try/buy DDRdrive X1, or whatever the latest
>> concoction is... it could be life-changing for you.
>>
>> j.
>>
>> On 5/8/15 11:32 AM, Joe Hetrick wrote:
>>> Today I played a bit with set sync=disabled after watching a few f/s write IOPs.  I can't decide if I've found a particular group of users with a new (more abusive) set of jobs.
>>>
>>> I'm looking more and more, and I've turned sync off on a handful of filesystems that are showing a high number of sustained write I/Os; when those filesystems are bypassing the ZIL, everything is happy.  The ZIL devices are never in %w, and the pool %b coincides with spindle %b, which is almost never higher than 50 or so, and things are streaming nicely.
>>>
>>> Does anyone have any dtrace that I could use to poke into just what the pool is blocking on when these others are in play?  Looking at nfsv3 operations, I see a very large number of
>>> create
>>> setattr
>>> write
>>> modify
>>> rename
>>>
>>> and sometimes remove
>>> and I'm suspecting these users are doing something silly at HPC scale..
>>>
>>>
>>> Thanks!
>>>
>>> Joe
>>>
>>>
>>>> Hi all,
>>>>
>>>>
>>>> 	We've recently run into a situation where I'm seeing the pool at 90-100 %b and our ZILs at 90-100 %w, yet all of the spindles are relatively idle.  Furthermore, local I/O is normal, and testing is able to quickly and easily put both the pool and the spindles in the VDEV into high activity.
>>>>
>>>>      The system is primarily accessed via NFS (home server for an HPC environment).  We've had users do evil things before to cause pain, but this is most odd, as I would only expect this behavior if we had a faulty device in the pool with high %b (we don't) or if we had some sort of COW-related issue, such as having <15% free space or so.  In this case, we are less than half full on a 108TB raidz3 pool.
>>>>
>>>> 	latencytop shows a lot of ZFS ZIL Writer latency, but that's to be expected given what I see above.  Pool I/O with zpool iostat is normal-ish, and as I said, simple raw writes to the pool show expected performance when done locally.
>>>>
>>>> 	Does anyone have any ideas?
>>>>
>>>> Thanks,
>>>>
>>>> Joe
>>>>
>>>> -- 
>>>> Joe Hetrick
>>>> perl -e 'print pack(h*,a6865647279636b604269647a616e69647f627e2e65647a0)'
>>>> BOFH Excuse: doppler effect
>>>>



