[OpenIndiana-discuss] OI vs U9 performance surprises

Doug Hughes doug at will.to
Mon May 9 20:57:58 UTC 2011

On 5/9/2011 12:01 PM, Doug Hughes wrote:
> Box = Sun x4240 with 8x Intel 320 160GB flash
>               R6x6 SSD oi  R6x6 SSD u9  R5x5 SSD u9  R5x6 u9      R6 SSD oi/12
> 100k          4m 37s       3m 31s       3m 24s       3m 27s       4m 18s
> rm 100k       1m 39s       1m 3s        1m           1m           1m 25s
> star -x       3m 5s        3m 37s       2m 37s       2m 43s       2m 53s
> cp b.tar      2m 14s       2m 18s       2m 8s        2m 9s        2m 1s
> dual touch    -            3m 41s       3m 35s       3m 43s       4m 24s
> dual rm       -            1m 4s        1m 2s        1m 4s        1m 25s
> dual star -x  -            2.5m/5m      3.3m/4.5m    3m/4m        3.5m/4.5m
> dual cp       -            3m 45s       3m 45s       4m           3m 45s
> SSD = Intel 320 160GB drives
> R6 = 6 drives in raid6 config
> u9 = Solaris 10 update 9
> R5x5 = Intel 320 160GB in raid5 with 5 disks
> R5x6 = Intel 320 160GB in raid5 with 6 disks
> oi = OpenIndiana 148 release
> oi/12 = OpenIndiana 148 release with zpool-12 patch for ashift=12
> 100k = 100k files: 1000 directories, 100 files each, created with touch
> (0-byte files, metadata/ops intensive)
> rm 100k = removing all files and directories
> star -x = extracting (over NFS, 1gig) a 12.7GB tar file with a mix of
> small and large files
> cp b.tar = copying the 12.7GB tar file over NFS to the server (bounded by
> the wire)
> dual touch = touching 100k files over NFS on dedicated 1gig links to the
> server
> dual rm = removing those files
> dual cp = copying the 12.7GB tar file from the node (bounded by the 1Gbit
> wire again)
> The odd thing is that, aside from the tests bounded by the 1Gbit wire,
> u9 is quite a bit faster than OpenIndiana. I have 128 NFS server threads
> configured, and they never seem to be the bottleneck.
> I would post the numbers for the 73GB spinning disks, but the OpenIndiana
> run times were effectively unbounded and I haven't been able to figure
> out why (I had to Ctrl-C after hours, vs. minutes on U9). When I ran the
> dtrace ops script against the 73GB disks, ops were taking hundreds of
> milliseconds to complete. But since I haven't posted the 73GB numbers
> here (I'll re-do the benchmarks in a bit and repost if asked), I guess we
> could focus on why OI appears to be significantly slower. ashift=12
> improves things by a small amount, but still not to U9 speed.
> Advice/tips welcome. This is a relatively new/virgin OI-148 install.
> (One note: the dual rows often contain two numbers. It appears that
> using dladm to create an aggregate on the second interface has a
> significant impact on transfer speed/latency, but we can ignore that
> for now.)
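
The 100k test in the legend (1000 directories of 100 zero-byte files each, followed by removing everything) can be approximated with a small script run on the client against the NFS mount. This is a sketch, not the script behind the numbers above: the Python harness, the `make_tree`/`remove_tree` helpers and the `d0000`/`f000` naming are all hypothetical, and opening a file in append mode stands in for `touch`.

```python
import os
import sys
import time


def make_tree(root, ndirs=1000, nfiles=100):
    """Create ndirs directories under root, each with nfiles empty files."""
    for d in range(ndirs):
        dpath = os.path.join(root, f"d{d:04d}")
        os.mkdir(dpath)                      # one MKDIR per directory
        for f in range(nfiles):
            # Append mode creates a zero-byte file if it does not exist,
            # roughly what `touch` does for a new name.
            with open(os.path.join(dpath, f"f{f:03d}"), "a"):
                pass
    return ndirs * nfiles


def remove_tree(root):
    """Remove everything make_tree() created (the 'rm 100k' step)."""
    for dname in os.listdir(root):
        dpath = os.path.join(root, dname)
        for fname in os.listdir(dpath):
            os.unlink(os.path.join(dpath, fname))
        os.rmdir(dpath)


def main(root):
    t0 = time.perf_counter()
    n = make_tree(root)
    t1 = time.perf_counter()
    remove_tree(root)
    t2 = time.perf_counter()
    print(f"created {n} files in {t1 - t0:.1f}s, removed them in {t2 - t1:.1f}s")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])   # an existing, empty directory on the NFS mount
```

Run it as `python3 tree_bench.py /path/to/empty/dir` (the script name is also hypothetical); the target directory must already exist and be empty.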

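To put "quite a bit faster" in numbers, the R6x6 SSD timings for OpenIndiana and U9 can be converted to seconds and compared as ratios, where a value above 1 means OI was slower. A minimal sketch: the `to_seconds` helper is hypothetical and only parses whole minutes and seconds, so the fractional dual-client entries are left out.

```python
import re

# R6x6 SSD columns from the table above: (OpenIndiana 148, Solaris 10 U9).
RESULTS = {
    "100k":     ("4m 37s", "3m 31s"),
    "rm 100k":  ("1m 39s", "1m 3s"),
    "star -x":  ("3m 5s",  "3m 37s"),
    "cp b.tar": ("2m 14s", "2m 18s"),
}


def to_seconds(text):
    """Parse durations written like '4m 37s', '1m' or '45s' into seconds."""
    m = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text)
    if m is None or not any(m.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return 60 * minutes + seconds


for name, (oi, u9) in RESULTS.items():
    oi_s, u9_s = to_seconds(oi), to_seconds(u9)
    print(f"{name:10s} OI {oi_s:4d}s  U9 {u9_s:4d}s  OI/U9 = {oi_s / u9_s:.2f}")
```

For the two metadata-heavy rows this gives 277s vs 211s (OI/U9 = 1.31) for 100k and 99s vs 63s (1.57) for rm 100k.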
Here are some dtrace ops numbers from the server side (Linux client, the
same client for both runs) with OpenIndiana:
hughesd@solaris:/var/tmp/mravi# bin/nfs-ops-response.d
dtrace: script 'bin/nfs-ops-response.d' matched 3 probes
   1 | :END

NFS v3 Ops count
RFS3_FSSTAT                       2
RFS3_MKDIR                     1000
RFS3_GETATTR                   1001
RFS3_ACCESS                    1006
RFS3_CREATE                  101000
RFS3_SETATTR                 101000
RFS3_LOOKUP                  101003

NFS v3 Ops average RESPONSE time (usec)
RFS3_GETATTR                     39
RFS3_LOOKUP                      46
RFS3_ACCESS                      52
RFS3_FSSTAT                     304
RFS3_SETATTR                    449
RFS3_CREATE                     508
RFS3_MKDIR                      667

NFS v3 Ops average SYSTEM time (usec)
RFS3_GETATTR                     32
RFS3_ACCESS                      40
RFS3_LOOKUP                      40
RFS3_FSSTAT                      80
RFS3_SETATTR                    143
RFS3_CREATE                     200
RFS3_MKDIR                      340

Here are the numbers with Solaris 10 U9 (same server hardware, same
client), 6-disk raidz1:

dtrace: script '/var/tmp/mravi/bin/nfs-ops-response.d' matched 3 probes
   5 | :END

NFS v3 Ops count
RFS3_GETATTR                   1000
RFS3_MKDIR                     1000
RFS3_ACCESS                    1004
RFS3_CREATE                  101000
RFS3_LOOKUP                  101000
RFS3_SETATTR                 101000

NFS v3 Ops average RESPONSE time (usec)
RFS3_LOOKUP                      17
RFS3_GETATTR                     18
RFS3_ACCESS                      21
RFS3_SETATTR                    270
RFS3_CREATE                     294
RFS3_MKDIR                      340

NFS v3 Ops average SYSTEM time (usec)
RFS3_GETATTR                     16
RFS3_LOOKUP                      16
RFS3_ACCESS                      18
RFS3_SETATTR                     54
RFS3_CREATE                      75
RFS3_MKDIR                      124
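
A rough way to compare the two dtrace runs is to multiply each op count by its average response time, giving an approximate total of server-side service time per run, and to take per-op OI/U9 ratios. This is a back-of-the-envelope sketch: the averages are whole microseconds, the two FSSTAT calls seen only on OI are left out, and summed service time is not wall-clock time if the client had several requests in flight.

```python
# (count, average RESPONSE time in usec) per op, copied from the runs above.
OI_148 = {
    "RFS3_GETATTR": (1001, 39),
    "RFS3_LOOKUP": (101003, 46),
    "RFS3_ACCESS": (1006, 52),
    "RFS3_SETATTR": (101000, 449),
    "RFS3_CREATE": (101000, 508),
    "RFS3_MKDIR": (1000, 667),
}
U9 = {
    "RFS3_GETATTR": (1000, 18),
    "RFS3_LOOKUP": (101000, 17),
    "RFS3_ACCESS": (1004, 21),
    "RFS3_SETATTR": (101000, 270),
    "RFS3_CREATE": (101000, 294),
    "RFS3_MKDIR": (1000, 340),
}


def total_seconds(run):
    """Approximate summed server-side time: sum of count * average response."""
    return sum(count * avg_us for count, avg_us in run.values()) / 1e6


for op in OI_148:
    ratio = OI_148[op][1] / U9[op][1]
    print(f"{op:14s} OI/U9 average response = {ratio:.2f}")
print(f"approx. total: OI {total_seconds(OI_148):.1f}s, U9 {total_seconds(U9):.1f}s")
```

With these figures the totals come to about 102.1s for OI and 59.1s for U9, with CREATE about 1.73x and SETATTR about 1.66x slower on OI.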

Here's the dtrace script
#!/usr/sbin/dtrace -FCs
/*
 * Author: Ravi Mallarapu
 * Date: 20101117
 * Description: display NFS v3 ops count and average response (elapsed)
 * and system (on-CPU) time
 */

/*
 * ASSUMPTION: the probe descriptions did not survive in this copy; the
 * nfssrv common_dispatch() entry/return probes below are a reconstruction
 * (its first argument is the struct svc_req * used in the predicate).
 */
fbt:nfssrv:common_dispatch:entry
/args[0]->rq_vers == 3 && ! self->trace/
{
         self->trace = 1;
         /* map the RPC procedure number (0-21 for NFSv3) to its name */
         this->proc = args[0]->rq_proc;
         self->pname = (this->proc >= 0 && this->proc <= 21) ?
                 stringof (nfssrv`rfscallnames_v3[this->proc]) :
                 stringof ("invalid proc");
         self->time = timestamp;
         self->vtime = vtimestamp;
         @opcount[self->pname] = count();
}

fbt:nfssrv:common_dispatch:return
/self->trace/
{
         /* elapsed time and on-CPU time for this op, in usec */
         this->usec = (timestamp - self->time) / 1000;
         this->systm = (vtimestamp - self->vtime) / 1000;
         @avgtime[self->pname] = avg(this->usec);
         @avgsystm[self->pname] = avg(this->systm);
         self->pname = 0;
         self->trace = 0;
}

END
{
         printf("\n\nNFS v3 Ops count\n");
         printf(    "==============\n");
         printa("%-20s   %@12d\n", @opcount);

         printf("\n\nNFS v3 Ops average RESPONSE time (usec)\n");
         printf(    "================================\n");
         printa("%-20s   %@12d\n", @avgtime);

         printf("\n\nNFS v3 Ops average SYSTEM time (usec)\n");
         printf(    "==============================\n");
         printa("%-20s   %@12d\n", @avgsystm);
}
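
The script's bookkeeping (stamp `timestamp` and `vtimestamp` on entry, subtract on return, and feed the differences into `avg()` keyed by op name) can be mirrored offline, for example to sanity-check the RESPONSE vs SYSTEM columns against recorded data. In this sketch the event tuple format and the `summarize` helper are hypothetical; `wall_ns` plays the role of `timestamp` (elapsed time) and `cpu_ns` the role of `vtimestamp` (time on CPU), which is why the SYSTEM figure never exceeds RESPONSE.

```python
from collections import defaultdict


def summarize(events):
    """events: (thread_id, "entry"|"return", op, wall_ns, cpu_ns) tuples.

    Returns {op: (count, avg_response_usec, avg_system_usec)}.
    """
    start = {}                   # per-thread state, like self->pname/time/vtime
    count = defaultdict(int)     # @opcount[op] = count()
    wall = defaultdict(list)     # samples fed to @avgtime[op] = avg(usec)
    cpu = defaultdict(list)      # samples fed to @avgsystm[op] = avg(systm)
    for tid, kind, op, wall_ns, cpu_ns in events:
        if kind == "entry":
            if tid in start:     # like the "! self->trace" predicate
                continue
            start[tid] = (op, wall_ns, cpu_ns)
            count[op] += 1
        elif kind == "return" and tid in start:
            op0, wall0, cpu0 = start.pop(tid)
            wall[op0].append((wall_ns - wall0) // 1000)  # ns -> whole usec
            cpu[op0].append((cpu_ns - cpu0) // 1000)

    def avg(xs):
        return sum(xs) // len(xs)  # whole microseconds, like the printed avg()

    return {op: (count[op], avg(wall[op]), avg(cpu[op])) for op in count if wall[op]}
```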

The U9 operations seem to be better optimized across the range, on the
same hardware. I received one bit of offline advice suggesting I check
again with illumos, which is due out soon, but I really expected things
to be slightly better, or at least on an even keel, when I ran the tests.
