[OpenIndiana-discuss] MPT SGL mem alloc failed

Timothy Coalson tsc5yc at mst.edu
Mon Jul 9 20:17:28 UTC 2012


I upgraded a machine to oi_151a5 from oi_151a4 last week, and when its
weekly scrub rolled around, /var/adm/messages gathered a lot of these,
in groups of dozens at a time:

Jul  7 01:15:21 myelin2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340a@3/pci1000,30c0@0 (mpt_sas0):
Jul  7 01:15:21 myelin2         Unable to allocate dma memory for extra SGL.
Jul  7 01:15:21 myelin2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340a@3/pci1000,30c0@0 (mpt_sas0):
Jul  7 01:15:21 myelin2         MPT SGL mem alloc failed

zpool status then showed a lot of failed reads, and ZFS dropped all
the disks on one of the two HBAs.  Under oi_151a4, I am fairly certain
these messages did not show up: there are none in /var/adm/messages.*,
which has entries from June 11, and the upgrade was on July 3.  After a
zpool clear, it accepted the disks again, and the resilver didn't
need to correct much, but another scrub caused the same problem again.
The machine is running a pair of LSI 9201-16i HBAs connected via fanout
cables to SATA disks, and appears to have plenty of free memory:

tim@myelin2:~$ echo ::memstat | sudo mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    4553408             17786   72%
ZFS File Data              193505               755    3%
Anon                       108432               423    2%
Exec and libs                1351                 5    0%
Page cache                   5734                22    0%
Free (cachelist)            22007                85    0%
Free (freelist)           1402689              5479   22%

Total                     6287126             24559
Physical                  6287125             24559
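
For comparing how often the warning quoted above appears before and after
the upgrade, a per-day tally across the current and rotated logs is one
option.  This is only a rough sketch: the function name count_sgl_failures
is made up for illustration, and it assumes the usual syslog layout where
the first two fields of each line are the month and the day.

```shell
# Hypothetical helper: count "MPT SGL mem alloc failed" lines per day.
# Arguments: one or more syslog-format files (e.g. /var/adm/messages*).
count_sgl_failures() {
    # -h drops the filename prefix so the date fields stay in columns 1 and 2.
    grep -h 'MPT SGL mem alloc failed' "$@" |
        awk '{ count[$1 " " $2]++ }
             END { for (day in count) print day, count[day] }'
}

# Example invocation (path taken from the message above):
#   count_sgl_failures /var/adm/messages*
```

Each output line is a month, a day, and the number of matching warnings
logged on that day; days with no matches are simply absent.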

The errors in /var/adm/messages continue to show up even when no scrub
is running, though less often, but ZFS only seems to see problems during
a scrub (or possibly under any heavy I/O load, but this machine doesn't
get much of that otherwise).  Any ideas on chasing this down?  Otherwise,
I plan to boot it back into 151a4 and try to reproduce the problem,
to check whether the update is to blame.

Tim


