[OpenIndiana-discuss] MPT SGL mem alloc failed

Rich rercola at acm.jhu.edu
Mon Jul 9 20:26:07 UTC 2012


I've got a number of OI machines running on Supermicro hardware with
mpt_sas, and have never seen that error, so I'm impressed.

That said, depending on your dataset size, I don't think I'd call that
"plenty of memory". How many disks are there, and how large are the
pools? It's quite possible to eat up 24 GB very quickly with enough disk
IO: assuming e.g. 100 MB/s per disk, perfect transfer efficiency, and no
other users of your RAM, 240 disks writing in parallel would fill it in
about one second. In practice, I would not be surprised if half that
many were sufficient.
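
The estimate above is just RAM size divided by aggregate disk throughput. Below is a minimal Python sketch of that arithmetic. It assumes decimal units (24 GB taken as 24,000 MB), a flat 100 MB/s per disk, and perfect efficiency. `disks_to_fill` is a hypothetical helper, not part of OI.

```python
def disks_to_fill(ram_mb: float, per_disk_mb_s: float, seconds: float = 1.0) -> float:
    """Disks writing in parallel needed to fill ram_mb within `seconds`.

    Assumes perfect transfer efficiency and that nothing else is using the RAM.
    """
    return ram_mb / (per_disk_mb_s * seconds)


if __name__ == "__main__":
    print(disks_to_fill(24_000, 100))     # one-second window
    print(disks_to_fill(24_000, 100, 2))  # two-second window
```

Plugging in the 24559 MB total from the ::memstat output instead gives roughly 246 disks, so the round figure of 240 holds either way.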

- Rich

On Mon, Jul 9, 2012 at 4:17 PM, Timothy Coalson <tsc5yc at mst.edu> wrote:
> I upgraded a machine to oi_151a5 from oi_151a4 last week, and when its
> weekly scrub rolled around, /var/adm/messages gathered a lot of these,
> in groups of dozens at a time:
>
> Jul  7 01:15:21 myelin2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340a@3/pci1000,30c0@0 (mpt_sas0):
> Jul  7 01:15:21 myelin2         Unable to allocate dma memory for extra SGL.
> Jul  7 01:15:21 myelin2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340a@3/pci1000,30c0@0 (mpt_sas0):
> Jul  7 01:15:21 myelin2         MPT SGL mem alloc failed
>
> And zpool status showed a lot of failed reads, and ZFS had dropped all
> the disks on one of the two HBAs.  Under oi_151a4, I am fairly certain
> these messages did not show up (there are none in /var/adm/messages.*,
> which go back to June 11; the upgrade was on July 3).  After a zpool
> clear, it accepted the disks again and the resilver didn't need to
> correct much, but another scrub triggered the same problem.  The
> machine is running a pair of LSI 9201-16i HBAs connected via fanout
> cables to SATA disks, and appears to have plenty of free memory:
>
> tim@myelin2:~$ echo ::memstat | sudo mdb -k
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                    4553408             17786   72%
> ZFS File Data              193505               755    3%
> Anon                       108432               423    2%
> Exec and libs                1351                 5    0%
> Page cache                   5734                22    0%
> Free (cachelist)            22007                85    0%
> Free (freelist)           1402689              5479   22%
>
> Total                     6287126             24559
> Physical                  6287125             24559
>
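
To put numbers on "plenty of free memory": the two Free rows in the ::memstat output above sum to 5564 MB, while the kernel holds 17786 MB (72%). Below is a minimal Python sketch that parses the summary as pasted and totals those rows. The row regex is an assumption based only on this one sample, and `parse_memstat` is a hypothetical helper.

```python
import re

# The ::memstat summary, copied verbatim from the output above.
MEMSTAT = """\
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    4553408             17786   72%
ZFS File Data              193505               755    3%
Anon                       108432               423    2%
Exec and libs                1351                 5    0%
Page cache                   5734                22    0%
Free (cachelist)            22007                85    0%
Free (freelist)           1402689              5479   22%

Total                     6287126             24559
Physical                  6287125             24559
"""

# Label, then Pages and MB columns, then an optional "NN%" column.
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<pages>\d+)\s+(?P<mb>\d+)(?:\s+(?P<pct>\d+)%)?\s*$")


def parse_memstat(text: str) -> dict[str, int]:
    """Map each ::memstat row label to its MB column; header and rule lines are skipped."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group("name").strip()] = int(m.group("mb"))
    return rows


rows = parse_memstat(MEMSTAT)
free_mb = sum(mb for name, mb in rows.items() if name.startswith("Free"))
print(f"free: {free_mb} MB of {rows['Total']} MB")
print(f"kernel: {rows['Kernel']} MB ({rows['Kernel'] / rows['Total']:.0%})")
```

The output is embedded as a string so the sketch runs anywhere. On the machine itself, the same text would come from the mdb command shown above.
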
> The errors in /var/adm/messages continue to show up even when not
> scrubbing, though less often, but ZFS only seems to see problems when
> scrubbing (or possibly under any heavy IO load, but this machine
> doesn't otherwise get much of that).  Any ideas on chasing this down?
> Otherwise, I plan to boot it back into 151a4 and try to reproduce the
> problem, to check whether the update is to blame.
>
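
One way to quantify the "less often" part is to count the warnings per hour and compare those counts against scrub times. Below is a minimal Python sketch that assumes the standard syslog timestamp format seen in the excerpt above. `sgl_failures_per_hour` and the script name in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter

# Syslog timestamp prefix, e.g. "Jul  7 01:15:21 "; captures "Jul  7 01" as the hour bucket.
STAMP = re.compile(r"^(?P<hour>[A-Z][a-z]{2} +\d{1,2} \d{2}):\d{2}:\d{2} ")


def sgl_failures_per_hour(lines, needle="MPT SGL mem alloc failed"):
    """Count lines containing `needle`, bucketed by month, day and hour of the timestamp."""
    counts = Counter()
    for line in lines:
        if needle in line:
            m = STAMP.match(line)
            if m:
                counts[m.group("hour")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python sgl_count.py /var/adm/messages /var/adm/messages.0
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            # Counter keeps insertion order, so buckets come out in log order.
            for hour, n in sgl_failures_per_hour(f).items():
                print(f"{path}: {hour}:00  {n}")
```

Running it over the rotated /var/adm/messages.* files as well would double-check that the pre-upgrade logs contain none of these warnings.
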
> Tim
>
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss


