[OpenIndiana-discuss] MPT SGL mem alloc failed
Rich
rercola at acm.jhu.edu
Mon Jul 9 20:26:07 UTC 2012
I've got a number of OI machines running mpt_sas on Supermicro
hardware, and have never seen that error, so I'm impressed.
That said, depending on your dataset size, I don't think I'd call that
"plenty of memory". How many disks do you have, and how large are the
pools? It's quite possible to eat up 24 GB very quickly with enough
disk IO (assuming e.g. 100 MB/s per disk, perfect transfer efficiency,
and no other users of your RAM, 240 disks writing in parallel would
fill it in about a second; in practice, I would not be surprised if
half that were sufficient).
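
For what it's worth, here is a rough sketch of that back-of-the-envelope
arithmetic. It assumes decimal units (24 GB = 24,000 MB), 100 MB/s per
disk, and a one-second window; the function name is made up purely for
illustration.

```python
# Back-of-the-envelope estimate: how many disks, all writing in parallel,
# generate enough data to fill a given amount of RAM in a given time window.
# Assumptions (not measurements): decimal units, perfect transfer
# efficiency, and no other consumers of RAM.

def disks_to_fill(ram_mb, per_disk_mb_s, seconds=1.0):
    """Return the number of parallel disks needed to produce ram_mb in `seconds`."""
    return ram_mb / (per_disk_mb_s * seconds)

ram_mb = 24 * 1000       # 24 GB of RAM, expressed in MB
per_disk_mb_s = 100      # assumed sustained throughput per disk

print(disks_to_fill(ram_mb, per_disk_mb_s))
```

Stretching the window lowers the disk count proportionally, so a couple
of dozen disks could in principle produce the same volume over ten
seconds or so.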
- Rich
On Mon, Jul 9, 2012 at 4:17 PM, Timothy Coalson <tsc5yc at mst.edu> wrote:
> I upgraded a machine to oi_151a5 from oi_151a4 last week, and when its
> weekly scrub rolled around, /var/adm/messages gathered a lot of these,
> in groups of dozens at a time:
>
> Jul 7 01:15:21 myelin2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340a@3/pci1000,30c0@0 (mpt_sas0):
> Jul 7 01:15:21 myelin2 Unable to allocate dma memory for extra SGL.
> Jul 7 01:15:21 myelin2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340a@3/pci1000,30c0@0 (mpt_sas0):
> Jul 7 01:15:21 myelin2 MPT SGL mem alloc failed
>
> zpool status then showed a lot of failed reads, and ZFS decided to
> drop all the disks on one of the two HBAs. Under oi_151a4, I am fairly
> certain these messages did not show up: there are none in
> /var/adm/messages.*, which has entries going back to June 11, and the
> upgrade was on July 3. After a zpool clear, it accepted the disks
> again, and the resilver didn't need to correct much, but another scrub
> caused the same problem again.
> It is running a pair of LSI 9201-16i HBAs connected via fanout cables
> to SATA disks, and appears to have plenty of free memory:
>
> tim@myelin2:~$ echo ::memstat | sudo mdb -k
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                    4553408             17786   72%
> ZFS File Data              193505               755    3%
> Anon                       108432               423    2%
> Exec and libs                1351                 5    0%
> Page cache                   5734                22    0%
> Free (cachelist)            22007                85    0%
> Free (freelist)           1402689              5479   22%
>
> Total                     6287126             24559
> Physical                  6287125             24559
>
> The errors in /var/adm/messages continue to show up even while not
> scrubbing, though less often, but ZFS only seems to see problems when
> scrubbing (or possibly under any heavy IO load, but this machine
> doesn't get much of that otherwise). Any ideas on chasing this down?
> Otherwise, I plan to boot it back into 151a4 and try to reproduce the
> problem, to check whether the update is to blame.
>
> Tim