[OpenIndiana-discuss] mask pci-e errors

Mon Jan 13 22:14:45 UTC 2014

Hi there,

I was having an intermittent problem with the PCI Express x16 slot in my desktop. The slot would register an occasional error, and FMD would step in and take my video card out of service, crashing X.

I solved it by adding the following line to /kernel/drv/pcieb.conf:

pcie_ce_mask=-1;

This line prevents these errors from being picked up by FMD.
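
For anyone wondering about the -1: I am assuming here that the property is read as a 32-bit bitmask (with "ce" presumably standing for "correctable errors"), in which case -1 just means "every bit set". A quick way to see that, where to_hex32 is only an illustrative helper and not part of any system tool:

```shell
# Show a signed value as a 32-bit hex mask.
# Assumption: the driver property is treated as a 32-bit mask.
to_hex32() {
    printf '0x%08x\n' "$(( $1 & 0xFFFFFFFF ))"
}

to_hex32 -1   # prints 0xffffffff
```

If that assumption holds, a narrower mask would leave the other error bits visible to FMD, but I have only tried -1.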

After you add it, run update_drv -vf pcieb to reload the driver config, then restart and you should be good to go.
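
The steps above can be sketched as a small script. This is only a rough sketch: add_ce_mask is a made-up helper name, and the backup file name is my own choice. It appends the line only if no pcie_ce_mask setting is present yet, and keeps a copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper: idempotently add the mask property to a
# pcieb.conf-style file, keeping a backup of the original.
add_ce_mask() {
    conf="$1"
    # Nothing to do if an uncommented pcie_ce_mask line already exists.
    if grep -q '^[[:space:]]*pcie_ce_mask[[:space:]]*=' "$conf"; then
        return 0
    fi
    cp -p "$conf" "$conf.bak" || return 1
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    printf 'pcie_ce_mask=-1;\n' >> "$conf"
}

# On the real system, as root, this would be followed by the reload:
#   add_ce_mask /kernel/drv/pcieb.conf && update_drv -vf pcieb
```

Running it a second time changes nothing, so it should be safe to push to a whole fleet of identical boxes.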

Hope this helps.
   Bryan

jason matthews <jason at broken.net> wrote:

>
>
>I have 40 identically configured systems that catch the pci-e error
>below. Roughly every six months they go through a cycle in which they
>generate this error, usually all forty within about three weeks, and
>then they are good for months. Bad juju.
>
>The systems are Intel SR2625URLXR, 9207-8i, Intel 910, and 9205-8e on
>L5630 CPUs with 96 GB of RAM. The result of the failure is that zfs and
>zpool commands hang on the Intel 910 card. Regular file system
>disk I/O is okay, but zpool and zfs commands hang.
>
>I am looking for a workaround, as the storage continues to work for
>applications despite the error. Perhaps the error could be masked
>before FMD takes action? Maybe ZFS gets internally hosed before FMD
>takes action, I don't know. The hang-up seems to be in zfs, where the
>system thinks the storage is hosed and zfs/zpool commands hang. As I say,
>regular file system I/Os work just peachy. Does anyone have any ideas
>on how to overcome this problem without rebooting?
>
>I use clones of file systems to stand up short-lived databases to run
>long batch queries against, and when this happens I tend to have fairly
>crappy work day satisfaction.
>
>Perhaps this is related to:
>https://www.illumos.org/issues/315
>
>http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/mostViewedDisplay?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_efb5c0793523e51970c8fa22b053ce01=wsrp-navigationalState%3DdocId%253Demr_na-c03652921-1%257CdocLocale%253Den_US&javax.portlet.tpst=efb5c0793523e51970c8fa22b053ce01&sp4ts.oid=4091412&ac.admitted=1389635734908.876444892.492883150
>
>It seems Oracle may have patched similar issues.
>thanks,
>j.
>
>
>root@db020:~# fmadm faulty -ai
>--------------- ------------------------------------  -------------- ---------
>TIME            CACHE-ID                              MSG-ID         SEVERITY
>--------------- ------------------------------------  -------------- ---------
>Jan 08 13:47:15 2a74a865-ba4e-c3b0-e437-e0e34ba53623  PCIEX-8000-0A  Critical
>
>Host        : db020
>Platform    : S5520UR   Chassis_id  : ............
>Product_sn  :
>
>Fault class : fault.io.pciex.device-interr
>Affects     : dev:////pci@0,0/pci8086,340c@5/pci111d,806a@0/pci111d,806a@4/pci1000,3020@0
>                  faulted and taken out of service
>FRU         : "FH PCIE-SLOT2 x8" (hc://:product-id=S5520UR:server-id=db020:chassis-id=............/motherboard=0/hostbridge=2/pciexrc=2/pciexbus=4/pciexdev=0)
>                  faulty
>
>Description : A problem was detected for a PCIEX device.
>        Refer to http://sun.com/msg/PCIEX-8000-0A for more information.
>
>Response    : One or more device instances may be disabled
>
>Impact      : Loss of services provided by the device instances associated with
>              this fault
>
>Action      : Schedule a repair procedure to replace the affected device.  Use
>        fmadm faulty to identify the device or contact Sun for support.