[OpenIndiana-discuss] identifying which I/O device has been retired

Stephan Althaus Stephan.Althaus at Duedinghausen.eu
Mon Mar 22 08:16:28 UTC 2021


On 03/22/21 06:46 AM, Tim Mooney via openindiana-discuss wrote:
>
> When I boot my OI workstation (updated to 3/21/2021), I get the message
>
>     NOTICE: One or more I/O devices have been retired
>
> All the documentation I've found says to look at the output
> from prtconf and 'fmadm faulty' to identify which device is the problem.
>
> However, on my workstation:
>
> # prtconf | egrep -i retire
> # fmadm faulty
> # fmadm faulty -a -v
> # svcs -a | egrep -i fm
> STATE          STIME    FMRI
> disabled       20:56:19 svc:/system/fm/notify-params:default
> online         20:56:49 svc:/system/fmd:default
>
> # fmadm config
> MODULE                   VERSION STATUS  DESCRIPTION
> cpumem-retire            1.1     active  CPU/Memory Retire Agent
> disk-lights              1.0     active  Disk Lights Agent
> disk-transport           1.1     active  Disk Transport Agent
> eft                      1.16    active  eft diagnosis engine
> ext-event-transport      0.2     active  External FM event transport
> fabric-xlate             1.0     active  Fabric Ereport Translater
> fmd-self-diagnosis       1.0     active  Fault Manager Self-Diagnosis
> io-retire                2.0     active  I/O Retire Agent
> sensor-transport         1.1     active  Sensor Transport Agent
> ses-log-transport        1.0     active  SES Log Transport Agent
> software-diagnosis       0.1     active  Software Diagnosis engine
> software-response        0.1     active  Software Response Agent
> sysevent-transport       1.0     active  SysEvent Transport Agent
> syslog-msgs              1.1     active  Syslog Messaging Agent
> zfs-diagnosis            1.0     active  ZFS Diagnosis Engine
> zfs-retire               1.0     active  ZFS Retire Agent
>
> # uname -v
> illumos-88a8a2ff32
>
>
>
> Any suggestions for what I should do to identify the source of this 
> issue?
>
> Thanks,
>
> Tim

Hi!

On my system, prtconf does show the retired devices,
so the commands you are using seem right.

Maybe you could also try the Device Driver Utility, ddu.

$ prtconf|grep reti
         pci8086,a114 (retired)
             pci8086,15da (retired)
                 pci8086,15da (retired)
                 pci8086,15da (retired)
                 pci8086,15da (retired)
                     pci1028,7b1 (retired)
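Note that prtconf prints only node names. Which bridge a retired node sits under is implied only by the indentation. If it helps, here is a minimal sketch that rebuilds the parent chain for each "(retired)" line. The name retired_chain is just one I made up. It assumes the default prtconf listing, where nesting is shown with leading spaces only:

```sh
# retired_chain (hypothetical helper): read prtconf output on stdin and, for
# every node tagged "(retired)", print the chain of node names from the
# top of the listing down to it.
# Assumption: nesting is shown only by leading spaces (tabs not handled).
retired_chain() {
    awk '
    NF == 0 { next }                        # skip blank lines
    {
        line = $0
        sub(/^ +/, "", line)
        indent = length($0) - length(line)  # number of leading spaces
        name = $1
        sub(/,$/, "", name)                 # "pci, instance #0" -> "pci"
        # forget remembered nodes at this depth or deeper
        while (n > 0 && depth[n] >= indent)
            n--
        n++
        depth[n] = indent
        node[n] = name
        if ($0 ~ /\(retired\)/) {
            path = ""
            for (i = 1; i <= n; i++)
                path = path "/" node[i]
            print path
        }
    }'
}
```

Run it as `prtconf | retired_chain`. It tracks depth by the raw count of leading spaces rather than a fixed indent width, so it should not matter how many spaces prtconf uses per level.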

$ dmesg|grep reti
Mar 22 06:59:52 dell6510 genunix: [ID 888150 kern.warning] WARNING: Device not found in device tree. Skipping device unretire: /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0/storage@3/disk@0,0

$ grep reti /var/adm/messages
Mar 22 06:59:41 dell6510 genunix: [ID 751201 kern.notice] NOTICE: One or more I/O devices have been retired
Mar 22 06:59:52 dell6510 genunix: [ID 888150 kern.warning] WARNING: Device not found in device tree. Skipping device unretire: /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0/storage@3/disk@0,0
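The syslog warning carries the complete physical device path down to the disk, which prtconf does not show. Here is a small sketch that pulls just those paths out of the log without duplicates. unretire_paths is a hypothetical name. It assumes each entry sits on one line, as in the real /var/adm/messages file:

```sh
# unretire_paths (hypothetical helper): read syslog text on stdin
# (e.g. /var/adm/messages) and print each distinct device path that follows
# "Skipping device unretire:", in the order first seen.
# Assumption: every syslog entry is on a single line.
unretire_paths() {
    sed -n 's/.*Skipping device unretire: *//p' | awk '!seen[$0]++'
}
```

Usage: `unretire_paths < /var/adm/messages`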

$ sudo fmadm faulty
Password:
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jan 12 17:23:25 9e4acbcd-4015-c8a6-f81e-ff479d7690cf PCIEX-8000-DJ  Major

Host        : dell6510
Platform    : Precision-7720    Chassis_id  : 49JT5M2
Product_sn  :

Fault class : fault.io.pciex.device-noresp max 18%
               fault.io.pciex.device-interr max 18%
               fault.io.pciex.bus-noresp max 9%
Affects     : dev:////pci@0,0/pci8086,a114/pci8086,15da/pci8086,15da/pci1028,7b1@0
               dev:////pci@0,0/pci8086,a114/pci8086,15da@0
               dev:////pci@0,0/pci8086,a114@1c,4
                   faulted and taken out of service
FRU         : "MB" (hc://:product-id=Precision-7720:server-id=dell6510:chassis-id=49JT5M2/motherboard=0) max 18%
                   faulty

Description : A problem has been detected on one of the specified devices or on
               one of the specified connecting buses.
               Refer to http://illumos.org/msg/PCIEX-8000-DJ for more
               information.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with
               this fault

Action      : If a plug-in card is involved check for badly-seated cards or
               bent pins. Otherwise schedule a repair procedure to replace the
               affected device(s).  Use fmadm faulty to identify the devices or
               contact your illumos distribution team for support.
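The "Affects" lines of fmadm faulty name the same devices as dev:// FMRIs. If you only want those, here is a rough sketch. affected_fmris is again a made-up name. It prints every whitespace-separated token that starts with dev:// exactly once, assuming the output is laid out as above:

```sh
# affected_fmris (hypothetical helper): read "fmadm faulty" output on stdin
# and print every token that starts with dev:// (the FMRIs under "Affects"),
# once each, in the order first seen.
# Assumption: FMRIs are whitespace-separated tokens.
affected_fmris() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (index($i, "dev://") == 1 && !seen[$i]++)
                print $i
    }'
}
```

Usage: `sudo fmadm faulty | affected_fmris`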
