[OpenIndiana-discuss] Mapping devices in OI to physical devices

Sašo Kiselkov skiselkov.ml at gmail.com
Tue Jan 22 18:09:40 UTC 2013


On 01/22/2013 06:26 PM, Len Zaifman wrote:
> We have just had a major system meltdown and it took several days to fix.
> 
> What we would have liked is 2 things we had on thumpers (Old SUN ZFS systems)
> 
> 1) A tool to show the mapping of a solaris device name to a physical location
> 2) A tool to turn on the light on a disk via its solaris device name.
> 
> The process below is too painful, and we have other devices whose disks may go bad. Does either 1 or 2 above exist in openindiana? I could not find it, if it does.
> 
> Thanks.
> 
> The issue was:
> 
> OI (OpenIndiana Development oi_151a X86) reported:
> 
> 
> Jan 22 10:57:43 archivea scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea        Disconnected command timeout for Target 18
> Jan 22 10:57:43 archivea scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea        Log info 0x31140000 received for target 18.
> Jan 22 10:57:43 archivea        scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
> 
> ZFS performance went through the floor and was intolerable (< 1 MB/sec, where we had hundreds of MB/sec for resilvers/scrubs and 100 MB/sec through the filesystem).
> 
> The defective disk was one of 45 disks in a Supermicro Jbod system (SC847E26-RJBOD1)
> 
> We finally found which disk it was by comparing the serial numbers reported by iostat for the disks that reported errors against the serial numbers on the actual disks (we pulled all 45 disks out to do this mapping). We do not want to repeat this process for our other devices.

The things you describe are hardware-specific. If your enclosures are
SES-2 compatible, then the fault manager should automatically blink the
appropriate LED.

You can easily map the affected FRU from a fault report in fmadm. For
example, I have one drive right now with a predictive failure:
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 14 19:11:12 29661ec9-5747-4466-f241-c96ac9f7954f  DISK-8000-0X   Major

Host        : vod1
Platform    : SUN-FIRE-X2250    Chassis_id  : 0948QBN009
Product_sn  :

Fault class : fault.io.disk.predictive-failure
Affects     : dev:///:devid=id1,sd@n5000c50015ae9c51//pci@0,0/pci8086,4021@1/pci1000,3150@0/sd@9,0
                  faulted but still in service
FRU         : "SCSI Device  9" (hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0)
                  faulty

Description : SMART health-monitoring firmware reported that a disk
              failure is imminent.
              Refer to http://sun.com/msg/DISK-8000-0X for more information.
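
If you want to pull the FRU out of that report in a script rather than
by eye, something along these lines might work. It is only a sketch in
Python: the function names are made up for illustration, and it assumes
the FMRI looks like the one in the report above (authority fields
separated by ":", then path components separated by "/", with no "/"
inside the authority part).

```python
import re

# Matches an hc:// FMRI up to the next whitespace or closing parenthesis,
# which is how it appears in the fault report shown above.
HC_FMRI = re.compile(r"hc://[^\s)]+")


def find_fru_fmris(report_text):
    """Return every hc:// FMRI found in the text of an 'fmadm faulty' report."""
    return HC_FMRI.findall(report_text)


def parse_hc_fmri(fmri):
    """Split an hc:// FMRI into (authority fields, path components).

    Assumption: the authority part contains no '/' characters.
    """
    body = fmri[len("hc://"):]
    authority, _, path = body.partition("/")
    auth = {}
    for item in authority.split(":"):
        if "=" in item:
            key, value = item.split("=", 1)
            auth[key] = value
    nodes = {}
    for item in path.split("/"):
        if "=" in item:
            key, value = item.split("=", 1)
            nodes[key] = value
    return auth, nodes


if __name__ == "__main__":
    sample = (
        'FRU         : "SCSI Device  9" '
        "(hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007"
        ":serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X"
        ":revision=SU0D/ses-enclosure=0/bay=9/disk=0)"
    )
    for fmri in find_fru_fmris(sample):
        auth, nodes = parse_hc_fmri(fmri)
        print("bay", nodes.get("bay"), "serial", auth.get("serial"))
```

The bay number and serial number are usually enough to find the drive
in the enclosure.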

Now we can take the FRU ID and find out which logical drive it
corresponds to.

# /usr/lib/fm/fmd/fmtopo -V 'hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0'
... [snip] ...
    logical-disk      string    c7t9d0    <<< here's the logical ID
    manufacturer      string    SEAGATE
    model             string    ST31000NSSUN1.0T 093354VY4X
    serial-number     string    9QJ4VY4X
    firmware-revision string    SU0D
    capacity-in-bytes string    1000204886016
    target-port-l0ids string[]  [ "w5001636000207501" ]
... [snip] ...
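
To grab the logical disk name from that output automatically, a small
Python sketch like the following could do. It is not part of fmtopo;
the function name is hypothetical, and it only recognizes the
"name  type  value" lines with the two types visible in the excerpt
above ("string" and "string[]").

```python
# Property types seen in the excerpt above; other types are ignored here.
KNOWN_TYPES = ("string", "string[]")


def parse_fmtopo_properties(text):
    """Collect 'name  type  value' lines from fmtopo -V output into a dict."""
    props = {}
    for line in text.splitlines():
        parts = line.split(None, 2)  # name, type, rest of the line
        if len(parts) == 3 and parts[1] in KNOWN_TYPES:
            props[parts[0]] = parts[2].strip()
    return props


if __name__ == "__main__":
    sample = """\
... [snip] ...
    logical-disk      string    c7t9d0
    manufacturer      string    SEAGATE
    model             string    ST31000NSSUN1.0T 093354VY4X
    serial-number     string    9QJ4VY4X
... [snip] ...
"""
    props = parse_fmtopo_properties(sample)
    print(props["serial-number"], "->", props["logical-disk"])
```

In practice you would feed it the captured output of the fmtopo command
above instead of the pasted sample.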

If you don't know your FRU, just run /usr/lib/fm/fmd/fmtopo without any
arguments. It will print out the FRUs for all the machine components it
knows about.

If you are running a recent LSI HBA, you can also install the sas2ircu
and diskmap.py utilities which will map out your physical infrastructure
and tell you what lies where:

# diskmap.py
Diskmap - npvr1> help

Documented commands (type help <topic>):
========================================
EOF    controllers  disks       enclosures  ledon   quit     sd_timeout
alias  discover     drawletter  ledoff      mangle  refresh

Diskmap - npvr1> disks
1:02:00    c8t50000393E8CAF2A4d0        MK2001TRKB    2.0T  Ready (RDY)
content: raidz1-0
1:02:01    c8t50000393E8CAF53Cd0        MK2001TRKB    2.0T  Ready (RDY)
content: raidz1-1
... [snip] ...

The first column is your <ctrl>:<enclosureid>:<drivenumber> ID. The
"ledon" and "ledoff" commands control LED blinking.
See https://github.com/swacquie/DiskMap for more info.
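
If you save the "disks" listing to a file, you can look up the physical
position of a given device name with a few lines of Python. This is
just a sketch, not part of DiskMap: the function names are made up, and
it assumes each drive line has the column layout shown above (ID,
device, model, size, state) with no spaces inside the model name.

```python
import re

# <ctrl>:<enclosureid>:<drivenumber>  device  model  size  state
DISK_LINE = re.compile(r"^(\d+):(\d+):(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")


def parse_disk_line(line):
    """Parse one drive line of the 'disks' listing; return None for other lines."""
    match = DISK_LINE.match(line.strip())
    if match is None:
        return None
    ctrl, enclosure, drive, device, model, size, state = match.groups()
    return {
        "controller": int(ctrl),
        "enclosure": int(enclosure),
        "drive": int(drive),
        "device": device,
        "model": model,
        "size": size,
        "state": state.strip(),
    }


def locate(listing, device):
    """Return the parsed entry for a device name such as c8t...d0, or None."""
    for line in listing.splitlines():
        entry = parse_disk_line(line)
        if entry is not None and entry["device"] == device:
            return entry
    return None


if __name__ == "__main__":
    listing = """\
1:02:00    c8t50000393E8CAF2A4d0        MK2001TRKB    2.0T  Ready (RDY)
content: raidz1-0
1:02:01    c8t50000393E8CAF53Cd0        MK2001TRKB    2.0T  Ready (RDY)
content: raidz1-1
"""
    hit = locate(listing, "c8t50000393E8CAF53Cd0")
    print(hit["controller"], hit["enclosure"], hit["drive"])
```

The three numbers it returns are the same ones you see in the first
column of the listing, just split out as integers.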

Hope this helps.

Cheers,
--
Saso


