[OpenIndiana-discuss] Mapping devices in OI to physical devices
Sašo Kiselkov
skiselkov.ml at gmail.com
Tue Jan 22 18:09:40 UTC 2013
On 01/22/2013 06:26 PM, Len Zaifman wrote:
> We have just had a major system meltdown and it took several days to fix.
>
> What we would have liked is two things we had on Thumpers (old Sun ZFS systems):
>
> 1) A tool to show the mapping of a solaris device name to a physical location
> 2) A tool to turn on the light on a disk via its solaris device name.
>
> The process below is too painful, and we have other devices whose disks may go bad. Does either 1 or 2 above exist in OpenIndiana? I could not find it if it does.
>
> Thanks.
>
> The issue was:
>
> OI (OpenIndiana Development oi_151a X86) reported:
>
>
> Jan 22 10:57:43 archivea scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea Disconnected command timeout for Target 18
> Jan 22 10:57:43 archivea scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3408@1/pci1000,3040@0 (mpt_sas10):
> Jan 22 10:57:43 archivea Log info 0x31140000 received for target 18.
> Jan 22 10:57:43 archivea scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
>
> ZFS performance went through the floor and was intolerable (< 1 MB/sec, where we had seen hundreds of MB/sec for resilvers/scrubs and 100 MB/sec through the filesystem).
>
> The defective disk was one of 45 disks in a Supermicro JBOD system (SC847E26-RJBOD1).
>
> We finally found which disk it was by comparing the serial numbers reported by iostat, the disks that reported errors, and the actual disk serial numbers (we pulled all 45 disks out to do this mapping). We do not want to repeat this process for our other devices.
The things you describe are hardware-specific. If your enclosures are
SES-2 compatible, then the fault manager should automatically blink the
appropriate LED.
You can easily map the affected FRU from a fault report in fmadm. For
example, I have one drive right now with a predictive failure:
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 14 19:11:12 29661ec9-5747-4466-f241-c96ac9f7954f  DISK-8000-0X   Major

Host        : vod1
Platform    : SUN-FIRE-X2250    Chassis_id  : 0948QBN009
Product_sn  :
Fault class : fault.io.disk.predictive-failure
Affects     : dev:///:devid=id1,sd@n5000c50015ae9c51//pci@0,0/pci8086,4021@1/pci1000,3150@0/sd@9,0
                  faulted but still in service
FRU         : "SCSI Device 9" (hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0)
                  faulty
Description : SMART health-monitoring firmware reported that a disk
              failure is imminent.
              Refer to http://sun.com/msg/DISK-8000-0X for more information.
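If you want to script this step, the hc:// FRU string can be picked apart mechanically to get the serial number, enclosure and bay. Below is a minimal Python sketch, assuming the layout seen in the report above (an authority made of ':key=value' pairs, then '/name=instance' path components) and that no value contains ':' or '/'. The function name parse_hc_fmri is just illustrative; it is not part of any FMA tool.

```python
from __future__ import annotations


def parse_hc_fmri(fmri: str) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Split an hc:// FMRI into authority properties and path components.

    Assumes the layout shown in the fmadm report: 'hc://', then ':key=value'
    authority pairs, then '/name=instance' components. Values containing
    ':' or '/' are not handled.
    """
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError(f"not an hc FMRI: {fmri!r}")
    # Everything before the first '/' is the authority; the rest is the path.
    authority, _, path = fmri[len(prefix):].partition("/")

    props: dict[str, str] = {}
    for item in authority.split(":"):
        if item:  # the authority starts with ':', so skip the empty first item
            key, _, value = item.partition("=")
            props[key] = value

    components: list[tuple[str, str]] = []
    for item in path.split("/"):
        if item:
            name, _, instance = item.partition("=")
            components.append((name, instance))
    return props, components


if __name__ == "__main__":
    fru = ("hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007"
           ":serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X"
           ":revision=SU0D/ses-enclosure=0/bay=9/disk=0")
    props, path = parse_hc_fmri(fru)
    location = dict(path)
    print(props["serial"], location["ses-enclosure"], location["bay"])  # 9QJ4VY4X 0 9
```

The path components are kept as an ordered list rather than a dict, since their order (enclosure, then bay, then disk) is what describes the physical location.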
Now we can take the FRU ID and find out which logical drive it
corresponds to.
# /usr/lib/fm/fmd/fmtopo -V 'hc://:product-id=SUN-Storage-J4200:server-id=:chassis-id=0946QGJ007:serial=9QJ4VY4X:part=SEAGATE-ST31000NSSUN1.0T-093354VY4X:revision=SU0D/ses-enclosure=0/bay=9/disk=0'
... [snip] ...
logical-disk string c7t9d0 <<< here's the logical ID
manufacturer string SEAGATE
model string ST31000NSSUN1.0T 093354VY4X
serial-number string 9QJ4VY4X
firmware-revision string SU0D
capacity-in-bytes string 1000204886016
target-port-l0ids string[] [ "w5001636000207501" ]
... [snip] ...
If you don't know your FRU, just run /usr/lib/fm/fmd/fmtopo without any
arguments; it will print out the FRUs for all the machine components it knows about.
If you are running a recent LSI HBA, you can also install the sas2ircu
and diskmap.py utilities, which will map out your physical infrastructure
and tell you what lies where:
# diskmap.py
Diskmap - npvr1> help
Documented commands (type help <topic>):
========================================
EOF controllers disks enclosures ledon quit sd_timeout
alias discover drawletter ledoff mangle refresh
Diskmap - npvr1> disks
1:02:00 c8t50000393E8CAF2A4d0 MK2001TRKB 2.0T Ready (RDY)
content: raidz1-0
1:02:01 c8t50000393E8CAF53Cd0 MK2001TRKB 2.0T Ready (RDY)
content: raidz1-1
... [snip] ...
The first column is your <ctrl>:<enclosureid>:<drivenumber> ID. The
"ledon" and "ledoff" commands control LED blinking.
See https://github.com/swacquie/DiskMap for more info.
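If you want to feed diskmap's listing into other scripts, the location triple is easy to split out and index by device name. A minimal Python sketch, assuming the `disks` output format in the excerpt above (location triple first, logical device name second, with "content:" lines in between); index_by_device and DiskLocation are hypothetical names. The fields are kept as strings because the excerpt doesn't say whether the numbers are always decimal.

```python
from __future__ import annotations

from typing import NamedTuple


class DiskLocation(NamedTuple):
    controller: str
    enclosure: str
    drive: str


def index_by_device(listing: str) -> dict[str, DiskLocation]:
    """Map logical device names to their <ctrl>:<enclosureid>:<drivenumber>."""
    index: dict[str, DiskLocation] = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "content: ..." lines and anything else that does not
        # start with a ctrl:enclosure:drive triple.
        if len(fields) < 2 or fields[0].count(":") != 2:
            continue
        ctrl, encl, drive = fields[0].split(":")
        index[fields[1]] = DiskLocation(ctrl, encl, drive)
    return index


if __name__ == "__main__":
    listing = """1:02:00 c8t50000393E8CAF2A4d0 MK2001TRKB 2.0T Ready (RDY)
 content: raidz1-0
1:02:01 c8t50000393E8CAF53Cd0 MK2001TRKB 2.0T Ready (RDY)
 content: raidz1-1"""
    print(index_by_device(listing)["c8t50000393E8CAF53Cd0"])
    # DiskLocation(controller='1', enclosure='02', drive='01')
```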
Hope this helps.
Cheers,
--
Saso