[OpenIndiana-discuss] Errors without errors
Michelle
michelle at msknight.com
Thu Aug 5 08:11:53 UTC 2021
I removed the drive in order to take a backup before I start messing around
with things, which is why it isn't in the iostat output. The backup will
probably take until early evening.
It almost looks like whatever happened caused a reboot; below is what the
messages show from around that time.
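(If it helps, the reboot should be easy to confirm from the boot record;
assuming the usual illumos utilities are in place, something like:

who -b
last reboot | head

should show when the box last came up.)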
Aug 5 01:55:01 jaguar smbd[601]: [ID 617204 daemon.error] Can't get
SID for ID=0 type=1, status=-9977
Aug 5 01:58:00 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 01:58:00 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 01:58:00 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 01:58:00 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 01:58:09 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 01:58:09 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 01:58:09 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 01:58:09 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:15 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:15 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:15 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:16 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:20 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:20 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:20 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:20 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:24 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:24 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:24 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:24 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:24 jaguar ahci: [ID 811322 kern.info] NOTICE: ahci0:
ahci_tran_reset_dport port 3 reset device
Aug 5 02:00:29 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:29 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:29 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:29 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:34 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:34 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:34 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:34 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:38 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:38 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:38 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:38 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:53 jaguar fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 5 02:00:53 jaguar EVENT-TIME: Thu Aug 5 02:00:53 UTC 2021
Aug 5 02:00:53 jaguar PLATFORM: ProLiant-MicroServer, CSN: 5C7351P4L9, HOSTNAME: jaguar
Aug 5 02:00:53 jaguar SOURCE: zfs-diagnosis, REV: 1.0
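The messages extract cuts off there, but the full fault record should also
be in the FMA logs. Assuming standard fmdump, and that these are run the
same day (bare hh:mm times default to the current date), something like:

fmdump -V -t 01:55 -T 02:05     # fault log entries for that window
fmdump -eV -t 01:55 -T 02:05    # underlying error reports (ereports)

should show what the diagnosis was actually based on.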
On Thu, 2021-08-05 at 11:03 +0300, Toomas Soome via openindiana-discuss
wrote:
> > On 5. Aug 2021, at 10:52, Michelle <michelle at msknight.com> wrote:
> >
> > Thanks for this. So I'm possibly better off rolling back the OS
> > snapshot after my backup has finished?
>
> maybe, maybe not. first of all, I have no idea what point the rollback
> would take you back to.
>
> secondly, the system has seen some errors; the trouble is that, at this
> point, the fault does not tell us whether those were checksum errors or
> something else, and it seems to me it is something else.
>
> and this is why: if you look at your zpool output, you see a report
> about c6t3d0, but the iostat -En output below does not include c6t3d0.
> It seems to be missing.
>
> what do you get from: 'iostat -En c6t3d0'?
>
> Also, it would be a good idea to check /var/adm/messages: are there any
> SATA or I/O related messages around August 05, 02:00?
>
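> something along these lines should pull the relevant window out of the
> log (the grep pattern is only a sketch, adjust to taste):
>
> egrep 'Aug  5 0(1:5|2:0)' /var/adm/messages | egrep -i 'ahci|sata|scsi|zfs'
>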
> FMA has definitely recorded an issue with the pool, so there must be
> something going on.
>
> rgds,
> toomas
>
> > I have removed the drive for the moment, and am running a backup.
> > Just in case :-)
> >
> > mich at jaguar:~$ iostat -En
> > c5d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> > Model: INTEL SSDSA2M04 Revision: Serial No: CVGB949301PC040
> > Size: 40.02GB <40019116032 bytes>
> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> > Illegal Request: 0
> > c6t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> > Vendor: ATA Product: WDC WD40EZRZ-00G Revision: 0A80 Serial No: WD-WCC7K5UK24LJ
> > Size: 4000.79GB <4000787030016 bytes>
> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> > Illegal Request: 0 Predictive Failure Analysis: 0
> > c6t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> > Vendor: ATA Product: WDC WD60EFRX-68L Revision: 0A82 Serial No: WD-WX21DA84EH0F
> > Size: 6001.18GB <6001175126016 bytes>
> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> > Illegal Request: 0 Predictive Failure Analysis: 0
> > c6t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
> > Vendor: ATA Product: WDC WD60EFRX-68L Revision: 0A82 Serial No: WD-WX51DB880RJ4
> > Size: 6001.18GB <6001175126016 bytes>
> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> > Illegal Request: 0 Predictive Failure Analysis: 0
> >
> >
> > --------------- ------------------------------------  -------------- ---------
> > TIME            EVENT-ID                              MSG-ID         SEVERITY
> > --------------- ------------------------------------  -------------- ---------
> > Aug 05 02:00:53 c5934fd6-5f4b-409e-b0f8-8f44ea8f99c4  ZFS-8000-FD    Major
> >
> > Host        : jaguar
> > Platform    : ProLiant-MicroServer   Chassis_id  : 5C7351P4L9
> > Product_sn  :
> >
> > Fault class : fault.fs.zfs.vdev.io
> > Affects     : zfs://pool=jaguar/vdev=740c01ae0d3c3109
> >                   faulted and taken out of service
> > Problem in  : zfs://pool=jaguar/vdev=740c01ae0d3c3109
> >                   faulted and taken out of service
> >
> > Description : The number of I/O errors associated with a ZFS device
> >               exceeded acceptable levels. Refer to
> >               http://illumos.org/msg/ZFS-8000-FD for more information.
> >
> > Response    : The device has been offlined and marked as faulted. An
> >               attempt will be made to activate a hot spare if available.
> >
> > Impact      : Fault tolerance of the pool may be compromised.
> >
> > Action      : Run 'zpool status -x' and replace the bad device.
> >
> >
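> > (If the drive checks out, presumably the way back would be something
> > like 'zpool clear jaguar c6t3d0' once it is reattached, or 'zpool
> > replace jaguar c6t3d0 <new-disk>' if it has to go; I'm treating both
> > as sketches until the iostat/fmadm data says more.)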
> >
> > On Thu, 2021-08-05 at 10:22 +0300, Toomas Soome via
> > openindiana-discuss wrote:
> > > > On 5. Aug 2021, at 09:35, Michelle <michelle at msknight.com>
> > > > wrote:
> > > >
> > > > Hi Folks,
> > > >
> > > > About a month ago I updated my Hipster...
> > > > SunOS jaguar 5.11 illumos-ca706442e6 i86pc i386 i86pc
> > > >
> > > > This morning it was absolutely crawling. Couldn't even connect via
> > > > SSH and had to bounce the box.
> > > >
> > > > It was reporting a drive as faulted, but didn't give any numbers...
> > > > everything was 0. I'm now not sure what happened and whether the
> > > > drive is good, or whether I should roll back the OS.
> > > >
> > > > (and the drive, a WD Red 6TB (not shingled), went out of warranty a
> > > > week ago. How about that, eh?)
> > > >
> > > > Grateful for any opinions please.
> > > >
> > > > Thu 5 Aug 04:00:01 UTC 2021
> > > > NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> > > > lion  5.45T  5.28T   176G        -         -     4%    96%  1.00x  DEGRADED  -
> > > >   pool: jaguar
> > > >  state: DEGRADED
> > > > status: One or more devices are faulted in response to persistent errors.
> > > >         Sufficient replicas exist for the pool to continue functioning in a
> > > >         degraded state.
> > > > action: Replace the faulted device, or use 'zpool clear' to mark the device
> > > >         repaired.
> > > >   scan: scrub in progress since Thu Aug 5 00:00:00 2021
> > > >         6.00T scanned at 428M/s, 5.02T issued at 358M/s, 7.90T total
> > > >         1M repaired, 63.59% done, 0 days 02:20:17 to go
> > > > config:
> > > >         NAME        STATE     READ WRITE CKSUM
> > > >         jaguar      DEGRADED     0     0     0
> > > >           raidz1-0  DEGRADED     0     0     0
> > > >             c6t0d0  ONLINE       0     0     0
> > > >             c6t2d0  ONLINE       0     0     0
> > > >             c6t3d0  FAULTED      0     0     0  too many errors  (repairing)
> > > >
> > >
> > > Can you post output from:
> > > iostat -En
> > > fmadm faulty
> > >
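> > > if smartmontools happens to be installed, SMART data for the suspect
> > > drive would also be worth a look - something along the lines of
> > > 'smartctl -a /dev/rdsk/c6t3d0s0' (the exact device argument and any
> > > -d option can vary on illumos, so treat that as a sketch).
> > >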
> > > in any case, there definitely is a bug in the error reporting -
> > > the counters are zero while “too many errors” is reported.
> > >
> > > rgds,
> > > toomas
> _______________________________________________
> openindiana-discuss mailing list
> openindiana-discuss at openindiana.org
> https://openindiana.org/mailman/listinfo/openindiana-discuss