Date:      Sat, 25 Feb 2023 18:56:54 -0800
From:      bob prohaska <fbsd@www.zefox.net>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: fsck segfaults on rpi3 running 13-stable (and on 14-CURRENT analyzing the same file system that resulted from the 13-STABLE crash)
Message-ID:  <20230226025654.GA12702@www.zefox.net>
In-Reply-To: <9CEF4E7A-2F13-454F-A04A-A6C5A80FD4B7@yahoo.com>
References:  <202302192054.31JKsq7w079295@chez.mckusick.com> <3DD8EEC2-6135-42A0-A80C-F195CAAC025E@yahoo.com> <20230219222328.GA55941@www.zefox.net> <2F5B20E9-AFF8-42F6-9E1F-50BBDF4E1B79@yahoo.com> <20230220044544.GB57936@www.zefox.net> <9CEF4E7A-2F13-454F-A04A-A6C5A80FD4B7@yahoo.com>

On Sun, Feb 19, 2023 at 09:50:45PM -0800, Mark Millard wrote:
> On Feb 19, 2023, at 20:45, bob prohaska <fbsd@www.zefox.net> wrote:
> 
> > 
> > To a casual glance, it looks like a hardware error.
> > But, the machine seems to work fine until it's running
> > buildworld, and then crashes during a relatively easy
> > part of buildworld. The initial error message is:
> > 
> > bob@pelorus:/usr/src % (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 43 29 d6 40 00 00 40 00 
> > (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
> > (da0:umass-sim0:0:0:0): SCSI status: Check Condition
> > (da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
> > (da0:umass-sim0:0:0:0): Error 5, Unretryable error
> 
> A description of "Medium Error" from Seagate is:
> 
> Medium Error - Indicates the command terminated with a nonrecovered error condition, probably caused by a flaw in the medium or an error in the recorded data.
> 
> To compare/contrast with other alternatives, see:
> 
> https://www.seagate.com/support/kb/scsi-sense-key-chart-196259en/
> 
> A more extensive list with asc/ascq involved as well is at:
> 
> https://en.wikipedia.org/wiki/Key_Code_Qualifier/
> 
> Allowing more comparison/contrast with other classifications.
> 
> It indicates:
> 
> 3 11 00 Medium Error - unrecovered read error
> 
> (matching the reported text).
> 
> > SCSI errors are not unknown, but they usually succeed on retry.
> > It's not obvious why this is treated as un-retryable. 
> 
> Because that is what the "3 11 00" combination involved
> means. The drive is reporting that. It is not a FreeBSD
> driver choice of handling.
> 
> (I'm not expert at drive internals, so I take it at face
> value.)
> 
> > Are there any simple tests that might help decide what's wrong?
> > It's likely that re-running buildworld will reproduce the crash.
> 
> See the https://en.wikipedia.org/wiki/Key_Code_Qualifier/
> description material for some background information?
> 
> > I've placed the results of smartctl -a at the end of the notes. 
> > The interpretation isn't self evident, hopefully someone else
> > can lend an eye. I'll try smartctl -t after a good night's sleep. 
> 
> man smartctl reports:
> 
>                  UNC:   UNCorrectable Error in Data
> 
> The 3 examples of:
> 
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
> 
> indicate UNC. All 3 list the same LBA value.
> 
> Error 4 occurred at disk power-on lifetime: 11121 hours (463 days + 9 hours)
> Error 3 occurred at disk power-on lifetime: 11098 hours (462 days + 10 hours)
> Error 2 occurred at disk power-on lifetime: 11096 hours (462 days + 8 hours)
> 
> So the three errors are spread over a little more than a day
> overall (25 power-on hours), with errors 2 and 3 only a couple
> of hours apart.
> 
> It suggests to me that the drive is no longer usable.
> But I'm no expert.

You were correct. After a few re-installations the
disk failed in an obvious way, reporting 395-odd errors. All the
while, SMART seemed to claim the disk "passed" its self-tests.
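
For reference, the error-log entries quoted above can be decoded by
hand. What follows is a minimal sketch in Python, assuming the usual
28-bit ATA register layout (low 4 bits of DH, then CH, CL, SN) and the
usual meaning of bit 6 of ER (UNC) and bit 0 of ST (ERR). The function
names are hypothetical, chosen only for illustration. One caveat:
0x0fffffff is the largest value a 28-bit LBA can hold, so the reported
LBA may be a placeholder rather than the real location. That is an
assumption on my part, not something the log states.

```python
def decode_ata_error(regs_hex):
    """Decode the 'ER ST SC SN CL CH DH' line from a smartctl error log.

    Assumes the 28-bit ATA register layout; illustrative helper only.
    """
    er, st, sc, sn, cl, ch, dh = bytes.fromhex(regs_hex)
    # 28-bit LBA: low nibble of DH is the top 4 bits, then CH, CL, SN.
    lba28 = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "unc": bool(er & 0x40),              # uncorrectable data error bit
        "err": bool(st & 0x01),              # error bit in the status register
        "lba28": lba28,
        "saturated": lba28 == 0x0FFFFFFF,    # all 28 address bits set
    }


def hours_to_days(hours):
    """Split power-on hours into (days, remaining hours)."""
    return divmod(hours, 24)


if __name__ == "__main__":
    print(decode_ata_error("40 51 00 ff ff ff 0f"))
    for h in (11121, 11098, 11096):
        print(h, hours_to_days(h))
```

If the "saturated" flag is set, the LBA alone probably should not be
used to pick a sector to test.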

I was baffled, since the experiments with dd failed to replicate
the error. Evidently there was more to the failure than met the eye.
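
One thing that might have made the dd experiments more pointed is
reading exactly the range named in the failed READ(10). A minimal
sketch in Python, assuming the standard READ(10) layout (opcode 0x28,
big-endian 32-bit LBA in bytes 2-5, 16-bit transfer length in bytes
7-8) and a 512-byte logical block size, which I have not verified for
this drive. The helper names are hypothetical.

```python
def decode_read10(cdb_hex):
    """Return (lba, blocks) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block
    blocks = int.from_bytes(cdb[7:9], "big")   # number of blocks requested
    return lba, blocks


def dd_command(device, lba, blocks, block_size=512):
    """Build a dd command line that re-reads the same range.

    block_size=512 is an assumption; check the drive's real sector size.
    """
    return (f"dd if={device} of=/dev/null bs={block_size} "
            f"skip={lba} count={blocks}")


if __name__ == "__main__":
    lba, blocks = decode_read10("28 00 43 29 d6 40 00 00 40 00")
    print(lba, blocks)
    print(dd_command("/dev/da0", lba, blocks))
```

A clean read of that range would not prove the disk healthy, since a
marginal sector can succeed on one attempt and fail on the next.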

Thanks for writing!

bob prohaska