Date:      Sun, 19 Feb 2023 20:45:44 -0800
From:      bob prohaska <fbsd@www.zefox.net>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: fsck segfaults on rpi3 running 13-stable (and on 14-CURRENT analyzing the same file system that resulted from the 13-STABLE crash)
Message-ID:  <20230220044544.GB57936@www.zefox.net>
In-Reply-To: <2F5B20E9-AFF8-42F6-9E1F-50BBDF4E1B79@yahoo.com>
References:  <202302192054.31JKsq7w079295@chez.mckusick.com> <3DD8EEC2-6135-42A0-A80C-F195CAAC025E@yahoo.com> <20230219222328.GA55941@www.zefox.net> <2F5B20E9-AFF8-42F6-9E1F-50BBDF4E1B79@yahoo.com>

On Sun, Feb 19, 2023 at 02:35:15PM -0800, Mark Millard wrote:
> 
> Kirk likely monitors the freebsd-fs list.

I didn't notice there was such a list 8-\
 
> Kirk likely does not monitor the freebsd-arm list.
> None of us thought to switch to freebsd-fs at the
> time. The only part of your context that ended up
> to be arm specific was original buildworld crash.
> You definitely started in an appropriate place
> (freebsd-arm). After the crash, the rest was more
> general relative to platforms and more specific
> relative to file system handling (UFS support).
> 
> I do not see any reason for any of this exchange
> to go to any lists, given the current status.

Alas, the story's not over yet 8-(  

After getting the disk fsck'd and booting once more,
an attempt to buildworld using a fresh /usr/src
and empty /usr/obj crashed again, I think in the
same way. This time some notes have been collected
at
http://www.zefox.net/~fbsd/rpi3/scsi_status_error/readme

At a casual glance, it looks like a hardware error.
But the machine seems to work fine until it's running
buildworld, and then it crashes during a relatively easy
part of buildworld. The initial error message is:

bob@pelorus:/usr/src % (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 43 29 d6 40 00 00 40 00 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:umass-sim0:0:0:0): Error 5, Unretryable error
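
As a side note, the CDB bytes in that message can be decoded to see
which blocks the failed read covered. Below is a minimal Python sketch,
assuming the standard READ(10) layout (opcode 0x28 in byte 0, big-endian
LBA in bytes 2-5, transfer length in bytes 7-8). The function name
decode_read10 is made up for illustration and is not part of any tool.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, nblocks). Assumes the standard 10-byte layout:
    byte 0 opcode, bytes 2-5 LBA, bytes 7-8 transfer length.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block
    nblocks = int.from_bytes(cdb[7:9], "big")  # number of blocks requested
    return lba, nblocks


# CDB copied from the console message above
lba, nblocks = decode_read10("28 00 43 29 d6 40 00 00 40 00")
print(lba, nblocks)  # 1126815296 64
```

If the disk uses 512-byte logical sectors (an assumption, not checked
here), re-reading just that range with something like
"dd if=/dev/da0 of=/dev/null bs=512 skip=1126815296 count=64" might show
whether the error is repeatable at the same spot.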

SCSI errors are not unknown, but the failed command usually succeeds
on retry. It's not obvious why this one is treated as unretryable.

Are there any simple tests that might help decide what's wrong?
It's likely that re-running buildworld will reproduce the crash.

I've placed the results of smartctl -a at the end of the notes.
The interpretation isn't self-evident; hopefully someone else
can lend an eye. I'll try smartctl -t after a good night's sleep.

Thanks for reading!

bob prohaska

 


