Date:      Sun, 25 Nov 2018 00:38:33 -0800
From:      Kirk McKusick <mckusick@mckusick.com>
To:        soralx@cydem.org
Cc:        "Julian H. Stacey" <jhs@berklix.com>, freebsd-fs@freebsd.org
Subject:   Re: [bug] fsck refuses to repair damaged UFS using backup superblock
Message-ID:  <201811250838.wAP8cXoj046038@chez.mckusick.com>
In-Reply-To: <201811230117.wAN1HKAT037185@fire.js.berklix.net>

> To: soralx@cydem.org
> Subject: Re: [bug] fsck refuses to repair damaged UFS using backup superblock
> From: "Julian H. Stacey" <jhs@berklix.com>
> Organization: http://berklix.eu BSD Unix Linux Consultants, Munich Germany
> Date: Fri, 23 Nov 2018 02:17:20 +0100
>
> Hi soralx@cydem.org,
> Added cc: <freebsd-fs@freebsd.org> to ensure file system specialists see this.
>
> Reference:
>> From:		<soralx@cydem.org>
>> Date:		Tue, 20 Nov 2018 05:30:00 -0800
>
> soralx@cydem.org wrote:
>>
>> Howdy!
>>
>>  Since send-pr(1) is now gone, I guess the next option is to send a
>>  message directly to the developers...
>>
>>  Yesterday, I ran into a bug in fsck_ffs that gave me a little scare.
>>
>>  Short story: on -CURRENT, fsck refuses to check a FS with a corrupted
>>  superblock, even when an alternate (backup) SB location is given.
>>
>>  Long story. I've been testing a newly-built system based on an X399
>>  platform with a 2950X CPU and an Optane 905P 480GB U.2 drive. The
>>  system ran a ~2-day old -CURRENT; when compiling newest world and
>>  kernel, I found the machine in a locked-up state. After a hard reset,
>>  boot failed because the root FS became corrupted & was not available:
>>    kernel: Superblock check-hash failed: recorded check-hash XXX != computed check-hash YYY
>>
>>  I have not yet figured out why the corruption happened... bad hardware?
>>  bug in the NVMe driver?
>>
>>  "OK", I thought, "No worries. We'll just boot using another disk, fsck
>>  the corrupted FS with a backup superblock, and be up in a moment".
>>  The machine was doing nothing but compiling, so no valuable data loss.
>>
>>  So I did `dumpfs -m /dev/ada0p3` on the spare disk (which was the
>>  source for the new disk image, thus had almost identical partitions
>>  and filesystems) to get the FS details, then did `newfs -N [...]
>>  /dev/ada0p3` to find locations of superblock backups, then finally
>>  ran `fsck_ffs -b 192 /dev/nvd0p3` -- only to get the same
>>  "check-hash failed" message, plus another strange message: "Can't open
>>  /dev/nvd0p3: [...]". Then fsck quits.
>>  Note that `fsck_ffs -b ...` on a FS with good superblock works OK.
>>
>>  After fiddling with a debugger for a bit, I commented out the line
>>  "return (0);" in /usr/src/sbin/fsck_ffs/setup.c:136, recompiled fsck,
>>  and the FS was recovered successfully.
>>
>>  What was actually happening: fsck's setup.c calls ufs_disk_fillout()
>>  from libufs' type.c, which in turn calls sbread() from the same
>>  library, which then calls sbget(disk->d_fd, &fs, -1) [[where '-1'
>>  is hard-coded to indicate the primary superblock]] that then simply
>>  invokes ffs_sbget from ffs kernel driver -- and this returns ENOENT,
>>  which eventually causes fsck to give up before even looking at the
>>  specified backup superblock.
>>
>>  I don't know what exactly ufs_disk_fillout() does, but fortunately
>>  for me, fsck worked without the "sbread(disk)" part of that function
>>  having much luck on a disk with corrupted superblock. Also, I have a
>>  feeling that calling a kernel's ffs driver function when using fsck
>>  to fix a broken filesystem is not the best thing to do...
>>
>>  Please CC, as I am not subscribed.
>>
>> -- 
>> [SorAlx]  ridin' VN2000 Classic LT
>
> Cheers,
> Julian

Below is a proposed fix for fsck_ffs to properly handle superblock
check-hash failures (notably to optionally search for a usable
alternate superblock). Let me know if you still have a filesystem
on which you can test it, and if so whether it works correctly.

	Kirk McKusick
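
For illustration only, and not the patch referred to above: a minimal
Python sketch of what searching for a usable alternate superblock could
look like at its simplest. It only tests whether a UFS magic number sits
where a superblock at a given location would store fs_magic. The
constants (the two magic values, the 1372-byte offset of fs_magic, the
512-byte unit for locations, little-endian byte order) are assumptions
based on FreeBSD's sys/ufs/ffs/fs.h and a typical amd64 setup, not on
anything stated in this thread, and the helper names are hypothetical.

```python
import struct

# Assumed constants (from FreeBSD's sys/ufs/ffs/fs.h, not from this thread).
FS_UFS1_MAGIC = 0x011954
FS_UFS2_MAGIC = 0x19540119
FS_MAGIC_OFFSET = 1372  # assumed offset of fs_magic inside struct fs
DEV_BSIZE = 512         # assumed unit of the locations printed by `newfs -N`


def has_ufs_magic(image, sector):
    """Return True if a UFS1/UFS2 magic number is found where a superblock
    starting at `sector` (in DEV_BSIZE units) would keep fs_magic."""
    image.seek(sector * DEV_BSIZE + FS_MAGIC_OFFSET)
    raw = image.read(4)
    if len(raw) < 4:
        return False  # location lies beyond the end of the device or image
    (magic,) = struct.unpack("<I", raw)  # little-endian is an assumption
    return magic in (FS_UFS1_MAGIC, FS_UFS2_MAGIC)


def find_candidate_superblocks(path, sectors):
    """Return the subset of `sectors` that carry a UFS magic number."""
    with open(path, "rb") as image:
        return [s for s in sectors if has_ufs_magic(image, s)]
```

With the location from the report, this would be called as
find_candidate_superblocks("/dev/nvd0p3", [192]), normally as root. A
matching magic number is a much weaker test than the check-hash and
consistency checks that fsck_ffs applies, so a hit only identifies a
candidate location to pass to `fsck_ffs -b`.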



