Date: Sun, 25 Nov 2018 15:25:21 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "soralx@cydem.org" <soralx@cydem.org>, Kirk McKusick <mckusick@mckusick.com>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, "Julian H. Stacey" <jhs@berklix.com>
Subject: Re: [bug] fsck refuses to repair damaged UFS using backup superblock
Message-ID: <YTOPR0101MB1162E62A2BA4D92D215C8983DDD60@YTOPR0101MB1162.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <201811250838.wAP8cXoj046038@chez.mckusick.com>
References: <201811230117.wAN1HKAT037185@fire.js.berklix.net>, <201811250838.wAP8cXoj046038@chez.mckusick.com>

Kirk McKusick wrote:
>> To: soralx@cydem.org
>> Subject: Re: [bug] fsck refuses to repair damaged UFS using backup superblock
>> From: "Julian H. Stacey" <jhs@berklix.com>
>> Organization: http://berklix.eu BSD Unix Linux Consultants, Munich Germany
>> Date: Fri, 23 Nov 2018 02:17:20 +0100
>>
>> Hi soralx@cydem.org,
>> Added cc: <freebsd-fs@freebsd.org> to ensure file system specialists see this.
>>
>> Reference:
>>> From: <soralx@cydem.org>
>>> Date: Tue, 20 Nov 2018 05:30:00 -0800
>>
>> soralx@cydem.org wrote:
>>>
>>> Howdy!
>>>
>>> Since send-pr(1) is now gone, I guess the next option is to send a
>>> message directly to the developers...
>>>
>>> Yesterday, I ran into a bug in fsck_ffs that gave me a little scare.
>>>
>>> Short story: on -CURRENT, fsck refuses to check a FS with a corrupted
>>> superblock, even when an alternate (backup) SB location is given.
>>>
>>> Long story. I've been testing a newly-built system based on an X399
>>> platform with a 2950X CPU and an Optane 905P 480GB U.2 drive. The
>>> system ran a ~2-day old -CURRENT; when compiling newest world and
>>> kernel, I found the machine in a locked-up state. After a hard reset,
>>> boot failed because the root FS became corrupted & was not available:
>>>   kernel: Superblock check-hash failed: recorded check-hash XXX != computed check-hash YYY
>>>
>>> I have not yet figured out why the corruption happened... bad hardware?
>>> bug in the NVMe driver?
>>> All I did was boot a pre-r339671 kernel that used the file systems and then, bingo...
>>> "OK", I thought, "No worries. We'll just boot using another disk, fsck
>>> the corrupted FS with a backup superblock, and be up in a moment".
>>> The machine was doing nothing but compiling, so no valuable data loss.
>>>
>>> So I did `dumpfs -m /dev/ada0p3` on the spare disk (which was the
>>> source for the new disk image, thus had almost identical partitions
>>> and filesystems) to get the FS details, then did `newfs -N [...]
>>> /dev/ada0p3` to find locations of superblock backups, then finally
>>> ran `fsck_ffs -b 192 /dev/nvd0p3` -- only to get the same "check-hash
>>> failed" message, plus another strange message: "Can't open
>>> /dev/nvd0p3: [...]". Then fsck quits.
>>> Note that `fsck_ffs -b ...` on a FS with good superblock works OK.
>>>
>>> After fiddling with a debugger for a bit, I commented out the line
>>> "return (0);" in /usr/src/sbin/fsck_ffs/setup.c:136, recompiled fsck,
>>> and the FS was recovered successfully.
>>>
>>> What was actually happening: fsck's setup.c calls ufs_disk_fillout()
>>> from libufs' type.c, which in turn calls sbread() from the same
>>> library, which then calls sbget(disk->d_fd, &fs, -1) [[where '-1'
>>> is hard-coded to indicate the primary superblock]] that then simply
>>> invokes ffs_sbget from ffs kernel driver -- and this returns ENOENT,
>>> which eventually causes fsck to give up before even looking at the
>>> specified backup superblock.
>>>
>>> I don't know what exactly ufs_disk_fillout() does, but fortunately
>>> for me, fsck worked without the "sbread(disk)" part of that function
>>> having much luck on a disk with corrupted superblock. Also, I have a
>>> feeling that calling a kernel's ffs driver function when using fsck
>>> to fix a broken filesystem is not the best thing to do...
>>>
>>> Please CC, as I am not subscribed.
>>>
>>> --
>>> [SorAlx] ridin' VN2000 Classic LT
>>
>> Cheers,
>> Julian
>
>Below is a proposed fix for fsck_ffs to properly handle superblock
>check-hash failures (notably to optionally search for a usable
>alternate superblock). Let me know if you still have a filesystem
>on which you can test it, and if so whether it works correctly.

As above, I think you can reproduce this by running an older kernel that
mounts the file system. I ended up re-installing when I ran into this
yesterday (no biggy, it was just a test machine).
It happened after I had been running a kernel built from stable/12 on the
system and then tried to boot it.
(Since the root fs got these errors, I couldn't boot any kernel on the
root fs.)

It would be nice if there was a way to override the check and boot the
system. (Is a loader tunable reasonable for this?)

rick
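
The behaviour the reporter expected from `fsck_ffs -b`, and the optional
search for a usable alternate that the proposed fix mentions, can be
illustrated with a small toy model. This is only a sketch in Python, not the
fsck_ffs or libufs code and not the patch from this thread: the names
`Superblock`, `make_superblock`, `hash_ok` and `pick_superblock` are
hypothetical, `zlib.crc32` merely stands in for the real superblock
check-hash, and location -1 is borrowed from the thread as the marker for
the primary superblock.

```python
import zlib
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

PRIMARY = -1  # the thread notes that -1 denotes the primary superblock


@dataclass(frozen=True)
class Superblock:
    """Toy superblock: opaque payload plus the hash recorded at write time."""
    payload: bytes
    recorded_hash: int


def make_superblock(payload: bytes) -> Superblock:
    """Build a superblock whose recorded hash matches its payload."""
    return Superblock(payload, zlib.crc32(payload))


def hash_ok(sb: Superblock) -> bool:
    """Recompute the stand-in hash and compare it with the recorded one."""
    return zlib.crc32(sb.payload) == sb.recorded_hash


def pick_superblock(
    primary: Superblock,
    backups: Mapping[int, Superblock],
    requested: Optional[int] = None,
    search: bool = False,
) -> Optional[Tuple[int, Superblock]]:
    """Return (location, superblock) for a usable superblock, or None.

    - requested: an explicit backup location (like the argument to -b);
      the primary is not consulted at all, so a corrupt primary cannot
      block the repair.
    - search: if the primary fails its hash check, try each backup in
      ascending location order and take the first one that verifies.
    """
    if requested is not None:
        sb = backups.get(requested)
        if sb is not None and hash_ok(sb):
            return requested, sb
        return None
    if hash_ok(primary):
        return PRIMARY, primary
    if search:
        for location in sorted(backups):
            if hash_ok(backups[location]):
                return location, backups[location]
    return None
```

With a corrupted primary and a good backup at location 192, passing
`requested=192` returns the backup without touching the primary, while the
default call returns `None`, which mirrors the failure described above.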
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YTOPR0101MB1162E62A2BA4D92D215C8983DDD60>