From owner-freebsd-fs@freebsd.org Sun Nov 25 08:31:17 2018
Message-Id: <201811250838.wAP8cXoj046038@chez.mckusick.com>
From: Kirk McKusick
To: soralx@cydem.org
Subject: Re: [bug] fsck refuses to repair damaged UFS using backup superblock
cc: "Julian H. Stacey", freebsd-fs@freebsd.org
X-URL: http://WWW.McKusick.COM/
Reply-To: Kirk McKusick
In-reply-to: <201811230117.wAN1HKAT037185@fire.js.berklix.net>
Comments: In-reply-to "Julian H. Stacey" message dated "Fri, 23 Nov 2018 02:17:20 +0100."
Date: Sun, 25 Nov 2018 00:38:33 -0800

> To: soralx@cydem.org
> Subject: Re: [bug] fsck refuses to repair damaged UFS using backup superblock
> From: "Julian H. Stacey"
> Organization: http://berklix.eu BSD Unix Linux Consultants, Munich Germany
> Date: Fri, 23 Nov 2018 02:17:20 +0100
>
> Hi soralx@cydem.org,
> Added cc: to ensure file system specialists see this.
>
> Reference:
>> From:
>> Date: Tue, 20 Nov 2018 05:30:00 -0800
>
> soralx@cydem.org wrote:
>>
>> Howdy!
>>
>> Since send-pr(1) is now gone, I guess the next option is to send a
>> message directly to the developers...
>>
>> Yesterday, I ran into a bug in fsck_ffs that gave me a little scare.
>>
>> Short story: on -CURRENT, fsck refuses to check a FS with a corrupted
>> superblock, even when an alternate (backup) SB location is given.
>>
>> Long story. I've been testing a newly-built system based on an X399
>> platform with a 2950X CPU and an Optane 905P 480GB U.2 drive. The
>> system ran a ~2-day old -CURRENT; when compiling newest world and
>> kernel, I found the machine in a locked-up state. After a hard reset,
>> boot failed because the root FS became corrupted & was not available:
>> kernel: Superblock check-hash failed: recorded check-hash XXX != computed check-hash YYY
>>
>> I have not yet figured out why the corruption happened... bad hardware?
>> bug in the NVMe driver?
>>
>> "OK", I thought, "No worries. We'll just boot using another disk, fsck
>> the corrupted FS with a backup superblock, and be up in a moment".
>> The machine was doing nothing but compiling, so no valuable data loss.
>>
>> So I did `dumpfs -m /dev/ada0p3` on the spare disk (which was the
>> source for the new disk image, thus had almost identical partitions
>> and filesystems) to get the FS details, then did `newfs -N [...]
>> /dev/ada0p3` to find locations of superblock backups, then finally
>> ran `fsck_ffs -b 192 /dev/nvd0p3` -- only to get the same "check-
>> -hash failed" message, plus another strange message: "Can't open
>> /dev/nvd0p3: [...]". Then fsck quits.
>> Note that `fsck_ffs -b ...` on a FS with good superblock works OK.
>>
>> After fiddling with a debugger for a bit, I commented out the line
>> "return (0);" in /usr/src/sbin/fsck_ffs/setup.c:136, recompiled fsck,
>> and the FS was recovered successfully.
>>
>> What was actually happening: fsck's setup.c calls ufs_disk_fillout()
>> from libufs' type.c, which in turn calls sbread() from the same
>> library, which then calls sbget(disk->d_fd, &fs, -1) [[where '-1'
>> is hard-coded to indicate the primary superblock]] that then simply
>> invokes ffs_sbget from ffs kernel driver -- and this returns ENOENT,
>> which eventually causes fsck to give up before even looking at the
>> specified backup superblock.
>>
>> I don't know what exactly ufs_disk_fillout() does, but fortunately
>> for me, fsck worked without the "sbread(disk)" part of that function
>> having much luck on a disk with corrupted superblock. Also, I have a
>> feeling that calling a kernel's ffs driver function when using fsck
>> to fix a broken filesystem is not the best thing to do...
>>
>> Please CC, as I am not subscribed.
>>
>> --
>> [SorAlx] ridin' VN2000 Classic LT
>
> Cheers,
> Julian

Below is a proposed fix for fsck_ffs to properly handle superblock
check-hash failures (notably to optionally search for a usable
alternate superblock). Let me know if you still have a filesystem
on which you can test it, and if so whether it works correctly.

	Kirk McKusick