From owner-freebsd-hackers@freebsd.org Tue Nov 20 13:30:09 2018
Date: Tue, 20 Nov 2018 05:30:00 -0800
Subject: [bug] fsck refuses to repair damaged UFS using backup superblock
Message-ID: <20181120053000.56fbee6b@mscad14>
List-Id: Technical Discussions relating to FreeBSD

Howdy!

Since send-pr(1) is now gone, I guess the next option is to send a message directly to the developers...

Yesterday, I ran into a bug in fsck_ffs that gave me a little scare.

Short story: on -CURRENT, fsck refuses to check a FS with a corrupted superblock, even when an alternate (backup) SB location is given.

Long story. I've been testing a newly-built system based on an X399 platform with a 2950X CPU and an Optane 905P 480GB U.2 drive. The system ran a ~2-day old -CURRENT; while it was compiling the newest world and kernel, I found the machine in a locked-up state. After a hard reset, boot failed because the root FS had become corrupted and was not available:

kernel: Superblock check-hash failed: recorded check-hash XXX != computed check-hash YYY

I have not yet figured out why the corruption happened... bad hardware? bug in the NVMe driver?

"OK", I thought, "No worries. We'll just boot using another disk, fsck the corrupted FS with a backup superblock, and be up in a moment". The machine was doing nothing but compiling, so no valuable data was lost.

So I did `dumpfs -m /dev/ada0p3` on the spare disk (which was the source for the new disk image, thus had almost identical partitions and filesystems) to get the FS details, then did `newfs -N [...] /dev/ada0p3` to find the locations of the superblock backups, then finally ran `fsck_ffs -b 192 /dev/nvd0p3`, only to get the same "check-hash failed" message, plus another strange message: "Can't open /dev/nvd0p3: [...]". Then fsck quits. Note that `fsck_ffs -b ...` on a FS with a good superblock works OK.

After fiddling with a debugger for a bit, I commented out the line "return (0);" in /usr/src/sbin/fsck_ffs/setup.c:136, recompiled fsck, and the FS was recovered successfully.
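
For illustration only, here is a rough C model of what removing that early return changes. It is not the real setup.c code: mock_disk, mock_read_sb, mock_setup and the bail_on_primary_failure switch are all hypothetical names made up for this sketch, and the only assumption is that a failed read of the primary superblock reports an error such as ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a disk: it only records which
 * superblock copies would pass their check-hash. */
struct mock_disk {
	bool primary_ok;	/* primary superblock is intact */
	bool backup_ok;		/* the backup named by -b is intact */
};

/* Hypothetical reader: 0 on success, ENOENT if the copy is unusable. */
static int
mock_read_sb(const struct mock_disk *d, bool use_backup)
{
	assert(d != NULL);
	if (use_backup)
		return (d->backup_ok ? 0 : ENOENT);
	return (d->primary_ok ? 0 : ENOENT);
}

/*
 * Model of the setup step; have_backup_arg mirrors "-b was given".
 * With bail_on_primary_failure set, a bad primary superblock ends the
 * run before the backup is looked at (the reported behaviour).
 * With it cleared, the failure is ignored and the backup gets its turn
 * (the effect of commenting out the early return).
 * Returns 1 if the check can proceed, 0 if it gives up.
 */
static int
mock_setup(const struct mock_disk *d, bool have_backup_arg,
    bool bail_on_primary_failure)
{
	int primary_err = mock_read_sb(d, false);

	if (primary_err != 0 && bail_on_primary_failure)
		return (0);
	if (have_backup_arg)
		return (mock_read_sb(d, true) == 0);
	return (primary_err == 0);
}
```

With a disk whose primary copy is bad and whose backup is good, mock_setup() gives up when the switch is set and proceeds when it is cleared, which matches what I saw before and after the edit.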

What was actually happening: fsck's setup.c calls ufs_disk_fillout() from libufs' type.c, which in turn calls sbread() from the same library, which then calls sbget(disk->d_fd, &fs, -1) [[where '-1' is hard-coded to indicate the primary superblock]]. That in turn simply invokes ffs_sbget from the kernel's ffs driver, which returns ENOENT, and this eventually causes fsck to give up before even looking at the specified backup superblock.

I don't know what exactly ufs_disk_fillout() does, but fortunately for me, fsck worked even though the "sbread(disk)" part of that function could not succeed on a disk with a corrupted superblock.

Also, I have a feeling that calling a function from the kernel's ffs driver when using fsck to fix a broken filesystem is not the best thing to do...

Please CC, as I am not subscribed.

-- 
[SorAlx] ridin' VN2000 Classic LT