From: Rick Macklem
To: "soralx@cydem.org", Kirk McKusick
CC: "freebsd-fs@freebsd.org", "Julian H. Stacey"
Subject: Re: [bug] fsck refuses to repair damaged UFS using backup superblock
Date: Sun, 25 Nov 2018 15:25:21 +0000

Kirk
McKusick wrote:
>> To: soralx@cydem.org
>> Subject: Re: [bug] fsck refuses to repair damaged UFS using backup superblock
>> From: "Julian H. Stacey"
>> Organization: http://berklix.eu BSD Unix Linux Consultants, Munich Germany
>> Date: Fri, 23 Nov 2018 02:17:20 +0100
>>
>> Hi soralx@cydem.org,
>> Added cc: to ensure file system specialists see this.
>>
>> Reference:
>>> From:
>>> Date: Tue, 20 Nov 2018 05:30:00 -0800
>>
>> soralx@cydem.org wrote:
>>>
>>> Howdy!
>>>
>>> Since send-pr(1) is now gone, I guess the next option is to send a
>>> message directly to the developers...
>>>
>>> Yesterday, I ran into a bug in fsck_ffs that gave me a little scare.
>>>
>>> Short story: on -CURRENT, fsck refuses to check a FS with a corrupted
>>> superblock, even when an alternate (backup) SB location is given.
>>>
>>> Long story. I've been testing a newly-built system based on an X399
>>> platform with a 2950X CPU and an Optane 905P 480GB U.2 drive. The
>>> system ran a ~2-day old -CURRENT; when compiling newest world and
>>> kernel, I found the machine in a locked-up state. After a hard reset,
>>> boot failed because the root FS became corrupted & was not available:
>>> kernel: Superblock check-hash failed: recorded check-hash XXX != computed check-hash YYY
>>>
>>> I have not yet figured out why the corruption happened... bad hardware?
>>> bug in the NVMe driver?
>>>
All I did was boot a pre-r339671 kernel that used the file systems and then, bingo...
>>> "OK", I thought, "No worries. We'll just boot using another disk, fsck
>>> the corrupted FS with a backup superblock, and be up in a moment".
>>> The machine was doing nothing but compiling, so no valuable data loss.
>>>
>>> So I did `dumpfs -m /dev/ada0p3` on the spare disk (which was the
>>> source for the new disk image, thus had almost identical partitions
>>> and filesystems) to get the FS details, then did `newfs -N [...]
>>> /dev/ada0p3` to find locations of superblock backups, then finally
>>> ran `fsck_ffs -b 192 /dev/nvd0p3` -- only to get the same "check-
>>> -hash failed" message, plus another strange message: "Can't open
>>> /dev/nvd0p3: [...]". Then fsck quits.
>>> Note that `fsck_ffs -b ...` on a FS with good superblock works OK.
>>>
>>> After fiddling with a debugger for a bit, I commented out the line
>>> "return (0);" in /usr/src/sbin/fsck_ffs/setup.c:136, recompiled fsck,
>>> and the FS was recovered successfully.
>>>
>>> What was actually happening: fsck's setup.c calls ufs_disk_fillout()
>>> from libufs' type.c, which in turn calls sbread() from the same
>>> library, which then calls sbget(disk->d_fd, &fs, -1) [[where '-1'
>>> is hard-coded to indicate the primary superblock]] that then simply
>>> invokes ffs_sbget from ffs kernel driver -- and this returns ENOENT,
>>> which eventually causes fsck to give up before even looking at the
>>> specified backup superblock.
>>>
>>> I don't know what exactly ufs_disk_fillout() does, but fortunately
>>> for me, fsck worked without the "sbread(disk)" part of that function
>>> having much luck on a disk with corrupted superblock. Also, I have a
>>> feeling that calling a kernel's ffs driver function when using fsck
>>> to fix a broken filesystem is not the best thing to do...
>>>
>>> Please CC, as I am not subscribed.
>>>
>>> --
>>> [SorAlx] ridin' VN2000 Classic LT
>>
>> Cheers,
>> Julian
>
>Below is a proposed fix for fsck_ffs to properly handle superblock
>check-hash failures (notably to optionally search for a usable
>alternate superblock). Let me know if you still have a filesystem
>on which you can test it, and if so whether it works correctly.

As above, I think you can reproduce this by running an older kernel that
mounts the file system. I ended up re-installing when I ran into this yesterday
(no biggy, it was just a test machine).
It happened after I had been running a kernel built from stable/12 on the system
and then tried to boot it.
(Since the root fs got these errors, I couldn't boot any kernel on the root fs.)

It would be nice if there was a way to override the check and boot the system.
(Is a loader tunable reasonable for this?)

rick
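
For readers following the thread, here is a minimal sketch of the fallback idea discussed above: if the primary superblock fails its check-hash, try the backups instead of giving up. It is not the proposed fsck_ffs patch, which is not reproduced in this message, and it does not use libufs or kernel code. The struct layout, the hash, and every `toy_` / `TOY_` name are assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy superblock; this is NOT the real UFS "struct fs". */
struct toy_sb {
	uint32_t magic;   /* identifies the block as a superblock */
	uint32_t payload; /* stands in for the real geometry fields */
	uint32_t ckhash;  /* hash recorded when the block was written */
};

#define TOY_MAGIC 0x746f7921u /* arbitrary value chosen for this sketch */

/* Toy check-hash over the non-hash fields (not the hash UFS actually uses). */
uint32_t
toy_hash(const struct toy_sb *sb)
{
	return (sb->magic ^ (sb->payload * 2654435761u));
}

/* A candidate is usable only if the magic and the recorded hash both match. */
int
toy_sb_ok(const struct toy_sb *sb)
{
	return (sb->magic == TOY_MAGIC && sb->ckhash == toy_hash(sb));
}

/*
 * cands[0] models the primary superblock, cands[1..n-1] the backups.
 * Return the index of the first usable candidate, or -1 if none is usable.
 */
int
toy_pick_sb(const struct toy_sb *cands, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (toy_sb_ok(&cands[i]))
			return ((int)i);
	return (-1);
}
```

The point of the sketch is only the control flow: a failed check on candidate 0 moves on to the next candidate rather than returning an error before any backup is examined, which is the behaviour the original report found missing.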