From: Andriy Gapon
To: Chris Anderson
Cc: freebsd-stable@freebsd.org
Date: Tue, 23 Feb 2021 12:52:58 +0200
Subject: Re: lots of "no such file or directory" errors in zfs filesystem

On 23/02/2021 05:25, Chris Anderson wrote:
> so I can't ls -i the file since that triggers the no such file warning.
> if I run zdb -dddd on the inode of a directory which contains one of those missing files,
> I can get the inode of the file from that, but I don't get anything particularly
> interesting in the output.
>
> most of the files that are missing are in directories with a large number of
> files (the largest has 180k) but I managed to find a directory which had a
> single file entry that is missing:
>
> Dataset tank/home/cva [ZPL], ID 196, cr_txg 163, 109G, 908537 objects, rootbp
> DVA[0]=<0:13210311000:1000> DVA[1]=<0:18b9a02c000:1000> [L0 DMU objset]
> fletcher4 uncompressed LE contiguous unique double size=800L/800P
> birth=46916371L/46916371P fill=908537
> cksum=11fdd21d1d:13cb24c87a6e:da0c9bf1b5df3:715ab2ec45b7b09
>
>     Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
>      38268    1   128K     1K      0    512     1K  100.00  ZFS directory
>                                                264   bonus  ZFS znode
>         dnode flags: USED_BYTES USERUSED_ACCOUNTED
>         dnode maxblkid: 0
>         uid     1001
>         gid     1001
>         atime   Sun Aug  6 02:00:41 2017
>         mtime   Wed Apr 15 12:12:42 2020
>         ctime   Wed Apr 15 12:12:42 2020
>         crtime  Sat Aug  5 15:10:07 2017
>         gen     23881023
>         mode    40755
>         size    3
>         parent  38176
>         links   2
>         pflags  40800000144
>         xattr   0
>         rdev    0x0000000000000000
>         microzap: 1024 bytes, 1 entries
>
>                 hash_test.go = 38274 (type: Regular File)
>
> # zdb -dddd tank/home/cva 38274
>
> Dataset tank/home/cva [ZPL], ID 196, cr_txg 163, 109G, 908537 objects, rootbp
> DVA[0]=<0:13210311000:1000> DVA[1]=<0:18b9a02c000:1000> [L0 DMU objset]
> fletcher4 uncompressed LE contiguous unique double size=800L/800P
> birth=46916371L/46916371P fill=908537
> cksum=11fdd21d1d:13cb24c87a6e:da0c9bf1b5df3:715ab2ec45b7b09
>
>     Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
>
> zdb: dmu_bonus_hold(38274) failed, errno 2

So, this looks like a "simple" problem. Unfortunately, it is very hard to
tell in retrospect which bug caused it.

The directory has an entry for the file, but the file does not actually
exist (or has a different ID): the microzap of directory 38268 maps
hash_test.go to object 38274, yet zdb cannot even hold the bonus buffer of
that object (errno 2 is ENOENT). This is a logical inconsistency, not a
data integrity issue. A scrub is a data integrity check: it verifies that
each block matches its checksum. A correctly checksummed directory block
whose entry points to a nonexistent object passes that check, so a scrub
would not detect such an issue. A hypothetical zfs_fsck would be needed to
find and repair logical problems like this.

Does that pool and filesystem have any special history, such as upgrades,
replication via send/recv, moving between OSes, etc.?

-- 
Andriy Gapon
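For illustration only: the symptom discussed in this thread (a name that the directory lists, but whose object cannot be looked up) can be scanned for from user space. The following is a minimal Python sketch, not zfs_fsck and not part of any ZFS tooling. The function name find_dangling_entries is hypothetical, and the sketch assumes that a dangling entry shows up as a name returned by readdir() for which lstat() fails with ENOENT, which matches the "no such file" errors from ls reported above. It only lists suspects; it cannot repair anything, and unlike zdb it does not show which object ID the entry points to.

```python
import os
import sys


def find_dangling_entries(root):
    """Return paths under `root` that readdir() lists but lstat() cannot find.

    Hypothetical helper: a user-space approximation of one check that a
    zfs_fsck-like tool might perform. It only detects suspects; it cannot
    repair the directory.
    """
    dangling = []
    # os.walk() enumerates names via readdir(); errors while descending into
    # a subdirectory (including a dangling one) are silently skipped.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() does not follow symlinks, so an ordinary broken
                # symlink is not reported; only a failed lookup of the
                # entry itself is.
                os.lstat(path)
            except FileNotFoundError:
                # The name is listed, but looking it up fails with ENOENT.
                dangling.append(path)
    return dangling


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    for suspect in find_dangling_entries(top):
        print(suspect)
```

Run it against the dataset's mountpoint on a quiescent filesystem: a file removed by another process between the directory listing and the lstat() call would be reported as a false positive. Each reported name can then be examined with zdb -dddd as shown above.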