From: Peter <pmc@citylink.dinoex.sub.org>
To: freebsd-stable@freebsd.org
Date: Mon, 13 Dec 2021 20:34:13 +0100
Subject: 12.3: [ZFS pointer corruption] kernel crash exporting FreeBSD src repo

Hija,

I have a filesystem holding a clone of the FreeBSD src repo.
When I do

> git checkout-index -a --prefix=/new-filesystem/

in 5 of 6 cases I get a kernel crash. The repo otherwise looks sane:

# git status
On branch local.generic
Your branch is ahead of 'origin/releng/12.3' by 51 commits.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

# git branch -v -v
* local.generic              f99bc919533 [origin/releng/12.3: ahead 51] em/igb capabilities fix for 12.3
  local.generic.releng/11.4  95388afb572 [origin/releng/11.4: ahead 12] KERNCONF, make conf, sendmail mc
  local.generic.releng/12.2  63c868925fc [origin/releng/12.2: ahead 45] kernel configs updated
  main                       01bad87a76d [origin/main: behind 12250] UPDATING: add an entry for commit 875977314881

The filesystem is also responsive and in working order. A ZFS scrub finds nothing to complain about, there are no other error messages anywhere, and zdb runs through the pool without finding an error. Everything else on the system (and there is a lot) appears to work just fine.

In 2 of 5 cases a kernel dump is written, and it is almost exactly the same every time (in one case the name being looked up was not "HEAD" but ".arcconfig", another time "sys").
Fatal trap 12: page fault while in kernel mode
cpuid = 19; apic id = 19
fault virtual address   = 0x410
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808e9815
stack pointer           = 0x28:0xfffffe00e667bf50
frame pointer           = 0x28:0xfffffe00e667bff0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 18330 (git)
trap number             = 12

#8  0xffffffff808e9815 in _sx_xlock_hard ()
#9  0xffffffff8039e35b in __dbuf_hold_impl ()
#10 0xffffffff8039e44f in dbuf_hold ()
#11 0xffffffff803bc152 in dnode_hold_impl ()
#12 0xffffffff803a597d in dmu_bonus_hold ()
#13 0xffffffff80468b16 in zfs_zget ()
#14 0xffffffff80443ccf in zfs_dirent_lookup ()
#15 0xffffffff80443d9b in zfs_dirlook ()
#16 0xffffffff8045ddb0 in zfs_lookup ()
#17 0xffffffff8045e569 in zfs_freebsd_cachedlookup ()
#18 0xffffffff80d5ac5c in VOP_CACHEDLOOKUP_APV ()
#19 0xffffffff8099e27c in vfs_cache_lookup ()
#20 0xffffffff80d5ab2c in VOP_LOOKUP_APV ()
#21 0xffffffff80835ba5 in null_lookup ()
#22 0xffffffff80d5ab2c in VOP_LOOKUP_APV ()
#23 0xffffffff809a7cc1 in lookup ()
#24 0xffffffff809a716f in namei ()
#25 0xffffffff809bff32 in kern_statat ()
#26 0xffffffff809c06cf in sys_fstatat ()
#27 0xffffffff80cd38b0 in amd64_syscall ()
#28 0xffffffff80caba8e in fast_syscall_common ()
------------------------------------------------------
#8  0xffffffff808e9815 in _sx_xlock_hard (sx=0xfffffe00ab1680a0, x=<optimized out>, opts=<optimized out>)
    at /usr/src/sys/kern/kern_sx.c:682
#9  0xffffffff8039e35b in __dbuf_hold_impl (dh=0xfffff80018449000) at src/sys/sys/sx.h:168
#10 0xffffffff8039e44f in dbuf_hold (dn=0xfffff800183918a0, blkid=5302, tag=0xffffffff80d7ef0b)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:3016
#11 0xffffffff803bc152 in dnode_hold_impl (os=0xfffff8001815e400, object=169684, flag=1, slots=0, dnp=0xfffffe00e667c150)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1329
#12 0xffffffff803a597d in dmu_bonus_hold (os=<optimized out>, object=<optimized out>, tag=<optimized out>, dbp=0xfffffe00e667c1e8)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:345
#13 0xffffffff80468b16 in zfs_zget (zfsvfs=0xfffffe00ba7fe000, obj_num=169684, zpp=0xfffffe00e667c298)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1165
#14 0xffffffff80443ccf in zfs_dirent_lookup (dzp=0xfffff80704c676c0, name=0xfffffe00e667c3c0 "HEAD", zpp=<optimized out>, flag=2)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:187
#15 0xffffffff80443d9b in zfs_dirlook (dzp=0xfffff80704c676c0, name=<optimized out>, zpp=0xfffffe00e667c338)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:238
#16 0xffffffff8045ddb0 in zfs_lookup (dvp=<optimized out>, nm=0xfffffe00e667c3c0 "HEAD", vpp=<optimized out>,
    cnp=<optimized out>, nameiop=0, cr=<optimized out>, flags=0, cached=1)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1671
#17 0xffffffff8045e569 in zfs_freebsd_cachedlookup (ap=0xfffffe00e667c520)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4973
#18 0xffffffff80d5ac5c in VOP_CACHEDLOOKUP_APV (vop=0xffffffff81067150, a=0xfffffe00e667c520) at src/sys/sys/signalvar.h:352
#19 0xffffffff8099e27c in vfs_cache_lookup (ap=<optimized out>) at vnode_if.h:80
#20 0xffffffff80d5ab2c in VOP_LOOKUP_APV (vop=0xffffffff81067150, a=0xfffffe00e667c5a8) at src/sys/sys/signalvar.h:352
#21 0xffffffff80835ba5 in null_lookup (ap=0xfffffe00e667c690) at vnode_if.h:54
#22 0xffffffff80d5ab2c in VOP_LOOKUP_APV (vop=0xffffffff810a2cc0, a=0xfffffe00e667c690) at src/sys/sys/signalvar.h:352
#23 0xffffffff809a7cc1 in lookup (ndp=0xfffffe00e667c8b8) at vnode_if.h:54
#24 0xffffffff809a716f in namei (ndp=0xfffffe00e667c8b8) at /usr/src/sys/kern/vfs_lookup.c:499
#25 0xffffffff809bff32 in kern_statat (td=0xfffff805b25b1000, flag=<optimized out>, fd=<optimized out>,
    path=0x8009c7000, pathseg=UIO_USERSPACE, sbp=0xfffffe00e667c9d8, hook=0)
    at /usr/src/sys/kern/vfs_syscalls.c:2327
#26 0xffffffff809c06cf in sys_fstatat (td=<optimized out>, uap=0xfffff805b25b13c0)
    at /usr/src/sys/kern/vfs_syscalls.c:2304
#27 0xffffffff80cd38b0 in amd64_syscall (td=0xfffff805b25b1000, traced=0)
    at src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#28 0xffffffff80caba8e in fast_syscall_common ()
    at /usr/src/sys/amd64/amd64/exception.S:582

Trying to understand this, it all looks quite useful up to #10 - and there I lose track: how do we get from #10 to #9?

Anyway, I got tired of all these "optimized out" entries, so I managed to build a kernel with -O0. (That is no longer very easy to do: one might find a switch that works, but then it won't work for the modules. That's one reason why I want everything compiled into the kernel and no loadable modules.) The kernel even runs - and it runs quite well, only a little slower. Now the backtrace is more intelligible:

#9  0xffffffff80bf6ec8 in _sx_xlock_hard (sx=0xfffffe00ab2500a0, x=0, opts=0)
    at ../../../kern/kern_sx.c:683
#10 0xffffffff803e4595 in __sx_xlock (sx=0xfffffe00ab2500a0, td=0xfffff80106b3c000, opts=0, file=0x0, line=0) at sx.h:168
#11 0xffffffff803e8947 in dbuf_rele (db=0xfffffe00ab250030, tag=0x0)
    at ../../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:3143
#12 0xffffffff803eaefa in __dbuf_hold_impl (dh=0xfffff80499edf000)
    at ../../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:2987
#13 0xffffffff803e9fce in dbuf_hold_impl (dn=0xfffff8001fe2bb80, level=0 '\0', blkid=5302, fail_sparse=0,
    fail_uncached=0, tag=0xffffffff81369fca, dbp=0xfffffe00e533eb18)
    at ../../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:3016

It then took me a night to start understanding that code (uintptr_t - never seen it before; back when I learned C, the stance was that you shouldn't mix integers and pointers, let alone in the same variable).
Then I figured that the x=0 in #9 is probably not a good thing, so I looked at where that comes from. That is difficult, but it seems to come from the db variable in #11, which is a "dmu_buf_impl_t *", and it looks like this:

$6 = {db = {db_object = 18446735312811784192, db_offset = 51539607552, db_size = 15168, db_data = 0x6},
  db_objset = 0x4000001000, db_dnode_handle = 0x0, db_parent = 0x0, db_hash_next = 0x0,
  db_link = {avl_child = 0xfffffe00ab25a070, avl_pcb = 0}, db_blkid = 0, db_blkptr = 0x0, db_level = 0 '\0',
  db_mtx = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, sx_lock = 0},
  db_state = DB_UNCACHED, db_holds = {rc_count = 0}, db_buf = 0x0,
  db_changed = {cv_description = 0x0, cv_waiters = 0}, db_data_pending = 0x0, db_last_dirty = 0x0,
  db_cache_link = {list_next = 0x0, list_prev = 0x0}, db_caching_status = DB_DBUF_CACHE, db_user = 0x0,
  db_user_immediate_evict = 0 '\0', db_freed_in_flight = 0 '\0', db_pending_evict = 0 '\0', db_dirtycnt = 0 '\0'}

We can see that the locking info (db_mtx) is simply zeroed out, like almost everything else in there. This is not a dbuf, this is a rogue dbuf!

I then spent some time reading about these dbufs, but there doesn't seem to be an easy way to get an inventory of them or to monitor them, so the question of where this havoc comes from is not easy to answer.

Anyway, I don't necessarily want to figure it out; I want my pool running. And since this must have something to do with the pool (there are 5 other pools on the machine, and they work fine), I dumped the pool with "zfs send" into a file, replaced the disks, replaced the disk controller with a different brand, recreated the pool, and loaded the file back in. The kernel crashes still appear.

So the bottom line is: ZFS doesn't like git. Now how do I get that pool working?

cheerio,
PMc