From nobody Fri Sep 1 03:38:01 2023 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RcNz070HKz4rywk; Fri, 1 Sep 2023 03:38:04 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Received: from omta001.cacentral1.a.cloudfilter.net (omta001.cacentral1.a.cloudfilter.net [3.97.99.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "Client", Issuer "CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4RcNz05v7tz4PZK; Fri, 1 Sep 2023 03:38:04 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Authentication-Results: mx1.freebsd.org; none Received: from shw-obgw-4004a.ext.cloudfilter.net ([10.228.9.227]) by cmsmtp with ESMTP id be2fqL6WGLAoIbuyyqMavh; Fri, 01 Sep 2023 03:38:04 +0000 Received: from spqr.komquats.com ([70.66.152.170]) by cmsmtp with ESMTPA id buywqgTGk3fOSbuyxqXOql; Fri, 01 Sep 2023 03:38:04 +0000 X-Authority-Analysis: v=2.4 cv=J8G5USrS c=1 sm=1 tr=0 ts=64f15c9c a=y8EK/9tc/U6QY+pUhnbtgQ==:117 a=y8EK/9tc/U6QY+pUhnbtgQ==:17 a=xqWC_Br6kY4A:10 a=IkcTkHD0fZMA:10 a=UttIx32zK-AA:10 a=6I5d2MoRAAAA:8 a=YxBL1-UpAAAA:8 a=EkcXrb_YAAAA:8 a=DFkHzWEJDzc0sj43XusA:9 a=QEXdDO2ut3YA:10 a=IjZwj45LgO3ly-622nXo:22 a=Ia-lj3WSrqcvXOmTRaiG:22 a=LK5xJRSDVpKd5WXXoEvA:22 Received: from slippy.cwsent.com (slippy [10.1.1.91]) by spqr.komquats.com (Postfix) with ESMTP id 23D0D56E; Thu, 31 Aug 2023 20:38:02 -0700 (PDT) Received: from slippy (localhost [IPv6:::1]) by slippy.cwsent.com (Postfix) with ESMTP id 0E377D5; Thu, 31 Aug 2023 20:38:02 -0700 (PDT) Date: Thu, 31 Aug 2023 20:38:01 -0700 From: Cy Schubert To: Kyle Evans Cc: Alexander Motin , Gleb Smirnoff , Martin Matuska , src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad Message-ID: <20230831203801.24fade6d@slippy> In-Reply-To: References: <202308270509.37R596B5048298@gitrepo.freebsd.org> Organization: KOMQUATS X-Mailer: Claws Mail 3.19.0 (GTK+ 2.24.33; amd64-portbld-freebsd14.0) List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-CMAE-Envelope: MS4xfLZ2rjQDrhCzC7pGD26WZdIqcf7t8k4uAFXmyOPU9C90LgHHHbJJpK+bgCbYddzxo45fwJGqp7I8LbCcOxo4Pa/KUJ0oZz0lsHQJjmOWMWl3jq7UC7nm /6XaE86Y+yO43NkEI4mFWagv6Dfn71qKtJ8ZkIs75LR+d8M+7KSPQ/Z9T9+KPHtYA/e3Et57r9015RGKsi6BSRVb0jOTh4LapGayqRcvBCScYi+n6IT0yAQC xvBuY67f+HcoccduskAvqeqVCOfvwqKYZQuhobBQ7HlXtLz5JwMibsiBfVEh5yF4Wuo5ANfsvh56xmDIwKlLLSPE5yHjDZA2OOOXQ1zl7VVaOlCsGKixKV6n AJNTu6v46+RiV/rXQPj/e5deyPbYsCZNhckR46n1am0BCFykL7CKQ630hGvj35Mpk+l8QBgH X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:16509, ipnet:3.96.0.0/15, country:US] X-Rspamd-Queue-Id: 4RcNz05v7tz4PZK On Thu, 31 Aug 2023 21:18:30 -0500 Kyle Evans wrote: > On 8/30/23 20:01, Alexander Motin wrote: > > On 30.08.2023 20:09, Gleb Smirnoff wrote: =20 > >> =C2=A0=C2=A0 Hi Martin, > >> > >> we hit this panic with this update: > >> > >> panic: VERIFY(avl_is_empty(&lwb->lwb_vdev_tree)) failed > >> > >> cpuid =3D 8 > >> time =3D 1693432004 > >> KDB: stack backtrace: > >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame=20 > >> 0xfffffe09519aa800 > >> vpanic() at vpanic+0x132/frame 0xfffffe09519aa930 > >> spl_panic() at spl_panic+0x3a/frame 0xfffffe09519aa990 > >> zil_free_lwb() at zil_free_lwb+0x1de/frame 0xfffffe09519aa9c0 > >> zil_sync() at zil_sync+0x3cb/frame 0xfffffe09519aaaa0 > >> [Wed Aug 30 21:46:44 2023]dmu_objset_sync() at=20 > >> dmu_objset_sync+0x51b/frame 0xfffffe09519aab70 > >> dsl_pool_sync() at dsl_pool_sync+0x11d/frame 0xfffffe09519aabf0 > >> spa_sync() at spa_sync+0xc68/frame 0xfffffe09519aae20 > >> txg_sync_thread() at txg_sync_thread+0x2eb/frame 0xfffffe09519aaef0 > >> fork_exit() at fork_exit+0x82/frame 0xfffffe09519aaf30 > >> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe09519aaf30 > >> --- trap 0x26d90a2f, rip =3D 0xbba54c81b25fd8d4, rsp =3D=20 > >> 0xff60c2f6f9874bcd, rbp =3D 0x37940e3e9793bd40 -- > >> > >> I have two cores dumped for this panic. > >> > >> Our previous version is based on=20 > >> 9c0a2d522f3a18fc795a7dd6266d2be2aff4b084, which > >> I guess has the most recent ZFS merge openzfs/zfs@009d3288d. > >> =20 > > Hi Gleb, > > > > It is the first time I see a panic like this. I'll think about it > > tomorrow. But I'd appreciate any information on what is your workload > > and what are you doing related to ZIL (O_SYNC, fsync(), sync=3Dalways, > > etc) to trigger it? What is your pool configuration? > > =20 >=20 > Hello! >=20 > Probably not related, but I seem to have hit a use-after-free in ZIL=20 > bits; is this something you're interested in? Backtrace is: >=20 > #17 zio_nowait (zio=3D0xdeadc0dedeadc0de) at=20 > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zio.c:2367 > #18 0xffff00000160720c in zil_lwb_write_issue=20 > (zilog=3Dzilog@entry=3D0xffffa00021631800, lwb=3D0xffffa00933d34c40) > at /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:1935 > --Type for more, q to quit, c to continue without paging-- > #19 0xffff0000016031c0 in zil_commit_writer (zilog=3D0xffffa00021631800,= =20 > zcw=3D0xffffa000fb01e140) at=20 > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:2985 > #20 zil_commit_impl (zilog=3D0xffffa00021631800, foid=3D) = at=20 > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:3489 > #21 0xffff000001602f44 in zil_commit (zilog=3D,=20 > foid=3D) at=20 > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:3452 > #22 0xffff0000015fdac8 in zfs_fsync (zp=3D0xffffa007e27b23a0,=20 > syncflag=3D, cr=3D) at=20 > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zfs_vnops.c:75 > #23 0xffff00000056a864 in VOP_FSYNC (vp=3D0xffffa001d2003000, waitfor=3D1= ,=20 > td=3D0xffff00015aa94620) at ./vnode_if.h:763 > #24 kern_fsync (td=3D0xffff00015aa94620, fd=3D,=20 > fullsync=3Dtrue) at /usr/src.kbsd/sys/kern/vfs_syscalls.c:3556 > #25 0xffff000000800630 in syscallenter (td=3D0xffff00015aa94620) at=20 > /usr/src.kbsd/sys/arm64/arm64/../../kern/subr_syscall.c:187 > #26 svc_handler (td=3D0xffff00015aa94620, frame=3D) at=20 > /usr/src.kbsd/sys/arm64/arm64/trap.c:198 > #27 do_el0_sync (td=3D0xffff00015aa94620, frame=3D) at=20 > /usr/src.kbsd/sys/arm64/arm64/trap.c:625 >=20 >=20 > If I pop up to the zil_lwb_write_issue frame: >=20 > (kgdb) print *lwb > $4 =3D {lwb_zilog =3D 0xdeadc0dedeadc0de, lwb_blk =3D {blk_dva =3D {{dva_= word =3D=20 > {16045693110842147038, 16045693110842147038}}, {dva_word =3D=20 > {16045693110842147038, 16045693110842147038}}, {dva_word =3D=20 > {16045693110842147038, > 16045693110842147038}}}, blk_prop =3D 16045693110842147038,=20 > blk_pad =3D {16045693110842147038, 16045693110842147038}, blk_phys_birth = =3D=20 > 16045693110842147038, blk_birth =3D 16045693110842147038, > blk_fill =3D 16045693110842147038, blk_cksum =3D {zc_word =3D=20 > {16045693110842147038, 16045693110842147038, 16045693110842147038,=20 > 16045693110842147038}}}, lwb_slim =3D 3735929054, lwb_slog =3D 3735929054= ,=20 > lwb_error =3D -559038242, > lwb_nmax =3D -559038242, lwb_nused =3D -559038242, lwb_nfilled =3D=20 > -559038242, lwb_sz =3D -559038242, lwb_state =3D 3735929054, lwb_buf =3D= =20 > 0xdeadc0dedeadc0de 0xdeadc0dedeadc0de>, =20 > lwb_child_zio =3D 0xdeadc0dedeadc0de, lwb_write_zio =3D=20 > 0xdeadc0dedeadc0de, lwb_root_zio =3D 0xdeadc0dedeadc0de,=20 > lwb_issued_timestamp =3D -2401050962867404578, lwb_issued_txg =3D=20 > 16045693110842147038, > lwb_alloc_txg =3D 16045693110842147038, lwb_max_txg =3D=20 > 16045693110842147038, lwb_node =3D {list_next =3D 0xdeadc0dedeadc0de,=20 > list_prev =3D 0xdeadc0dedeadc0de}, lwb_issue_node =3D {list_next =3D=20 > 0xdeadc0dedeadc0de, > list_prev =3D 0xdeadc0dedeadc0de}, lwb_itxs =3D {list_size =3D=20 > 16045693110842147038, list_offset =3D 16045693110842147038, list_head =3D= =20 > {list_next =3D 0xdeadc0dedeadc0de, list_prev =3D 0xdeadc0dedeadc0de}},=20 > lwb_waiters =3D { > list_size =3D 16045693110842147038, list_offset =3D=20 > 16045693110842147038, list_head =3D {list_next =3D 0xdeadc0dedeadc0de,=20 > list_prev =3D 0xdeadc0dedeadc0de}}, lwb_vdev_tree =3D {avl_root =3D=20 > 0xdeadc0dedeadc0de, > avl_compar =3D 0xdeadc0dedeadc0de, avl_offset =3D 160456931108421470= 38,=20 > avl_numnodes =3D 16045693110842147038}, lwb_vdev_lock =3D {lock_object = =3D { > lo_name =3D 0xdeadc0dedeadc0de address 0xdeadc0dedeadc0de>, lo_flags =3D 3735929054, lo_data =3D=20 > 3735929054, lo_witness =3D 0xdeadc0dedeadc0de}, sx_lock =3D=20 > 16045693110842147038}} >=20 > It seems to have clearly been stomped on by uma trashing. Encountered=20 > while running a pkgbase build, I think while it was in the packaging=20 > phase. I note in particular in that frame: >=20 > (kgdb) p/x lwb->lwb_issued_timestamp > $4 =3D 0xdeadc0dedeadc0de >=20 > So I guess it was freed sometime during one of the previous two=20 > zio_nowait() calls. >=20 > Thanks, >=20 > Kyle Evans >=20 I think the four panics on my two downstairs machines two days ago, the panic today on the one of the same two machines today, and the email data corruption my laptop has been experiencing, and the backtraces I've seen Gleb and you post give me the uneasy sense that there may be some kind of general memory corruption due to bad or corrupted addresses being dereferenced. --=20 Cheers, Cy Schubert FreeBSD UNIX: Web: https://FreeBSD.org NTP: Web: https://nwtime.org e^(i*pi)+1=3D0