From owner-freebsd-fs@freebsd.org Fri Aug 21 18:38:15 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 95F6F9BF4B0 for ; Fri, 21 Aug 2015 18:38:15 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 708CD307; Fri, 21 Aug 2015 18:38:15 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from zeta.ixsystems.com (unknown [12.229.62.2]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 057521D322; Fri, 21 Aug 2015 11:38:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1440182288; x=1440196688; bh=oTW3ttZTPdJB7vnVwJpBRj9OoOlr6z0YR0t4JIRS+vc=; h=Reply-To:Subject:References:To:Cc:From:Date:In-Reply-To; b=OR8zq+yRcvGz2+B2rOGoKx7JB+9YysrUl3DrQTHS628oaM0ih1szplrSbZX3REsga POrgLxe9e8o93+u/N1njLWaDdwsbhheeX4HMNfocaqN5xmnLYZ/NMQhy0ZomLfw62b 586Gq1bKwYtktmW+0H4GqOyKnUIm6agFdQidW+5Q= Reply-To: d@delphij.net Subject: Re: solaris assert: avl_is_empty(&dn -> dn_dbufs) panic References: <201508211748.t7LHmo96096088@gw.catspoiler.org> To: Don Lewis , freebsd-fs@FreeBSD.org Cc: "Justin T. Gibbs" , George Wilson From: Xin Li X-Enigmail-Draft-Status: N1110 Organization: The FreeBSD Project Message-ID: <55D7700B.3080207@delphij.net> Date: Fri, 21 Aug 2015 11:38:03 -0700 MIME-Version: 1.0 In-Reply-To: <201508211748.t7LHmo96096088@gw.catspoiler.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="0gaAdQp5Winb1VPmqwelCaUlbW8WLnfDR" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Aug 2015 18:38:15 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --0gaAdQp5Winb1VPmqwelCaUlbW8WLnfDR Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi, A quick glance at the changes suggests that Justin's changeset may be related. The reasoning is here: https://reviews.csiden.org/r/131/ Related Illumos ticket: https://www.illumos.org/issues/5056 In dnode_evict_dbufs(), remove multiple passes over dn->dn_dbufs. This is possible now that objset eviction is asynchronously completed in a different context once dbuf eviction completes. In the case of objset eviction, any dbufs held by children will be evicted via dbuf_rele_and_unlock() once their refcounts go to zero. Even when objset eviction is not active, the ordering of the avl tree guarantees that children will be released before parents, allowing the parent's refcounts to naturally drop to zero before they are inspected in this single loop. =3D=3D=3D=3D So, upon return from dnode_evict_dbufs(), there could be some DB_EVICTING buffers on the AVL pending release and thus breaks the invariant. Should we restore the loop where we yield briefly with the lock released, then reacquire and recheck? Cheers, On 08/21/15 10:48, Don Lewis wrote: > On 21 Aug, Don Lewis wrote: >> On 21 Aug, Don Lewis wrote: >>> I just started getting this panic: >>> >>> solaris assert: avl_is_empty(&dn -> dn_dbufs), file: >>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c,= >>> line 495 >>> >>> System info: >>> FreeBSD zipper.catspoiler.org 11.0-CURRENT FreeBSD 11.0-CURRENT #25 r= 286923: Wed Aug 19 09:28:53 PDT 2015 dl@zipper.catspoiler.org:/usr/ob= j/usr/src/sys/GENERIC amd64 >>> >>> My zfs pool has one mirrored vdev. Scrub doesn't find any problems. >>> >>> %zpool status >>> pool: zroot >>> state: ONLINE >>> scan: scrub repaired 0 in 2h58m with 0 errors on Fri Aug 21 00:44:5= 2 2015 >>> config: >>> >>> NAME STATE READ WRITE CKSUM >>> zroot ONLINE 0 0 0 >>> mirror-0 ONLINE 0 0 0 >>> ada0p3 ONLINE 0 0 0 >>> ada1p3 ONLINE 0 0 0 >>> >>> This panic is reproduceable and happens every time I use poudriere to= >>> build ports using my 9.3-RELEASE amd64 jail and occurs at the end of = the >>> poudriere run when it is unmounting filesystems. >>> >>> [00:10:43] =3D=3D=3D=3D>> Stopping 4 builders >>> 93amd64-default-job-01: removed >>> 93amd64-default-job-01-n: removed >>> 93amd64-default-job-02: removed >>> 93amd64-default-job-02-n: removed >>> 93amd64-default-job-03: removed >>> 93amd64-default-job-03-n: removed >>> 93amd64-default-job-04: removed >>> 93amd64-default-job-04-n: removed >>> [00:10:46] =3D=3D=3D=3D>> Creating pkgng repository >>> Creating repository in /tmp/packages: 100% >>> Packing files for repository: 100% >>> [00:10:55] =3D=3D=3D=3D>> Committing packages to repository >>> [00:10:55] =3D=3D=3D=3D>> Removing old packages >>> [00:10:55] =3D=3D=3D=3D>> Built ports: devel/py-pymtbl net/sie-nmsg n= et/p5-Net-Nmsg net/axa >>> [93amd64-default] [2015-08-21_00h47m41s] [committing:] Queued: 4 Bui= lt: 4 Failed: 0 Skipped: 0 Ignored: 0 Tobuild: 0 Time: 00:10:53 >>> [00:10:55] =3D=3D=3D=3D>> Logs: /var/poudriere/data/logs/bulk/93amd64= -default/2015-08-21_00h47m41s >>> [00:10:55] =3D=3D=3D=3D>> Cleaning up >>> 93amd64-default: removed >>> 93amd64-default-n: removed >>> [00:10:55] =3D=3D=3D=3D>> Umounting file systems >>> Write failed: Broken pipe >>> >>> Prior to that, I ran poudriere a number of times with a 10.2-STABLE >>> amd64 jail without incident. >>> >>> I've kicked off a bunch of poudriere runs for other jails and >>> will check on it in the morning. >> >> Died the same way after building ports on the first jail, >> 10.1-RELEASE amd64. >> >> Since there have been some zfs commits since r286923, I upgraded to >> r286998 this morning and tried again with no better luck. I got the >> same panic again. >> >> This machine has mirrored swap, and even though I've done what >> gmirror(8) says to do in order to capture crash dumps, I've had no luc= k >> with that. The dump is getting written, but savecore is unable to fin= d >> it. >=20 > Not sure what is happening with savecore during boot, but I was able to= > run it manually and collect the crash dump. >=20 >=20 > Unread portion of the kernel message buffer: > panic: solaris assert: avl_is_empty(&dn->dn_dbufs), file: /usr/src/sys/= cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c, line: 495 > cpuid =3D 1 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe085= 9e4e4e0 > vpanic() at vpanic+0x189/frame 0xfffffe0859e4e560 > panic() at panic+0x43/frame 0xfffffe0859e4e5c0 > assfail() at assfail+0x1a/frame 0xfffffe0859e4e5d0 > dnode_sync() at dnode_sync+0x6c8/frame 0xfffffe0859e4e6b0 > dmu_objset_sync_dnodes() at dmu_objset_sync_dnodes+0x2b/frame 0xfffffe0= 859e4e6e0 > dmu_objset_sync() at dmu_objset_sync+0x29e/frame 0xfffffe0859e4e7b0 > dsl_pool_sync() at dsl_pool_sync+0x348/frame 0xfffffe0859e4e820 > spa_sync() at spa_sync+0x442/frame 0xfffffe0859e4e910 > txg_sync_thread() at txg_sync_thread+0x23d/frame 0xfffffe0859e4e9f0 > fork_exit() at fork_exit+0x84/frame 0xfffffe0859e4ea30 > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0859e4ea30 > --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- > KDB: enter: panic >=20 >=20 >=20 > (kgdb) bt > #0 doadump (textdump=3D0) at pcpu.h:221 > #1 0xffffffff8037bb86 in db_fncall (dummy1=3D,=20 > dummy2=3D, dummy3=3D,=20 > dummy4=3D) at /usr/src/sys/ddb/db_command.c:56= 8 > #2 0xffffffff8037b941 in db_command (cmd_table=3D0x0) > at /usr/src/sys/ddb/db_command.c:440 > #3 0xffffffff8037b5d4 in db_command_loop () > at /usr/src/sys/ddb/db_command.c:493 > #4 0xffffffff8037e18b in db_trap (type=3D, code=3D= 0) > at /usr/src/sys/ddb/db_main.c:251 > #5 0xffffffff80a5b294 in kdb_trap (type=3D3, code=3D0, tf=3D) > at /usr/src/sys/kern/subr_kdb.c:654 > #6 0xffffffff80e6a4b1 in trap (frame=3D0xfffffe0859e4e410) > at /usr/src/sys/amd64/amd64/trap.c:540 > #7 0xffffffff80e49f22 in calltrap () > at /usr/src/sys/amd64/amd64/exception.S:235 > #8 0xffffffff80a5a96e in kdb_enter (why=3D0xffffffff81379010 "panic", = > msg=3D0xffffffff80a60b60 "UH\211\ufffdAWAVATSH\203\ufffdPI\211\ufff= dA\211\ufffdH\213\004%Py\ufffd\201H\211E\ufffd\201<%\ufffd\210\ufffd\201"= ) at cpufunc.h:63 > #9 0xffffffff80a1e2c9 in vpanic (fmt=3D,=20 > ap=3D) at /usr/src/sys/kern/kern_shutdown.c:61= 9 > #10 0xffffffff80a1e333 in panic (fmt=3D0xffffffff81aafa90 "\004") > at /usr/src/sys/kern/kern_shutdown.c:557 > ---Type to continue, or q to quit--- > #11 0xffffffff8240922a in assfail (a=3D,=20 > f=3D, l=3D) > at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:= 81 > #12 0xffffffff820d4f78 in dnode_sync (dn=3D0xfffff8040b72d3d0,=20 > tx=3D0xfffff8001598ec00) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sy= nc.c:495 > #13 0xffffffff820c922b in dmu_objset_sync_dnodes (list=3D0xfffff8000771= 2b90,=20 > newlist=3D, tx=3D) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objs= et.c:1045 > #14 0xffffffff820c8ede in dmu_objset_sync (os=3D0xfffff80007712800,=20 > pio=3D, tx=3D0xfffff8001598ec00) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objs= et.c:1163 > #15 0xffffffff820e8e78 in dsl_pool_sync (dp=3D0xfffff80015676000, txg=3D= 2660975) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool= =2Ec:536 > #16 0xffffffff8210dca2 in spa_sync (spa=3D0xfffffe00089c6000, txg=3D266= 0975) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:66= 41 > #17 0xffffffff8211843d in txg_sync_thread (arg=3D0xfffff80015676000) > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:51= 7 > #18 0xffffffff809e47c4 in fork_exit ( > callout=3D0xffffffff82118200 , arg=3D0xfffff800156= 76000,=20 > frame=3D0xfffffe0859e4ea40) at /usr/src/sys/kern/kern_fork.c:1006 > ---Type to continue, or q to quit--- > #19 0xffffffff80e4a45e in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:610 > #20 0x0000000000000000 in ?? () > Current language: auto; currently minimal >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >=20 --=20 Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die --0gaAdQp5Winb1VPmqwelCaUlbW8WLnfDR Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.1.7 (FreeBSD) iQIcBAEBCgAGBQJV13AOAAoJEJW2GBstM+nsz+YP/jkHa9U0NfBa3bD4mH7xSl+o umRX2nHmzu0qH+D/csa0Qj427+pwPnCZdpuGgI9OI6N52eSFm8G4oBkF4lDgTLJE tNXpxaD6kvKsN9Xo/kv1CwreNk/QhatL+i7JR5Ozc0gZ7R3sJwenbjtLVB5L8JuB Q1lXA/bbc1gfwqdZ73NWkYNHUA2gYkUypL3O9YQC1YbFcPrNDLbXZVRlBw0J1pqL D2Rb7d6aztut2WHK0VE8rbnheBHT5gphuqwOGhNYut0DmPxFSgO6wJ78LHlCQ3te 5Ga7TVEA15ByWzVqqoTGfDP/Mze76OvB5I6wzHbKq9lZiFh5pFoPiBXgqBJKMwE5 UEkgmxoSR+HJ+/FL07rXBf/NwHHPzx10D7cI8S7Q1ZdJI/oOrCna9yAMfyrqO1BH taKoNJqgZqPVhlB0rQB47Cnhp4pBKf6AkrK4TG7Oqrvko2jJt1aIEBfxaauhc+Ar tH3CGy2hzP3/mMm0MSZ63g3m9jV8fVQdKV0yg9E2V5KoCbG7d9i5MftC7oz3kM7u pSUMBzWlDv7zkkJ8xErIA22AEhjLVXQ/5+Va5m4Dbdb6le+Jk0Wl/ncMJ+XigUPZ /cNfVPv2JlHArPNVgjOEJ4EI6spXR0xTaOxnyGefxCd5ONIvpnLeTaurzPGLF7S5 jSyz3pBLOTWnUDWMFyBo =AKkZ -----END PGP SIGNATURE----- --0gaAdQp5Winb1VPmqwelCaUlbW8WLnfDR--