From owner-freebsd-fs@freebsd.org Fri Aug 21 17:48:59 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EC08F9BFC99 for ; Fri, 21 Aug 2015 17:48:58 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (cl-1657.chi-02.us.sixxs.net [IPv6:2001:4978:f:678::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7D9F7E01 for ; Fri, 21 Aug 2015 17:48:58 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id t7LHmo96096088 for ; Fri, 21 Aug 2015 10:48:54 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201508211748.t7LHmo96096088@gw.catspoiler.org> Date: Fri, 21 Aug 2015 10:48:50 -0700 (PDT) From: Don Lewis Subject: Re: solaris assert: avl_is_empty(&dn -> dn_dbufs) panic To: freebsd-fs@FreeBSD.org In-Reply-To: <201508211734.t7LHYQ33096035@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Aug 2015 17:48:59 -0000 On 21 Aug, Don Lewis wrote: > On 21 Aug, Don Lewis wrote: >> I just started getting this panic: >> >> solaris assert: avl_is_empty(&dn -> dn_dbufs), file: >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c, >> line 495 >> >> System info: >> FreeBSD zipper.catspoiler.org 11.0-CURRENT FreeBSD 11.0-CURRENT #25 r286923: Wed Aug 19 09:28:53 PDT 2015 dl@zipper.catspoiler.org:/usr/obj/usr/src/sys/GENERIC amd64 >> >> My zfs pool has one mirrored vdev. Scrub doesn't find any problems. >> >> %zpool status >> pool: zroot >> state: ONLINE >> scan: scrub repaired 0 in 2h58m with 0 errors on Fri Aug 21 00:44:52 2015 >> config: >> >> NAME STATE READ WRITE CKSUM >> zroot ONLINE 0 0 0 >> mirror-0 ONLINE 0 0 0 >> ada0p3 ONLINE 0 0 0 >> ada1p3 ONLINE 0 0 0 >> >> This panic is reproduceable and happens every time I use poudriere to >> build ports using my 9.3-RELEASE amd64 jail and occurs at the end of the >> poudriere run when it is unmounting filesystems. >> >> [00:10:43] ====>> Stopping 4 builders >> 93amd64-default-job-01: removed >> 93amd64-default-job-01-n: removed >> 93amd64-default-job-02: removed >> 93amd64-default-job-02-n: removed >> 93amd64-default-job-03: removed >> 93amd64-default-job-03-n: removed >> 93amd64-default-job-04: removed >> 93amd64-default-job-04-n: removed >> [00:10:46] ====>> Creating pkgng repository >> Creating repository in /tmp/packages: 100% >> Packing files for repository: 100% >> [00:10:55] ====>> Committing packages to repository >> [00:10:55] ====>> Removing old packages >> [00:10:55] ====>> Built ports: devel/py-pymtbl net/sie-nmsg net/p5-Net-Nmsg net/axa >> [93amd64-default] [2015-08-21_00h47m41s] [committing:] Queued: 4 Built: 4 Failed: 0 Skipped: 0 Ignored: 0 Tobuild: 0 Time: 00:10:53 >> [00:10:55] ====>> Logs: /var/poudriere/data/logs/bulk/93amd64-default/2015-08-21_00h47m41s >> [00:10:55] ====>> Cleaning up >> 93amd64-default: removed >> 93amd64-default-n: removed >> [00:10:55] ====>> Umounting file systems >> Write failed: Broken pipe >> >> Prior to that, I ran poudriere a number of times with a 10.2-STABLE >> amd64 jail without incident. >> >> I've kicked off a bunch of poudriere runs for other jails and >> will check on it in the morning. > > Died the same way after building ports on the first jail, > 10.1-RELEASE amd64. > > Since there have been some zfs commits since r286923, I upgraded to > r286998 this morning and tried again with no better luck. I got the > same panic again. > > This machine has mirrored swap, and even though I've done what > gmirror(8) says to do in order to capture crash dumps, I've had no luck > with that. The dump is getting written, but savecore is unable to find > it. Not sure what is happening with savecore during boot, but I was able to run it manually and collect the crash dump. Unread portion of the kernel message buffer: panic: solaris assert: avl_is_empty(&dn->dn_dbufs), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c, line: 495 cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0859e4e4e0 vpanic() at vpanic+0x189/frame 0xfffffe0859e4e560 panic() at panic+0x43/frame 0xfffffe0859e4e5c0 assfail() at assfail+0x1a/frame 0xfffffe0859e4e5d0 dnode_sync() at dnode_sync+0x6c8/frame 0xfffffe0859e4e6b0 dmu_objset_sync_dnodes() at dmu_objset_sync_dnodes+0x2b/frame 0xfffffe0859e4e6e0 dmu_objset_sync() at dmu_objset_sync+0x29e/frame 0xfffffe0859e4e7b0 dsl_pool_sync() at dsl_pool_sync+0x348/frame 0xfffffe0859e4e820 spa_sync() at spa_sync+0x442/frame 0xfffffe0859e4e910 txg_sync_thread() at txg_sync_thread+0x23d/frame 0xfffffe0859e4e9f0 fork_exit() at fork_exit+0x84/frame 0xfffffe0859e4ea30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0859e4ea30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic (kgdb) bt #0 doadump (textdump=0) at pcpu.h:221 #1 0xffffffff8037bb86 in db_fncall (dummy1=, dummy2=, dummy3=, dummy4=) at /usr/src/sys/ddb/db_command.c:568 #2 0xffffffff8037b941 in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440 #3 0xffffffff8037b5d4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493 #4 0xffffffff8037e18b in db_trap (type=, code=0) at /usr/src/sys/ddb/db_main.c:251 #5 0xffffffff80a5b294 in kdb_trap (type=3, code=0, tf=) at /usr/src/sys/kern/subr_kdb.c:654 #6 0xffffffff80e6a4b1 in trap (frame=0xfffffe0859e4e410) at /usr/src/sys/amd64/amd64/trap.c:540 #7 0xffffffff80e49f22 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:235 #8 0xffffffff80a5a96e in kdb_enter (why=0xffffffff81379010 "panic", msg=0xffffffff80a60b60 "UH\211\ufffdAWAVATSH\203\ufffdPI\211\ufffdA\211\ufffdH\213\004%Py\ufffd\201H\211E\ufffd\201<%\ufffd\210\ufffd\201") at cpufunc.h:63 #9 0xffffffff80a1e2c9 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:619 #10 0xffffffff80a1e333 in panic (fmt=0xffffffff81aafa90 "\004") at /usr/src/sys/kern/kern_shutdown.c:557 ---Type to continue, or q to quit--- #11 0xffffffff8240922a in assfail (a=, f=, l=) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81 #12 0xffffffff820d4f78 in dnode_sync (dn=0xfffff8040b72d3d0, tx=0xfffff8001598ec00) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c:495 #13 0xffffffff820c922b in dmu_objset_sync_dnodes (list=0xfffff80007712b90, newlist=, tx=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1045 #14 0xffffffff820c8ede in dmu_objset_sync (os=0xfffff80007712800, pio=, tx=0xfffff8001598ec00) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1163 #15 0xffffffff820e8e78 in dsl_pool_sync (dp=0xfffff80015676000, txg=2660975) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:536 #16 0xffffffff8210dca2 in spa_sync (spa=0xfffffe00089c6000, txg=2660975) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6641 #17 0xffffffff8211843d in txg_sync_thread (arg=0xfffff80015676000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:517 #18 0xffffffff809e47c4 in fork_exit ( callout=0xffffffff82118200 , arg=0xfffff80015676000, frame=0xfffffe0859e4ea40) at /usr/src/sys/kern/kern_fork.c:1006 ---Type to continue, or q to quit--- #19 0xffffffff80e4a45e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:610 #20 0x0000000000000000 in ?? () Current language: auto; currently minimal