From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 227784] zfs: Fatal trap 9: general protection fault while in kernel mode on shutdown
Date: Fri, 19 Oct 2018 22:22:32 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: CURRENT
X-Bugzilla-Keywords: panic, regression
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: markj@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227784

Mark Johnston changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |allanjude@FreeBSD.org,
                   |                            |mav@FreeBSD.org

--- Comment #15 from Mark Johnston ---
I took a look at a vmcore provided by wulf@.
At the time of the panic, the kernel was waiting for MOS dnode dbuf
evictions to finish:

(kgdb) bt
#0  sched_switch (td=0xfffff800035d3000, newtd=0xfffff800035d2580, flags=)
    at /usr/src/sys/kern/sched_ule.c:2112
#1  0xffffffff806a759f in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:439
#2  0xffffffff806f0d8d in sleepq_switch (wchan=0xfffffe008dffe390, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:613
#3  0xffffffff806f0c33 in sleepq_wait (wchan=0xfffffe008dffe390, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:692
#4  0xffffffff806381f3 in _cv_wait (cvp=0xfffffe008dffe390, lock=)
    at /usr/src/sys/kern/kern_condvar.c:146
#5  0xffffffff8039d5db in spa_evicting_os_wait (spa=)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:1959
#6  0xffffffff8038ad9b in spa_deactivate (spa=0xfffffe008dffe000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1272
#7  0xffffffff80393b88 in spa_evict_all ()
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:8350
#8  0xffffffff8039dade in spa_fini ()
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:2141
#9  0xffffffff803e6bdc in zfs__fini ()
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:7109
#10 0xffffffff8069bf86 in kern_reboot (howto=16392)
    at /usr/src/sys/kern/kern_shutdown.c:443
#11 0xffffffff8069bb4a in sys_reboot (td=, uap=0xfffff800035d33c0)
    at /usr/src/sys/kern/kern_shutdown.c:280

At this point, the spa_unload() call preceding the spa_deactivate() call
had already freed the pool.
However, dsl_pool_close() calls dmu_buf_user_evict_wait() after kicking
off evictions of top-level directories:

        /*
         * Drop our references from dsl_pool_open().
         *
         * Since we held the origin_snap from "syncing" context (which
         * includes pool-opening context), it actually only got a "ref"
         * and not a hold, so just drop that here.
         */
        if (dp->dp_origin_snap != NULL)
                dsl_dataset_rele(dp->dp_origin_snap, dp);
        if (dp->dp_mos_dir != NULL)
                dsl_dir_rele(dp->dp_mos_dir, dp);
        if (dp->dp_free_dir != NULL)
                dsl_dir_rele(dp->dp_free_dir, dp);
        if (dp->dp_leak_dir != NULL)
                dsl_dir_rele(dp->dp_leak_dir, dp);
        if (dp->dp_root_dir != NULL)
                dsl_dir_rele(dp->dp_root_dir, dp);
        ...
        dmu_buf_user_evict_wait();

Looking a bit at the dbuf:

(kgdb) frame 12
#12 0xffffffff8036221c in dsl_dir_evict_async (dbu=0xfffff800053da400)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:158
158             spa_async_close(dd->dd_pool->dp_spa, dd);
(kgdb) p dd->dd_myname
$42 = "$ORIGIN", '\000'
(kgdb) p dd->dd_parent->dd_myname
$43 = "u01", '\000'

I'm not sure what $ORIGIN is; I guess it's some ZFS metadata.

I looked at taskq_wait() in FreeBSD vs. illumos. On FreeBSD it will only
wait for currently queued tasks to finish; anything enqueued after the
drain starts may not be finished by the time we return. On illumos it
looks like taskq_wait() will wait until the queue is completely empty.
So, if the async evictions queue some additional evictions, on FreeBSD we
won't recursively wait, and the taskq_wait() will return early. I can't
tell if ZFS is making this assumption though.
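
To make the taskq_wait() difference described above concrete, here is a
toy, single-threaded model of the two drain behaviours. It is only a
sketch of the semantics as I read them, not the FreeBSD or illumos
implementation: every toy_* name is made up for illustration, there is
no locking or threading, and toy_evict() merely assumes that evicting
one object queues the eviction of the next one.

```c
#include <stddef.h>

#define TOY_QCAP 32

struct toy_taskq;
typedef void (*toy_task_fn)(struct toy_taskq *q, int depth);

struct toy_task {
        toy_task_fn fn;
        int depth;
};

/* Hypothetical FIFO task queue; tasks run inline in the waiter. */
struct toy_taskq {
        struct toy_task tasks[TOY_QCAP];
        size_t head;            /* next task to run */
        size_t tail;            /* next free slot */
        int completed;          /* tasks run so far */
};

static void
toy_enqueue(struct toy_taskq *q, toy_task_fn fn, int depth)
{
        if (q->tail < TOY_QCAP) {
                q->tasks[q->tail].fn = fn;
                q->tasks[q->tail].depth = depth;
                q->tail++;
        }
}

/* Assumed eviction task: each eviction queues one more until depth 0. */
static void
toy_evict(struct toy_taskq *q, int depth)
{
        q->completed++;
        if (depth > 0)
                toy_enqueue(q, toy_evict, depth - 1);
}

static void
toy_run_one(struct toy_taskq *q)
{
        struct toy_task t = q->tasks[q->head++];

        t.fn(q, t.depth);
}

/*
 * Wait only for tasks that were queued when the wait started.
 * Returns the number of tasks still pending on return.
 */
static size_t
toy_wait_snapshot(struct toy_taskq *q)
{
        size_t stop = q->tail;

        while (q->head < stop)
                toy_run_one(q);
        return (q->tail - q->head);
}

/*
 * Wait until the queue is completely empty, including tasks that were
 * queued by other tasks while we were waiting.
 */
static size_t
toy_wait_empty(struct toy_taskq *q)
{
        while (q->head < q->tail)
                toy_run_one(q);
        return (q->tail - q->head);
}
```

Queueing a single toy_evict task with depth 2 and calling
toy_wait_snapshot() returns with one eviction still pending, while
toy_wait_empty() on the same setup runs all three. The first case is the
early return I'm worried about above.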