Date: Fri, 19 Oct 2018 22:22:32 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 227784] zfs: Fatal trap 9: general protection fault while in kernel mode on shutdown
Message-ID: <bug-227784-3630-pgX8govX8V@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-227784-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-227784-3630@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227784

Mark Johnston <markj@FreeBSD.org> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |allanjude@FreeBSD.org,
                   |        |mav@FreeBSD.org

--- Comment #15 from Mark Johnston <markj@FreeBSD.org> ---
I took a look at a vmcore provided by wulf@. At the time of the panic, the
kernel was waiting for MOS dnode dbuf evictions to finish:

(kgdb) bt
#0  sched_switch (td=0xfffff800035d3000, newtd=0xfffff800035d2580,
    flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2112
#1  0xffffffff806a759f in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:439
#2  0xffffffff806f0d8d in sleepq_switch (wchan=0xfffffe008dffe390, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:613
#3  0xffffffff806f0c33 in sleepq_wait (wchan=0xfffffe008dffe390, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:692
#4  0xffffffff806381f3 in _cv_wait (cvp=0xfffffe008dffe390, lock=<optimized out>)
    at /usr/src/sys/kern/kern_condvar.c:146
#5  0xffffffff8039d5db in spa_evicting_os_wait (spa=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:1959
#6  0xffffffff8038ad9b in spa_deactivate (spa=0xfffffe008dffe000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1272
#7  0xffffffff80393b88 in spa_evict_all ()
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:8350
#8  0xffffffff8039dade in spa_fini ()
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:2141
#9  0xffffffff803e6bdc in zfs__fini ()
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:7109
#10 0xffffffff8069bf86 in kern_reboot (howto=16392)
    at /usr/src/sys/kern/kern_shutdown.c:443
#11 0xffffffff8069bb4a in sys_reboot (td=<optimized out>, uap=0xfffff800035d33c0)
    at /usr/src/sys/kern/kern_shutdown.c:280

At this point, the spa_unload() call preceding the spa_deactivate() call had
already freed the pool.
However, dsl_pool_close() calls dmu_buf_user_evict_wait() after kicking off
evictions of top-level directories:

452         /*
453          * Drop our references from dsl_pool_open().
454          *
455          * Since we held the origin_snap from "syncing" context (which
456          * includes pool-opening context), it actually only got a "ref"
457          * and not a hold, so just drop that here.
458          */
459         if (dp->dp_origin_snap != NULL)
460                 dsl_dataset_rele(dp->dp_origin_snap, dp);
461         if (dp->dp_mos_dir != NULL)
462                 dsl_dir_rele(dp->dp_mos_dir, dp);
463         if (dp->dp_free_dir != NULL)
464                 dsl_dir_rele(dp->dp_free_dir, dp);
465         if (dp->dp_leak_dir != NULL)
466                 dsl_dir_rele(dp->dp_leak_dir, dp);
467         if (dp->dp_root_dir != NULL)
468                 dsl_dir_rele(dp->dp_root_dir, dp);
...
496         dmu_buf_user_evict_wait();

Looking a bit at the dbuf:

(kgdb) frame 12
#12 0xffffffff8036221c in dsl_dir_evict_async (dbu=0xfffff800053da400)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:158
158             spa_async_close(dd->dd_pool->dp_spa, dd);
(kgdb) p dd->dd_myname
$42 = "$ORIGIN", '\000' <repeats 248 times>
(kgdb) p dd->dd_parent->dd_myname
$43 = "u01", '\000' <repeats 252 times>

I'm not sure what $ORIGIN is; I guess it's some ZFS metadata.

I looked at taskq_wait() in FreeBSD vs. illumos. On FreeBSD it will only
wait for currently queued tasks to finish; anything enqueued after the drain
starts may not be finished by the time we return. On illumos it looks like
taskq_wait() will wait until the queue is completely empty. So, if the async
evictions queue some additional evictions, on FreeBSD we won't recursively
wait, and taskq_wait() will return early. I can't tell whether ZFS is making
this assumption, though.

-- 
You are receiving this mail because:
You are the assignee for the bug.