From owner-freebsd-stable@FreeBSD.ORG Sun Jul 7 22:26:52 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CEE7AFC7 for ; Sun, 7 Jul 2013 22:26:52 +0000 (UTC) (envelope-from longwitz@incore.de) Received: from dss.incore.de (dss.incore.de [195.145.1.138]) by mx1.freebsd.org (Postfix) with ESMTP id 5D528197E for ; Sun, 7 Jul 2013 22:26:51 +0000 (UTC) Received: from inetmail.dmz (inetmail.dmz [10.3.0.3]) by dss.incore.de (Postfix) with ESMTP id 8185E5C4ED for ; Mon, 8 Jul 2013 00:26:45 +0200 (CEST) X-Virus-Scanned: amavisd-new at incore.de Received: from dss.incore.de ([10.3.0.3]) by inetmail.dmz (inetmail.dmz [10.3.0.3]) (amavisd-new, port 10024) with LMTP id Wo37qlfxULha for ; Mon, 8 Jul 2013 00:26:44 +0200 (CEST) Received: from mail.incore (fwintern.dmz [10.0.0.253]) by dss.incore.de (Postfix) with ESMTP id 503625C4EB for ; Mon, 8 Jul 2013 00:26:44 +0200 (CEST) Received: from bsdmhs.longwitz (unknown [192.168.99.6]) by mail.incore (Postfix) with ESMTP id AD74750881 for ; Mon, 8 Jul 2013 00:26:43 +0200 (CEST) Message-ID: <51D9EB23.4070505@incore.de> Date: Mon, 08 Jul 2013 00:26:43 +0200 From: Andreas Longwitz User-Agent: Thunderbird 2.0.0.19 (X11/20090113) MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Shutdown hangs on unmount of a gjournaled file system in 8-Stable Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Jul 2013 22:26:52 -0000 The problem occurs after an update of 8-stable from r248120 to r252111. Sometimes shutdown hangs: Waiting (max 60 seconds) for system process `vnlru' to stop...done Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining...0 0 done All buffers synced. >From the kernel dump I see the deadlock occurs on unmount of a gjournaled file system. Involved are two processes db> ps pid ppid pgrp uid state wmesg wchan cmd 1 0 1 0 SLs mount dr 0xffffff007f7e559c [init] 18 0 0 0 SL suspwt 0xffffff007f7e5364 [g_journal switcher] (kgdb) info threads 158 Thread 100002 (PID=1: init) sched_switch (td=0xffffff000235e8e0, newtd=, flags=) at /usr/src/sys/kern/sched_ule.c:1932 .... 217 Thread 100076 (PID=18: g_journal switcher) sched_switch (td=0xffffff0002bd6000, newtd=, flags=) at /usr/src/sys/kern/sched_ule.c:1932 (kgdb) thread 158 [Switching to thread 158 (Thread 100002)]#0 sched_switche(td=0xffffff000235e8e0, newtd=, flags=) at /usr/src/sys/kern/sched_ule.c:1932 1932 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xffffff000235e8e0, newtd=, flags=) at /usr/src/sys/kern/sched_ule.c:1932 #1 0xffffffff80407836 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff8043e0e2 in sleepq_wait (wchan=0xffffff007f7e559c, pri=80) at /usr/src/sys/kern/subr_sleepqueue.c:613 #3 0xffffffff80407fc6 in _sleep (ident=0xffffff007f7e559c, lock=0xffffff007f7e52f0, priority=, wmesg=0xffffffff8069f595 "mount drain", timo=0) at /usr/src/sys/kern/kern_synch.c:250 #4 0xffffffff8048ee42 in dounmount (mp=0xffffff007f7e52f0, flags=524288, td=) at /usr/src/sys/kern/vfs_mount.c:1266 #5 0xffffffff80493202 in vfs_unmountall () at /usr/src/sys/kern/vfs_subr.c:3321 #6 0xffffffff803fec69 in boot (howto=) at /usr/src/sys/kern/kern_shutdown.c:428 #7 0xffffffff803fef86 in reboot (td=, uap=0xffffff8000238bb0) at /usr/src/sys/kern/kern_shutdown.c:191 #8 0xffffffff805db1b4 in amd64_syscall (td=0xffffff000235e8e0, traced=0) at subr_syscall.c:114 #9 0xffffffff805c282c in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 (kgdb) f 5 #5 0xffffffff80493202 in vfs_unmountall () at /usr/src/sys/kern/vfs_subr.c:3321 3321 error = dounmount(mp, MNT_FORCE, td); (kgdb) p mp->mnt_lockref $1=1 (kgdb) f 4 #4 0xffffffff8048ee42 in dounmount (mp=0xffffff007f7e52f0, flags=524288, td=) at /usr/src/sys/kern/vfs_mount.c:1266 1266 error = msleep(&mp->mnt_lockref, MNT_MTX(mp), PVFS, (kgdb) list 1261 if (flags & MNT_FORCE) 1262 mp->mnt_kern_flag |= MNTK_UNMOUNTF; 1263 error = 0; 1264 if (mp->mnt_lockref) { 1265 mp->mnt_kern_flag |= MNTK_DRAINING; 1266 error = msleep(&mp->mnt_lockref, MNT_MTX(mp), PVFS, 1267 "mount drain", 0); 1268 } 1269 MNT_IUNLOCK(mp); 1270 KASSERT(mp->mnt_lockref == 0, (kgdb) thread 217 [Switching to thread 217 (Thread 100076)]#0 sched_switch (td=0xffffff0002bd6000, newtd=, flags=) at /usr/src/sys/kern/sched_ule.c:1932 1932 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xffffff0002bd6000, newtd=, flags=) at /usr/src/sys/kern/sched_ule.c:1932 #1 0xffffffff80407836 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff8043e0e2 in sleepq_wait (wchan=0xffffff007f7e5364, pri=159) at /usr/src/sys/kern/subr_sleepqueue.c:613 #3 0xffffffff80407fc6 in _sleep (ident=0xffffff007f7e5364, lock=0xffffff007f7e52f0, priority=, wmesg=0xffffffff806a0813 "suspwt", timo=0) at /usr/src/sys/kern/kern_synch.c:250 #4 0xffffffff804a25f0 in vfs_write_suspend (mp=0xffffff007f7e52f0) at /usr/src/sys/kern/vfs_vnops.c:1277 #5 0xffffffff80c843bd in g_journal_switcher (arg=) at /usr/src/sys/modules/geom/geom_journal/../ ../../geom/journal/g_journal.c:2968 #6 0xffffffff803d326f in fork_exit (callout=0xffffffff80c838e0 , arg=0xffffffff80c8b140, frame=0xffffff8242e68c40) at /usr/src/sys/kern/kern_fork.c:872 #7 0xffffffff805c2a0e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602 (kgdb) f 4 #4 0xffffffff804a25f0 in vfs_write_suspend (mp=0xffffff007f7e52f0) at /usr/src/sys/kern/vfs_vnops.c:1277 1277 (void) msleep(&mp->mnt_writeopcount, (kgdb) list 1272 while (mp->mnt_kern_flag & MNTK_SUSPEND) 1273 msleep(&mp->mnt_flag, MNT_MTX(mp), PUSER - 1, "wsuspfs", 0); 1274 mp->mnt_kern_flag |= MNTK_SUSPEND; 1275 mp->mnt_susp_owner = curthread; 1276 if (mp->mnt_writeopcount > 0) 1277 (void) msleep(&mp->mnt_writeopcount, 1278 MNT_MTX(mp), (PUSER - 1)|PDROP, "suspwt", 0); 1279 else 1280 MNT_IUNLOCK(mp); 1281 if ((error = VFS_SYNC(mp, MNT_SUSPEND)) != 0) (kgdb) p mp->mnt_writeopcount $2 = 1 The deadlock can be explained now: pid 1 (init) sleeps on "mount drain" because mp->mnt_lockref was 1. This setting was done by pid 18 (gjournal switcher) by calling vfs_busy(). pid 18 now sleeps on "suspwt" because mp->mnt_writeopcount was 1. This setting was done by pid 1 before going to sleep by calling vn_start_write() in dounmount(). I think the reason for this deadlock is the commit r249055 which seems not to be compatible with gjournal. Andreas Longwitz