From owner-freebsd-current@FreeBSD.ORG Fri Jul 12 18:17:56 2013
Date: Fri, 12 Jul 2013 20:11:36 +0200
From: Ian FREISLICH
To: John Baldwin
Cc: Konstantin Belousov, freebsd-current@freebsd.org
Subject: Re: Filesystem wedges caused by r251446
In-Reply-To: <201307110923.06548.jhb@freebsd.org>
References: <201307110923.06548.jhb@freebsd.org> <201307091202.24493.jhb@freebsd.org>
X-Attribution: BOFH
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Fri, 12 Jul 2013 18:17:56 -0000

John Baldwin wrote:
> On Thursday, July 11, 2013 6:54:35 am Ian FREISLICH wrote:
> > John Baldwin wrote:
> > > On Thursday, July 04, 2013 5:03:29 am Ian FREISLICH wrote:
> > > > Konstantin Belousov wrote:
> > > > >
> > > > > Care to provide any useful information ?
> > > > >
> > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > > >
> > > > Well, the system doesn't deadlock; it's perfectly usable so long
> > > > as you don't touch the file that's wedged.  A lot of the time the
> > > > userland process is unkillable, but often it is killable.  How do
> > > > I get from the PID to where the FS is stuck in the kernel?
> > >
> > > Use kgdb.  'proc <pid>', then 'bt'.
> >
> > So, I set up a remote kgdb session, but I still can't figure out how
> > to get at the information we need.
> >
> > (kgdb) proc 5176
> > only supported for core file target
> >
> > In the meantime, I'll just force it to make a core dump from ddb.
> > However, I can't recreate the issue while the mirror (gmirror) is
> > rebuilding, so we'll have to wait for that to finish.
>
> Sorry, just run 'sudo kgdb' on the box itself.  You can inspect the
> running kernel without having to stop it.

So, this machine's installworld *always* stalls installing clang.  The
install can be stopped (ctrl-c), leaving behind this process:

root 23147  0.0  0.0  9268  1512  1  D  7:51PM 0:00.01 install -s -o root -g wheel -m 555 clang /usr/bin/clang

This is the backtrace from kgdb.  I suspect frame 4.
(kgdb) proc 23147
[Switching to thread 117 (Thread 100059)]#0  sched_switch (
    td=0xfffffe000c012920, newtd=0x0, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1954
1954            cpuid = PCPU_GET(cpuid);
Current language:  auto; currently minimal
(kgdb) bt
#0  sched_switch (td=0xfffffe000c012920, newtd=0x0, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1954
#1  0xffffffff8047539e in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:487
#2  0xffffffff804acbea in sleepq_wait (wchan=0x0, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:620
#3  0xffffffff80474ee9 in _sleep (ident=<value optimized out>,
    lock=0xffffffff80a20300, priority=84, wmesg=0xffffffff8071129a "wdrain",
    sbt=<value optimized out>, pr=0, flags=<value optimized out>)
    at /usr/src/sys/kern/kern_synch.c:249
#4  0xffffffff804e6523 in waitrunningbufspace () at /usr/src/sys/kern/vfs_bio.c:564
#5  0xffffffff804e6073 in bufwrite (bp=<value optimized out>)
    at /usr/src/sys/kern/vfs_bio.c:1226
#6  0xffffffff804f05ed in cluster_wbuild (vp=0xfffffe008fec4000, size=32768,
    start_lbn=136, len=<value optimized out>, gbflags=<value optimized out>)
    at /usr/src/sys/kern/vfs_cluster.c:1002
#7  0xffffffff804efbc3 in cluster_write (vp=0xfffffe008fec4000,
    bp=0xffffff80f83da6f0, filesize=4456448, seqcount=127,
    gbflags=<value optimized out>) at /usr/src/sys/kern/vfs_cluster.c:592
#8  0xffffffff805c1032 in ffs_write (ap=0xffffff8121c81850)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:801
#9  0xffffffff8067fe21 in VOP_WRITE_APV (vop=<value optimized out>,
---Type <return> to continue, or q <return> to quit---
    a=<value optimized out>) at vnode_if.c:999
#10 0xffffffff80511eca in vn_write (fp=0xfffffe006a5f7410,
    uio=0xffffff8121c81a90, active_cred=0x0, flags=<value optimized out>,
    td=0x0) at vnode_if.h:413
#11 0xffffffff8050eb3a in vn_io_fault (fp=0xfffffe006a5f7410,
    uio=0xffffff8121c81a90, active_cred=0xfffffe006a6ca000, flags=0,
    td=0xfffffe000c012920) at /usr/src/sys/kern/vfs_vnops.c:983
#12 0xffffffff804b506a in dofilewrite (td=0xfffffe000c012920, fd=5,
    fp=0xfffffe006a5f7410, auio=0xffffff8121c81a90,
    offset=<value optimized out>, flags=0) at file.h:290
#13 0xffffffff804b4cde in sys_write (td=0xfffffe000c012920,
    uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:460
#14 0xffffffff8061807a in amd64_syscall (td=0xfffffe000c012920, traced=0)
    at subr_syscall.c:134
#15 0xffffffff806017ab in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:387
#16 0x000000000044e75a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

Any other process attempting to, for instance, stat /usr/bin/clang also gets
stuck:

root 23198  0.0  0.0  8056  1780  1  D+  7:58PM 0:00.00 stat /usr/bin/clang

#0  sched_switch (td=0xffffffff80a67b20, newtd=0x0, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1954
1954            cpuid = PCPU_GET(cpuid);
(kgdb) proc 23198
[Switching to thread 107 (Thread 100131)]#0  sched_switch (
    td=0xfffffe000ce9a000, newtd=0x0, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1954
1954            cpuid = PCPU_GET(cpuid);
Current language:  auto; currently minimal
(kgdb) bt
#0  sched_switch (td=0xfffffe000ce9a000, newtd=0x0, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1954
#1  0xffffffff8047539e in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:487
#2  0xffffffff804acbea in sleepq_wait (wchan=0x0, pri=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:620
#3  0xffffffff804542bc in sleeplk (lk=<value optimized out>, flags=2097408,
    ilk=<value optimized out>, wmesg=0xffffffff807017be "ufs",
    pri=<value optimized out>, timo=<value optimized out>)
    at /usr/src/sys/kern/kern_lock.c:226
#4  0xffffffff80453b09 in __lockmgr_args (lk=0xfffffe008fec4068,
    flags=<value optimized out>, ilk=0xfffffe008fec4130,
    wmesg=0xffffffff807017be "ufs", pri=<value optimized out>, timo=51)
    at /usr/src/sys/kern/kern_lock.c:672
#5  0xffffffff805c1384 in ffs_lock (ap=0xffffff8121de9460) at lockmgr.h:97
#6  0xffffffff80680f4a in VOP_LOCK1_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:2084
#7  0xffffffff8050fd33 in _vn_lock (vp=0xfffffe008fec4000,
    flags=<value optimized out>,
    file=0xffffffff807134e1 "/usr/src/sys/kern/vfs_subr.c", line=2099)
    at vnode_if.h:859
#8  0xffffffff804ff5aa in vget (vp=0xfffffe008fec4000, flags=2097408,
    td=0xfffffe000ce9a000) at /usr/src/sys/kern/vfs_subr.c:2099
#9  0xffffffff804f4343 in vfs_hash_get (mp=0xfffffe000c736790, hash=7624857,
    flags=<value optimized out>, td=0xfffffe000ce9a000,
    vpp=0xffffff8121de9680, fn=0) at /usr/src/sys/kern/vfs_hash.c:88
#10 0xffffffff805bc861 in ffs_vgetf
    (mp=0xfffffe000c736790, ino=7624857, flags=2097152,
    vpp=0xffffff8121de9680, ffs_flags=0)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1672
#11 0xffffffff805c8029 in ufs_lookup_ino (vdp=0xfffffe000c869750,
    vpp=0xffffff8121de99b0, cnp=0xffffff8121de99d8, dd_ino=0x0)
    at /usr/src/sys/ufs/ufs/ufs_lookup.c:749
#12 0xffffffff8067f1ac in VOP_CACHEDLOOKUP_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:197
#13 0xffffffff804ede7f in vfs_cache_lookup (ap=<value optimized out>)
    at vnode_if.h:80
#14 0xffffffff8067f08c in VOP_LOOKUP_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:129
#15 0xffffffff804f594c in lookup (ndp=0xffffff8121de9958) at vnode_if.h:54
#16 0xffffffff804f5144 in namei (ndp=0xffffff8121de9958)
    at /usr/src/sys/kern/vfs_lookup.c:292
#17 0xffffffff8050ae85 in kern_statat_vnhook (td=0xfffffe000ce9a000,
    flag=<value optimized out>, fd=0, path=0x0, pathseg=UIO_USERSPACE,
    sbp=0xffffff8121de9a60, hook=0) at /usr/src/sys/kern/vfs_syscalls.c:2322
#18 0xffffffff8050af70 in sys_lstat (td=0x0, uap=0xffffff8121de9b80)
    at /usr/src/sys/kern/vfs_syscalls.c:2303
#19 0xffffffff8061807a in amd64_syscall (td=0xfffffe000ce9a000, traced=0)
    at subr_syscall.c:134
#20 0xffffffff806017ab in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:387
#21 0x00000008009355da in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

We've tried different disks and different servers, and they all fail the
same way.  It appears peculiar to amd64, since all our i386 machines seem
unaffected.

Ian

--
Ian Freislich
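[For readers following along: frame 4 is suspicious because the thread is
asleep in D state on wmesg "wdrain" (frame 3), which is the sleep inside
waitrunningbufspace().  Below is a minimal userland model of that accounting,
not the kernel code itself: the function and variable names are simplified
from sys/kern/vfs_bio.c and the threshold is an illustrative made-up value,
not the kernel's tuned default.  The idea: bufwrite() charges each in-flight
buffer against a global runningbufspace counter, the I/O completion path
credits it back, and a writer sleeps on "wdrain" while the counter is above
the high-water mark.  If completions ever stop being credited, the counter
never drains and every later writer blocks forever, which matches the
symptom here.]

```c
#include <assert.h>

/*
 * Minimal model of the runningbufspace accounting behind the "wdrain"
 * sleep (simplified from sys/kern/vfs_bio.c; names and the threshold
 * value are illustrative, not the kernel's).
 */
static long runningbufspace;                  /* bytes of write I/O in flight */
static const long hirunningspace = 16L << 16; /* hypothetical high-water mark */

/* bufwrite() charges the buffer against the counter before starting I/O. */
static void
runningbufspace_charge(long bufsize)
{
	runningbufspace += bufsize;
}

/* The I/O completion path credits it back and wakes "wdrain" sleepers. */
static void
runningbufspace_credit(long bufsize)
{
	runningbufspace -= bufsize;
}

/*
 * waitrunningbufspace() sleeps on "wdrain" while too much write I/O is
 * outstanding.  Returns 1 when a writer would block; if completions are
 * never credited, this stays 1 forever -- the wedge seen in frame 4.
 */
static int
would_sleep_on_wdrain(void)
{
	return (runningbufspace > hirunningspace);
}
```

In this model, charging a large write makes would_sleep_on_wdrain() return 1,
and only the matching credit from the completion path lets it return 0 again;
a lost completion leaves the writer stuck exactly like the install(1) process
above.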