Date: Mon, 31 May 2004 14:22:34 +1000
From: Tim Robbins
To: Don Bowman
cc: 'current@freebsd.org'
Subject: Re: sysctl lock, system lockup
Message-ID: <20040531042234.GA13724@cat.robbins.dropbear.id.au>

On Sun, May 30, 2004 at 11:31:14PM -0400, Don Bowman wrote:
> From: Tim Robbins [mailto:tim@robbins.dropbear.id.au]
> > On Sun, May 30, 2004 at 10:18:34PM -0400, Don Bowman wrote:
> > > From: Tim Robbins [mailto:tim@robbins.dropbear.id.au]
> > > > On Sun, May 30, 2004 at 04:35:55PM -0400, Don Bowman wrote:
> > > > > From: Don Bowman [mailto:don@sandvine.com]
> > > > > > On the console i ran 'top', but it wouldn't start,
> > > > > > giving:
> > > > > >
> > > > > > load: 0.00 cmd: top 4282 [sysctl lock] 0.00u 0.00s 0% 180k
> > > > > >
> > > > > > as the status. I can't ^C it, can't ssh in.
> > > > > > can still ping the device.
> > > > > >
> > > > > > It was doing a background fsck from an earlier hang.
> > > > > >
> > > > > > i have called panic from db, not sure if the core will
> > > > > > work properly or not.
> > > > >
> > > > > As a followup... i did get a vmcore, and matching kernel.debug,
> > > > > if someone can suggest what i might look @?
> > > >
> > > > print sysctllock (or just sysctllock.sx_xholder if you don't have a
> > > > serial console set up.)
> > >
> > > (kgdb) print sysctllock
> > > $1 = {sx_object = {lo_class = 0xc070dacc, lo_name = 0xc06ce43d "sysctl lock",
> > >     lo_type = 0xc06ce43d "sysctl lock", lo_flags = 3866624, lo_list = {
> > >       tqe_next = 0xc074f9e0, tqe_prev = 0xc0747ab0}, lo_witness = 0xc0751410},
> > >   sx_lock = 0xc0748e80, sx_cnt = -1, sx_shrd_cv = {
> > >     cv_description = 0xc06ce43d "sysctl lock", cv_waiters = 0},
> > >   sx_shrd_wcnt = 0, sx_excl_cv = {cv_description = 0xc06ce43d "sysctl lock",
> > >     cv_waiters = 9}, sx_excl_wcnt = 9, sx_xholder = 0xc8ee2150}
> >
> > Hmm. How about the value of sysctllock.sx_xholder->td_proc? Then, if possible,
> > switch to that process (with gdb's proc command) and try to get a backtrace.
> > (I admit to not having used this feature recently; I'm not completely sure
> > that it still works. You may need to pass it a thread pointer instead.)
>
> (kgdb) p sysctllock.sx_xholder->td_proc
> $1 = (struct proc *) 0xc8eddc08
> (kgdb) proc 0xc8eddc08
> (kgdb) bt
> #0  0xc0550340 in sched_switch (td=0xc8ee2150)
>     at /usr/src/sys/kern/sched_4bsd.c:666
> #1  0xc0545dfe in mi_switch (flags=1945947512)
>     at /usr/src/sys/kern/kern_synch.c:359
> #2  0xc055d382 in sleepq_switch (wchan=0x0)
>     at /usr/src/sys/kern/subr_sleepqueue.c:374
> #3  0xc055d53f in sleepq_wait (wchan=0xe15dbc28)
>     at /usr/src/sys/kern/subr_sleepqueue.c:478
> #4  0xc0545ac6 in msleep (ident=0xe15dbc28, mtx=0xc0774a00, priority=76,
>     wmesg=0xc06d4ad5 "biord", timo=0) at /usr/src/sys/kern/kern_synch.c:250
> #5  0xc058193f in bwait (bp=0xe15dbc28, pri=76 'L', wchan=0xc06d4ad5 "biord")
>     at /usr/src/sys/kern/vfs_bio.c:3766
> #6  0xc0580525 in bufwait (bp=0xe15dbc28) at /usr/src/sys/kern/vfs_bio.c:3048
> #7  0xc057c9be in breadn (vp=0xc937ba28, blkno=-18688012, size=16384,
>     rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, bpp=0x0)
>     at /usr/src/sys/kern/vfs_bio.c:749
> #8  0xc057c724 in bread (vp=0xc937ba28, blkno=-18688012, size=16384, cred=0x0,
>     bpp=0xf835e9d8) at /usr/src/sys/kern/vfs_bio.c:684
> #9  0xc061ab93 in ffs_balloc_ufs2 (vp=0xc937ba28, startoffset=0, size=16384,
>     cred=0xc53d5180, flags=131072, bpp=0xf835eadc)
>     at /usr/src/sys/ufs/ffs/ffs_balloc.c:702
> #10 0xc0621191 in ffs_snapremove (vp=0xc937ba28)
>     at /usr/src/sys/ufs/ffs/ffs_snapshot.c:1463
> #11 0xc0626a70 in softdep_releasefile (ip=0xc9309460)
>     at /usr/src/sys/ufs/ffs/ffs_softdep.c:3266
> #12 0xc063303d in ufs_inactive (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_inode.c:88
> #13 0xc063a21f in ufs_vnoperate (ap=0x0)
>     at /usr/src/sys/ufs/ufs/ufs_vnops.c:2819
> #14 0xc058c60e in vput (vp=0xc937ba28) at vnode_if.h:953
> #15 0xc0618992 in sysctl_ffs_fsck (oidp=0x0, arg1=0xf835ec90, arg2=0, req=0x0)
>     at /usr/src/sys/ufs/ffs/ffs_alloc.c:2292
> #16 0xc0547553 in sysctl_root (oidp=0x0, arg1=0xf835ec90, arg2=0,
>     req=0xf835ec08) at /usr/src/sys/kern/kern_sysctl.c:1220
> #17 0xc0547714 in userland_sysctl (td=0x0, name=0xf835ec84, namelen=3,
>     old=0xf835ec08, oldlenp=0x0, inkernel=0, new=0x8059f00, newlen=0,
>     retval=0xf835ec80) at /usr/src/sys/kern/kern_sysctl.c:1317
> #18 0xc05475d5 in __sysctl (td=0xc8ee2150, uap=0xf835ed14)
>     at /usr/src/sys/kern/kern_sysctl.c:1254
> #19 0xc06813a7 in syscall (frame=
>       {tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 3, tf_esi = 0,
>       tf_ebp = -1077941560, tf_isp = -130683532, tf_ebx = 1746122828,
>       tf_edx = 134584952, tf_ecx = 0, tf_eax = 202, tf_trapno = 12,
>       tf_err = 2, tf_eip = 1745649783, tf_cs = 31, tf_eflags = 658,
>       tf_esp = -1077941620, tf_ss = 47})
>     at /usr/src/sys/i386/i386/trap.c:1004
> #20 0x680c8077 in ?? ()
> Cannot access memory at address 0xbfbfeac8
> (kgdb)

I'm not sure where to go from here. A deadlock doesn't seem likely, but it's
possible that background fsck could lock up the system for quite some time by
using this sysctl: the thread that owns sysctllock (td 0xc8ee2150, the same
thread sx_xholder points to) is asleep in bwait() waiting for a disk read
("biord") under ffs_snapremove(), and every other sysctl caller, top included,
has to wait until it drops the lock.

How long (approximately) did you wait before dropping into ddb?

Tim
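To read the `print sysctllock` dump quoted above, assume the sx lock of that era encodes its state like this: `sx_cnt` is -1 while the lock is held exclusively, a positive count of shared holders otherwise, and 0 when free; `sx_excl_wcnt` and `sx_shrd_wcnt` count threads waiting for exclusive and shared access. The sketch below applies that assumed encoding to the dumped fields. `struct sx_snapshot` and `describe_sx()` are hypothetical helpers that only borrow the field names from the dump; they are not the kernel's `struct sx` or any kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified snapshot of the counters printed for sysctllock.
 * NOT the kernel's struct sx; it only reuses field names from the kgdb dump.
 * Assumed sx_cnt encoding: -1 = held exclusively, >0 = shared holders, 0 = free.
 */
struct sx_snapshot {
	int sx_cnt;
	int sx_shrd_wcnt;          /* threads waiting for shared access */
	int sx_excl_wcnt;          /* threads waiting for exclusive access */
	unsigned long sx_xholder;  /* address of the exclusive owner thread, 0 if none */
};

/* Format a one-line summary of the lock state into buf (truncated to len). */
static void
describe_sx(const struct sx_snapshot *s, char *buf, size_t len)
{
	assert(s != NULL && buf != NULL && len > 0);

	if (s->sx_cnt < 0)
		snprintf(buf, len,
		    "exclusive, owner 0x%lx, %d exclusive / %d shared waiters",
		    s->sx_xholder, s->sx_excl_wcnt, s->sx_shrd_wcnt);
	else if (s->sx_cnt > 0)
		snprintf(buf, len,
		    "shared by %d, %d exclusive / %d shared waiters",
		    s->sx_cnt, s->sx_excl_wcnt, s->sx_shrd_wcnt);
	else
		snprintf(buf, len, "free");
}
```

Fed the dumped values (sx_cnt = -1, sx_shrd_wcnt = 0, sx_excl_wcnt = 9, sx_xholder = 0xc8ee2150), `describe_sx()` produces "exclusive, owner 0xc8ee2150, 9 exclusive / 0 shared waiters". That owner address is the same td=0xc8ee2150 seen in frames #0 and #18 of the backtrace, which is why following sx_xholder->td_proc leads to the thread stuck in bwait().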
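The stall pattern suggested by the backtrace (one thread owns the sysctl lock while it sleeps on disk I/O, so other sysctl callers such as top queue behind it) can be illustrated with a userland pthreads analogy. This is only a sketch under that assumption, not kernel code: `lock_model` stands in for sysctllock, `holder_thread()` for the fsck thread sleeping on "biord", and a condition variable for I/O completion. All names are hypothetical, and the probe uses `pthread_mutex_trylock()` so it reports EBUSY instead of hanging the way top did.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland analogy only; every name here is hypothetical. */
static pthread_mutex_t lock_model = PTHREAD_MUTEX_INITIALIZER; /* "sysctllock" */
static pthread_mutex_t state_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cv = PTHREAD_COND_INITIALIZER;
static bool holder_has_lock;
static bool io_done;

/* Take lock_model, block until the simulated I/O completes, then release it. */
static void *
holder_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock_model);

	pthread_mutex_lock(&state_mtx);
	holder_has_lock = true;
	pthread_cond_broadcast(&state_cv);
	while (!io_done)  /* plays the part of bwait() on "biord" */
		pthread_cond_wait(&state_cv, &state_mtx);
	pthread_mutex_unlock(&state_mtx);

	pthread_mutex_unlock(&lock_model);
	return NULL;
}

/*
 * Probe lock_model as a second caller would: once while the holder waits
 * for I/O, once after it has finished. Returns 0, or -1 if the thread
 * could not be created.
 */
static int
run_demo(int *rc_during_io, int *rc_after_io)
{
	pthread_t t;

	assert(rc_during_io != NULL && rc_after_io != NULL);
	holder_has_lock = false;
	io_done = false;
	if (pthread_create(&t, NULL, holder_thread, NULL) != 0)
		return -1;

	/* Wait until the holder owns lock_model. */
	pthread_mutex_lock(&state_mtx);
	while (!holder_has_lock)
		pthread_cond_wait(&state_cv, &state_mtx);
	pthread_mutex_unlock(&state_mtx);

	*rc_during_io = pthread_mutex_trylock(&lock_model);
	if (*rc_during_io == 0)
		pthread_mutex_unlock(&lock_model);

	/* Complete the simulated I/O; the holder then drops lock_model. */
	pthread_mutex_lock(&state_mtx);
	io_done = true;
	pthread_cond_broadcast(&state_cv);
	pthread_mutex_unlock(&state_mtx);
	pthread_join(t, NULL);

	*rc_after_io = pthread_mutex_trylock(&lock_model);
	if (*rc_after_io == 0)
		pthread_mutex_unlock(&lock_model);
	return 0;
}
```

Compiled with `-pthread`, `run_demo()` reports EBUSY for the probe made while the holder waits on its simulated I/O and 0 once the holder has finished. In the real system the length of the stall would depend on how much disk I/O the fsck thread performs while it still owns the lock.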