From: John E Hein
To: John Baldwin
Cc: Kostik Belousov, stable@freebsd.org, davidxu@freebsd.org
Date: Fri, 20 Oct 2006 10:05:35 -0600
Subject: Re: locked vnode / nfs...
 requires kill -9 in ddb

John Baldwin wrote at 10:44 -0400 on Oct 19, 2006:
> On Thursday 19 October 2006 06:04, Kostik Belousov wrote:
> > On Wed, Oct 18, 2006 at 10:01:45AM -0600, John E Hein wrote:
> > > 6.2-PRERELEASE from 20061016 RELENG_6 sources.
> > > Locked vnodes
> > >
> > > 0xc6b7bdd0: tag nfs, type VDIR
> > > usecount 2, writecount 0, refcount 8 mountedhere 0
> > > flags (VV_ROOT)
> > > v_object 0xc9d84108 ref 0 pages 0
> > > lock type nfs: EXCL (count 1) by thread 0xc8adac00 (pid 50746) with 5 pending
> > > fileid 8 fsid 0x300ff06
> > >
> > > 50746 50000 49999 600 T+ sh
> > > .
> > > .
> > > db>db> trace 50746
> > > Tracing pid 50746 tid 100231 td 0xc8adac00
> > > sched_switch(c8adac00,0,2) at 0xc05ce0cb = sched_switch+0x173
> > > mi_switch(2,0) at 0xc05c2b0a = mi_switch+0x1ba
> > > thread_suspend_check(1,c079e04c,c8adac00,c9206b80,1,...) at 0xc05c722d = thread_suspend_check+0x191
> > > sleepq_catch_signals(c9206b80) at 0xc05db93f = sleepq_catch_signals+0x103
> > > sleepq_wait_sig(c9206b80) at 0xc05dbd96 = sleepq_wait_sig+0xe
> > > msleep(c9206b80,c08a6a40,153,c0813379,0) at 0xc05c2652 = msleep+0x25a
> > > nfs_reply(c9206b80,0,c8adac00,4,c7ea7100,...) at 0xc06c33ac = nfs_reply+0x244
> > > nfs_request(c6b7bdd0,c6ae2d00,1,c8adac00,c7815280,e8f3488c,e8f34890,e8f34894,c8adac00,e8f348a0) at 0xc06c40a5 = nfs_request+0x3c1
> > > nfs_getattr(e8f348dc) at 0xc06c912b = nfs_getattr+0x11f
> > > VOP_GETATTR_APV(c086c700,e8f348dc) at 0xc07b260c = VOP_GETATTR_APV+0x38
> > > nfsspec_access(e8f34a8c,c6bf7c94,0,e8f349a4,c060ca26,...) at 0xc06cebf1 = nfsspec_access+0x85
> > > nfs_access(e8f34a8c) at 0xc06c8b7a = nfs_access+0x122
> > > VOP_ACCESS_APV(c086c700,e8f34a8c) at 0xc07b25b0 = VOP_ACCESS_APV+0x38
> > > nfs_lookup(e8f34b18) at 0xc06c96ff = nfs_lookup+0xd3
> > > VOP_LOOKUP_APV(c086c700,e8f34b18) at 0xc07b22f7 = VOP_LOOKUP_APV+0x43
> > > lookup(e8f34c00) at 0xc060ee79 = lookup+0x4c1
> > > namei(e8f34c00) at 0xc060e71a = namei+0x39a
> > > kern_stat(c8adac00,806712c,0,e8f34c74) at 0xc061d3cd = kern_stat+0x35
> > > stat(c8adac00,e8f34d04) at 0xc061d37b = stat+0x1b
> > > syscall(3b,3b,3b,1,80670ec,...) at 0xc07a9363 = syscall+0x2bf
> > > Xint0x80_syscall() at 0xc079456f = Xint0x80_syscall+0x1f
> > > --- syscall (188, FreeBSD ELF32, stat), eip = 0x28196477, esp = 0xbfbfdc1c, ebp = 0xbfbfdcb8 ---
> > > db> kill 9 50746
> > > db> c
> >
> > The nfs_reply is sleeping with the PCATCH set. The question is why SIGTSTP
> > does not cause msleep to return with EINTR.
>
> The problem is in thread_suspend_check(), not the sleepq code.

It happened again (triggered by ctrl-z).

INVARIANTS & WITNESS provided no help.

Is the problem in thread_suspend_check() known? MFC-able from HEAD?

I see this diff. I'm not sure it will help, but is there any reason
not to try it in 6 (David Xu CC'd since he made this change)?

Index: kern_thread.c
===================================================================
RCS file: /base/FreeBSD-CVS/src/sys/kern/kern_thread.c,v
retrieving revision 1.216.2.6
retrieving revision 1.235
diff -u -p -r1.216.2.6 -r1.235
--- kern_thread.c	2 Sep 2006 17:29:57 -0000	1.216.2.6
+++ kern_thread.c	28 Aug 2006 04:24:51 -0000	1.235
@@ -910,6 +926,10 @@ thread_suspend_check(int return_instead)
 		    (p->p_flag & P_SINGLE_BOUNDARY) && return_instead)
 			return (ERESTART);
 
+		/* If thread will exit, flush its pending signals */
+		if ((p->p_flag & P_SINGLE_EXIT) && (p->p_singlethread != td))
+			sigqueue_flush(&td->td_sigqueue);
+
 		mtx_lock_spin(&sched_lock);
 		thread_stopped(p);
 		/*