From owner-freebsd-current@FreeBSD.ORG Tue Jun 17 20:20:53 2003
Message-Id: <200306180320.h5I3KjM7053484@gw.catspoiler.org>
Date: Tue, 17 Jun 2003 20:20:45 -0700 (PDT)
From: Don Lewis
To: chris@Shenton.Org
In-Reply-To: <87smq8jdj7.fsf@PECTOPAH.shenton.org>
cc: current@FreeBSD.org
Subject: Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks

On 17 Jun, Chris Shenton wrote:
> Don Lewis writes:
>
>> If you have another machine and a null modem cable you can redirect the
>> system console of the machine to be debugged to a serial port and run
>> some comm software on the other machine so that you can capture all the
>> output from ddb.
>
> OK, I'll give that a shot, probably tomorrow.
>
>
>> At the ddb prompt, you can do a "tr" command to get a stack trace,
>> which is likely to be very helpful in pointing out the offending
>> code.
>
> Just saw it again, did a tr.  From chicken-scratch notes, the last
> bits are:
>
>   VOP_GETVOBJECT(...)
>   do_sendfile(...)
>   sendfile(...)
>   syscall(...)
>   Xint0x80_syscall...
>   --- syscall( 393, FreeBSD ELF32, sendfile) ...
>
> The next time it dropped into ddb, same "sendfile" thing.

Try the very untested patch below ...

> The main services I'm running are qmail, apache, and NFS.  Also
> tftp, rarpd, lpd, sshd, bootparamd ... oh, well, I guess I'm running
> a bunch of stuff here. :-(  Not sure which one, if any, this would be.
>
> Unless sendfile() is something in the OS?

It's a system call, and I believe apache uses it.

>
> I'll have to dig up a nullmodem and grab console output.  I realise
> I'm not giving enough detailed info to be very helpful here.

It's good enough to squash one bug.  I don't know if it will solve your
problem, though.

>> If you are running the NFS *client* code on this machine, there is one
>> lock assertion that is easy to trigger.
>
> In my kernel config I have this, because a diskless box uses the same
> kernel, but my /etc/fstab doesn't mount anyone else's NFS exports.

You won't trigger the lock violation in the NFS client code unless you
actually mount a file system from another machine using NFS and actually
do some I/O on it.

Here's the patch:

Index: uipc_syscalls.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.150
diff -u -r1.150 uipc_syscalls.c
--- uipc_syscalls.c	12 Jun 2003 05:52:09 -0000	1.150
+++ uipc_syscalls.c	18 Jun 2003 03:14:42 -0000
@@ -1775,10 +1775,13 @@
 	 */
 	if ((error = fgetvp_read(td, uap->fd, &vp)) != 0)
 		goto done;
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
 	if (vp->v_type != VREG || VOP_GETVOBJECT(vp, &obj) != 0) {
 		error = EINVAL;
+		VOP_UNLOCK(vp, 0, td);
 		goto done;
 	}
+	VOP_UNLOCK(vp, 0, td);
 	if ((error = fgetsock(td, uap->s, &so, NULL)) != 0)
 		goto done;
 	if (so->so_type != SOCK_STREAM) {
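
For illustration only, here is a rough user-space sketch of the locking
pattern the patch follows: take the lock before the check, and drop it on
both the error path and the normal path.  It is not kernel code.  A pthread
mutex stands in for the vnode lock, and struct fake_vnode, fake_lock(),
fake_unlock(), fake_getvobject() and check_file() are all made-up names.
The assert() in fake_getvobject() is my assumption about the kind of lock
assertion that is firing in your trace.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode; none of these names exist in the kernel. */
struct fake_vnode {
	pthread_mutex_t	lock;		/* plays the role of the vnode lock */
	bool		held;		/* true while the lock is held */
	bool		is_regular;	/* plays the role of v_type == VREG */
	bool		has_object;	/* whether a backing object exists */
};

static void
fake_lock(struct fake_vnode *vp)
{
	pthread_mutex_lock(&vp->lock);
	vp->held = true;
}

static void
fake_unlock(struct fake_vnode *vp)
{
	vp->held = false;
	pthread_mutex_unlock(&vp->lock);
}

/*
 * Assumption: like the routine it imitates, this insists that the caller
 * already holds the lock.  Calling it unlocked trips the assertion.
 */
static int
fake_getvobject(struct fake_vnode *vp)
{
	assert(vp->held);
	return (vp->has_object ? 0 : -1);
}

/*
 * Same shape as the patched code: lock, check, and unlock on every exit.
 * Returns 0 on success or EINVAL if the file is unsuitable.
 */
static int
check_file(struct fake_vnode *vp)
{
	int error = 0;

	fake_lock(vp);
	if (!vp->is_regular || fake_getvobject(vp) != 0) {
		error = EINVAL;
		fake_unlock(vp);	/* error path must not leak the lock */
		goto done;
	}
	fake_unlock(vp);		/* normal path releases it as well */
done:
	return (error);
}
```

The point of the two unlock calls is that the lock is only needed around
the check itself, so it is released before the code goes on to do anything
else, whichever way the check comes out.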