From owner-freebsd-current@FreeBSD.ORG  Tue Jun 17 20:02:17 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EAD8B37B401
	for <current@FreeBSD.org>; Tue, 17 Jun 2003 20:02:17 -0700 (PDT)
Received: from Shenton.org (23.ebbed1.client.atlantech.net [209.190.235.35])
	by mx1.FreeBSD.org (Postfix) with SMTP id 9C17543FBF
	for <current@FreeBSD.org>; Tue, 17 Jun 2003 20:02:15 -0700 (PDT)
	(envelope-from chris@Shenton.Org)
Received: (qmail 36410 invoked by uid 1000); 18 Jun 2003 03:00:12 -0000
To: Don Lewis <truckman@FreeBSD.org>
References: <200306180233.h5I2WxM7053350@gw.catspoiler.org>
From: Chris Shenton <chris@shenton.org>
Date: 17 Jun 2003 23:00:12 -0400
In-Reply-To: <200306180233.h5I2WxM7053350@gw.catspoiler.org>
Message-ID: <87smq8jdj7.fsf@PECTOPAH.shenton.org>
Lines: 75
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
cc: current@FreeBSD.org
Subject: Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable
	locks
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 18 Jun 2003 03:02:18 -0000

Don Lewis <truckman@FreeBSD.org> writes:

> If you have another machine and a null modem cable you can redirect the
> system console of the machine to be debugged to a serial port and run
> some comm software on the other machine so that you can capture all the
> output from ddb.

OK, I'll give that a shot, probably tomorrow.


> At the ddb prompt, you can do a "tr" command to get a stack trace,
> which is likely to be very helpful in pointing out the offending
> code.

I just saw it again and did a tr.  From my chicken-scratch notes, the
last frames of the trace are:

  VOP_GETVOBJECT(...)
  do_sendfile(...)
  sendfile(...)
  syscall(...)
  Xint0x80_syscall...
  --- syscall( 393, FreeBSD ELF32, sendfile) ...

The next time it dropped into ddb, the trace showed the same
"sendfile" path.

The main services I'm running are qmail, apache, and NFS.  Also 
tftp, rarpd, lpd, sshd, bootparamd ...  oh, well, I guess I'm running
a bunch of stuff here. :-(  Not sure which one, if any, this would be.

Unless sendfile() is something in the OS?
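
For reference, the "syscall (393, FreeBSD ELF32, sendfile)" line in the
trace suggests sendfile is a system call that some userland program is
making: it asks the kernel to copy a file's contents straight to a
socket.  Below is a minimal sketch of what such a caller looks like, in
Python and assuming os.sendfile is available on the platform.  The
names send_whole_file and demo, and the use of a local socketpair, are
made up for illustration and are not taken from any of the services
listed above.

```python
import os
import socket
import tempfile


def send_whole_file(sock, path):
    """Send the file at `path` over `sock` via the sendfile system call.

    Hypothetical helper for illustration; returns the bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel copies file data to the socket directly; it may
            # send fewer bytes than requested, so keep looping.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent


def demo():
    """Push a small temporary file through a local socket pair."""
    payload = b"hello from sendfile\n" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    a, b = socket.socketpair()
    try:
        sent = send_whole_file(a, path)
        a.close()  # signal end-of-stream to the reader
        received = b""
        while True:
            chunk = b.recv(4096)
            if not chunk:
                break
            received += chunk
    finally:
        a.close()
        b.close()
        os.unlink(path)
    return sent, received == payload
```

Any daemon that serves files over TCP could be making a call of this
shape, so the trace alone does not say which service is responsible.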


I'll have to dig up a null modem cable and grab the console output.  I
realise I'm not giving enough detailed info to be very helpful here.


> If you are running the NFS *client* code on this machine, there is one
> lock assertion that is easy to trigger. 

My kernel config has the following option because a diskless box uses
the same kernel, but my /etc/fstab doesn't mount anyone else's NFS
exports:

options 	NFSCLIENT		#Network Filesystem Client

chris@PECTOPAH<101> ps -axww|grep nfs
   42  ??  IL     0:00.00  (nfsiod 0)
   43  ??  IL     0:00.00  (nfsiod 1)
   44  ??  IL     0:00.00  (nfsiod 2)
   45  ??  IL     0:00.00  (nfsiod 3)
  428  ??  Is     0:00.03 nfsd: master (nfsd)
  429  ??  I      0:00.09 nfsd: server (nfsd)
  430  ??  I      0:00.00 nfsd: server (nfsd)
  431  ??  I      0:00.00 nfsd: server (nfsd)
  432  ??  I      0:00.00 nfsd: server (nfsd)
35366  p0  R+     0:00.00 grep nfs

> At the ddb prompt you should be able to use the write command to tweak a
> couple of variables to modify this behavior.  If you set the
> vfs_badlock_panic variable to zero, the kernel will no longer drop into
> DDB when one of these lock violations occurs.  If you set the
> vfs_badlock_print variable to zero, the kernel will stop printing the
> warnings.

OK, I've done an

  examine vfs_badlock_panic

which shows it zero, then

  write vfs_badlock_panic 0

at least for now.

Thanks again.