From owner-freebsd-current@FreeBSD.ORG  Thu Oct 16 12:32:59 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from green.bikeshed.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP
	id D950816A4BF; Thu, 16 Oct 2003 12:32:58 -0700 (PDT)
Received: from green.bikeshed.org (localhost [127.0.0.1])
	by green.bikeshed.org (8.12.10/8.12.9) with ESMTP id h9GJWwcR017976;
	Thu, 16 Oct 2003 15:32:58 -0400 (EDT)
	(envelope-from green@green.bikeshed.org)
Received: from localhost (green@localhost)h9GJWwxi017972;
	Thu, 16 Oct 2003 15:32:58 -0400 (EDT)
Message-Id: <200310161932.h9GJWwxi017972@green.bikeshed.org>
X-Mailer: exmh version 2.6.3 04/04/2003 with nmh-1.0.4
To: current@FreeBSD.org
From: Brian Fundakowski Feldman <green@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 16 Oct 2003 15:32:58 -0400
Sender: green@green.bikeshed.org
cc: phk@FreeBSD.org
Subject: runningbufspace related lock-ups with md(4)/UFS/SU (PATCH ?)
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Oct 2003 19:32:59 -0000

I'm having problems where the entire system locks up when using an md(4)-backed 
UFS+SoftUpdates partition.  I can simply run dd if=/dev/zero of=/mnt/foo, and 
within a couple of tries it will lock up.  When it locks up, buf_daemon (or, if 
buf_daemon is patched to avoid this, the syncer) is calling waitrunningbufspace() 
from a non-B_ASYNC buf call.  Because of this, the md(4) ("md0") thread is stuck 
in "ufs", waiting to acquire a lock on the vnode that one of the syncer/flusher 
daemons holds while it waits for bufspace to run down.  The user program causing 
the problem is stuck too, in "wdrain", because it is also waiting for 
waitrunningbufspace() to return.  In short, everything is trying to reduce the 
amount of outstanding buffer space, but nothing can make progress while GEOM, 
md(4), and so on are waiting for the daemons to release the vnode so they can 
write out the data.
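
Does this scenario make sense?  I have fixed it here with the following very 
simple patch, which skips the implicit waitrunningbufspace() call when the 
current process is the buffer daemon (bufdaemonproc) or the syncer 
(updateproc), so those daemons can't get stuck there.  It also moves the 
bufdaemonproc declaration near the top of the file so the new check can use it.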

diff -r1.412 vfs_bio.c
73a74,75
> static struct proc *bufdaemonproc;
>
889c891,893
<		waitrunningbufspace();
---
>		if (curthread->td_proc != bufdaemonproc &&
>		    curthread->td_proc != updateproc)
>			waitrunningbufspace();
2038,2039d2041
<
< static struct proc *bufdaemonproc;

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\