From: Kirk McKusick
To: Brian Fundakowski Feldman
Cc: current@freebsd.org
Date: Thu, 23 Oct 2003 12:39:50 -0700
Subject: Re: runningbufspace related lock-ups with md(4)/UFS/SU (PATCH ?)
Message-Id: <200310231939.h9NJdoeN007611@beastie.mckusick.com>
In-Reply-To: Your message of "Thu, 16 Oct 2003 15:32:58 EDT." <200310161932.h9GJWwxi017972@green.bikeshed.org>

I have been able to reproduce your hang on my system, and your suggested fix does prevent it. I am going to run some more buffer starvation-type tests on it this week, and if they do not cause other problems, I will put in your suggested fix.

	Kirk McKusick

=-=-=-=-=-=

To: current@freebsd.org
From: Brian Fundakowski Feldman
Cc: phk@freebsd.org
Date: Thu, 16 Oct 2003 15:32:58 -0400
Subject: runningbufspace related lock-ups with md(4)/UFS/SU (PATCH ?)

I'm having problems with the entire system locking up when using an md(4) UFS+SoftUpdates partition. I can simply run dd if=/dev/zero of=/mnt/foo, and within a couple of tries it will lock up.
When it locks up, buf_daemon (or, if that is patched against, the syncer) is calling waitrunningbufspace() from a non-B_ASYNC buf call. Because of this, the md(4) ("md0") thread is stuck in "ufs", waiting to acquire a lock on the vnode that one of the syncer/flusher daemons holds while that daemon waits for the running buffer space to drain. The user program causing the problem is also stuck in "wdrain", because it too is waiting for waitrunningbufspace() to return. In short, everything is trying to reduce the amount of outstanding buffer space, but nothing moves forward while GEOM/md(4)/what have you are waiting for the daemons to release the vnode so they can write out the data. Does this scenario make sense?

I have fixed it here with the following very simple patch, which skips the implicit waitrunningbufspace() call when the caller is the buffer daemon (bufdaemonproc) or the syncer (updateproc), so those daemons can't get stuck there.

diff -r1.412 vfs_bio.c
73a74,75
> static struct proc *bufdaemonproc;
> 
889c891,893
< 		waitrunningbufspace();
---
> 		if (curthread->td_proc != bufdaemonproc &&
> 		    curthread->td_proc != updateproc)
> 			waitrunningbufspace();
2038,2039d2041
< 
< static struct proc *bufdaemonproc;

-- 
 Brian Fundakowski Feldman                         \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                              \ The Power to Serve! \
 Opinions expressed are my own.                      \,,,,,,,,,,,,,,,,,,,,,,\