From: "Peter Edwards"
To: current@freebsd.org
Date: Wed, 10 Sep 2003 19:20:02 +0100
Message-Id: <20030910182003.BF84744001@mx1.FreeBSD.org>
Subject: useful workaround and analysis of vnode-backed md deadlock

There have been a few reports of deadlocks in md on the lists recently, and I walked into one while trying to generate flash images for my shiny new Soekris box. In particular, a previous mail mentioned something getting stuck in "wdrain" (Message-ID <20030806104332.GA42110@sunbay.com>, from ru@freebsd.org).

For the impatient: the way I found around the problem was to mount the md-backed filesystems with the "sync" option.

I analysed the deadlock a little, and here's a synopsis, in case it's of use to anyone. I tracked this down as well as I could, and it appears to be an interaction between three processes. This may not be (and most likely isn't) the only md deadlock, but as long as I otherwise leave the backing file alone, I don't experience any problems once I mount the filesystem sync. And, because the underlying filesystem is async, access to the md filesystem isn't painfully slower than normal.

1: One thread is operating on the filesystem. In general, this thread is creating dirty buffers for later processing by the bufdaemon, and also making direct write requests. This thread doesn't actually participate in the deadlock, but does set the stage for it.

2: The "md" thread, processing requests from (1), attempts to lock the vnode underlying the md device, in order to fulfill a queued write request on the md device.

3: Meanwhile, the bufdaemon has kicked in, and is flushing dirty buffers. Some of these are for the files on the md filesystem, some are for the vnode backing the md device itself. (Actually, I assume that the flushing of the former causes a sudden surge in the latter, as the writes to the md filesystem are converted to writes to the backing vnode.)

The bufdaemon has locked the md vnode in order to write bufs to it. However, it needs to wait for "runningbufspace", which is designed to limit the amount of in-flight async buffer writes. Once the running buffer space exceeds a high threshold, the thread scheduling the write is blocked (this is the "wdrain" sleep), to be awakened when completed async writes bring the total back under the low threshold.

However, a large chunk of the running buf space is sitting queued for the md thread to process. The md thread can't continue without the vnode lock, so the running buffer space will not fall; and the bufdaemon cannot continue without running buffer space, so it will never release the vnode lock.

--
Peter Edwards.
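
As a rough illustration only (this is not kernel code), the circular wait described above can be written down as a small wait-for graph. The node names are just labels made up for the sketch, and find_cycle is a hypothetical helper, not anything from the FreeBSD source. The second graph is a purely hypothetical variant in which the bufdaemon does not hold the vnode lock while sleeping; it is there only to show that removing any one edge breaks the cycle.

```python
# Toy wait-for graph of the cycle described in the mail.
# All names are illustrative labels, not kernel identifiers.

def find_cycle(waits_for, start):
    """Follow waits-for edges from start; return the cycle as a list, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended, so nothing is stuck forever
    return path[path.index(node):] + [node]

# Each key cannot make progress until its value does.
stuck = {
    "bufdaemon": "runningbufspace drains",   # sleeps while holding the vnode lock
    "runningbufspace drains": "md thread",   # the in-flight writes are queued on md
    "md thread": "backing vnode lock",       # needs the lock to perform the writes
    "backing vnode lock": "bufdaemon",       # lock is held by the sleeping bufdaemon
}

cycle = find_cycle(stuck, "bufdaemon")
print(" -> ".join(cycle))

# Hypothetical variant: drop the last edge (lock not held across the sleep).
broken = dict(stuck)
del broken["backing vnode lock"]
print(find_cycle(broken, "bufdaemon"))
```

The first call walks all four edges and arrives back at the bufdaemon, which is the deadlock; the second runs off the end of the chain and reports no cycle.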