Date: Wed, 10 Sep 2003 19:20:02 +0100
From: "Peter Edwards" <pmedwards@eircom.net>
To: current@freebsd.org
Subject: useful workaround and analysis of vnode-backed md deadlock
Message-ID: <20030910182003.BF84744001@mx1.FreeBSD.org>
There have been a few reports of deadlocks in md on the lists recently,
and I walked into one while trying to generate flash images for my shiny
new Soekris box. In particular, a previous mail mentioned something
getting stuck in "wdrain":
(Message-ID <20030806104332.GA42110@sunbay.com> from ru@freebsd.org)

For the impatient: a way I found around the problem was to mount the
md-backed filesystems with the "sync" option.

I analysed the deadlock a little, and here's a synopsis, in case it's of
use to anyone. I tracked this down as well as I could, and it appears to
be an interaction between three processes. This may not be (and most
likely isn't) the only md deadlock, but as long as I otherwise leave the
backing file alone, I don't experience any problems once I mount the
filesystem sync. And, because the underlying filesystem is async, access
to the md filesystem isn't painfully slower than normal.

1: One thread is operating on the md-backed filesystem. In general, this
   thread is creating dirty buffers for later processing by the
   bufdaemon, and also making direct write requests. This thread doesn't
   actually participate in the deadlock, but does set the stage for it.

2: The "md" thread, processing requests from (1), attempts to lock the
   vnode backing the md device, in order to fulfill a queued write
   request on the md device.

3: Meanwhile, the bufdaemon has kicked in, and is flushing dirty
   buffers. Some of these are for the files on the md filesystem, some
   are for the vnode backing the md device itself. (Actually, I assume
   that the flushing of the former causes a sudden surge in the latter,
   as the writes to the md filesystem are converted to writes to the
   backing vnode.) The bufdaemon has locked the md vnode in order to
   write bufs to it. However, it needs to wait for "runningbufspace",
   which is designed to limit the number of in-flight async buffer
   writes.

Once the running buffer space exceeds a high threshold, the thread
scheduling the write is blocked, to be awakened when completed async
writes bring the running buffer space back under the low threshold.
However, a large chunk of the running buf space is sitting queued for
the md thread to process. The md thread can't continue without the vnode
lock, so the running buffer space will not fall, and the bufdaemon
cannot continue without running buffer space, so it will never release
the vnode lock.

-- 
Peter Edwards.
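
The circular wait described above can be written down as a small wait-for graph. What follows is only a toy Python model of the analysis, not kernel code: the node names and the find_cycle() helper are made up for illustration and do not correspond to FreeBSD identifiers. Each entry reads "the key cannot make progress until the value does".

```python
# Toy model of the wait-for relationships in the analysis above.
# Names are illustrative assumptions, not FreeBSD kernel identifiers.

def find_cycle(waits_for):
    """Return the nodes of a cycle in a wait-for graph, or None.

    waits_for maps each blocked party to the single thing it waits on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = start
        while node in waits_for:
            node = waits_for[node]
            if node == start:
                return path          # walked back to where we began
            if node in seen:
                break                # a cycle exists, but not through start
            seen.add(node)
            path.append(node)
    return None

# The situation with the md filesystem mounted async, as described:
deadlock = {
    # md thread needs the backing vnode lock to service a queued write
    "md thread": "backing vnode lock",
    # the lock is held by the bufdaemon while it writes bufs
    "backing vnode lock": "bufdaemon",
    # the bufdaemon sleeps until running buffer space drops
    "bufdaemon": "runningbufspace",
    # that space only drops when the md thread completes queued writes
    "runningbufspace": "md thread",
}

if __name__ == "__main__":
    cycle = find_cycle(deadlock)
    print(" -> ".join(cycle + [cycle[0]]))
```

Thread (1), the one operating on the filesystem, is left out of the graph on purpose: per the analysis it feeds the other two but is not part of the cycle.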