From: Jeff Roberson
To: arch@freebsd.org
Date: Mon, 14 Mar 2005 03:00:22 -0500 (EST)
Message-ID: <20050314024439.G20708@mail.chesapeake.net>
Subject: filesystem suspension.

The current filesystem suspension mechanism suffers from a few aesthetic and functional problems. I've been talking with Kirk about ways we could replace it, and I'd like to propose a few of those ideas here to see if anyone has useful criticism.

First, I'll briefly outline the problems. There is the obvious problem of the rather cumbersome and error-prone addition of vn_start_write calls wherever you may write to the filesystem. I keep finding places where they were not added when new code came in, or were originally lacking.
It's just yet another call you have to remember to make when dealing with vfs.

Furthermore, there is a real problem with vput(), which may cause VOP_INACTIVE to be called, which may in turn truncate the file. To solve this, we call vn_start_write from within VOP_INACTIVE after we already have a lock held. This is actually a lock order reversal, as the filesystem suspension acts as a real lock. rwatson has reported seeing this deadlock on a real system. To solve this, we could do the INACTIVE from another thread which can call vn_start_write before relocking the vnode, but this would serialize all file deletions! I considered other mechanisms for this as well, but they all have similar problems.

I have two basic proposals. One is to handle all suspension from within ffs's VOP_LOCK routine, the other is to handle all suspension from within every vop that may write.

The ffs_lock method would move the suspension barrier into the ffs_lock routine. A thread would not be suspended if it already held a lockmgr lock, and in this way it would be allowed to continue without leaving any data structures in an inconsistent state. The suspension would proceed once there were no outstanding ufs locks and all new callers would block in ffs_lock. This requires the least effort, as virtually all of the code would be in ffs_lock and unlock. It would, however, prevent threads from issuing read-only calls for the duration of the suspension.

My second proposal involves gating threads within the actual writing VOPs. This would be similar to the vn_start_write mechanism, but it would be contained entirely within ffs/ufs. The big difference would be that some threads would be suspended while holding locks, so the snapshot would have to run either lockless, which could be done safely, or with a special locking protocol, such as allowing it to recursively acquire locks that are already held. This would allow most read-only VOPs to continue, unless they attempted to lock a vnode which was suspended in a writing vop.
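
To make the ffs_lock idea a bit more concrete, here is a rough user-space model of the barrier. It is only a sketch under my own assumptions: susp_gate, model_thread, and the gate_* functions are made-up names, not existing kernel interfaces, and "blocking" is modeled as a return value rather than an actual sleep. It shows only the rule that a thread already holding a lock passes the barrier while new callers wait, and that suspension completes once no locks are outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in FreeBSD. */
struct susp_gate {
    bool suspending;    /* a suspension has been requested */
    int  outstanding;   /* locks currently held by all threads */
};

struct model_thread {
    int held;           /* locks currently held by this thread */
};

enum gate_result { GATE_ACQUIRED, GATE_WOULD_BLOCK };

/* Stand-in for the barrier check at the top of the lock routine. */
static enum gate_result
gate_lock(struct susp_gate *g, struct model_thread *t)
{
    /*
     * New callers stop at the barrier. A thread that already holds a
     * lock is let through so it can finish and release what it holds.
     */
    if (g->suspending && t->held == 0)
        return GATE_WOULD_BLOCK;
    t->held++;
    g->outstanding++;
    return GATE_ACQUIRED;
}

/* Stand-in for the unlock routine. */
static void
gate_unlock(struct susp_gate *g, struct model_thread *t)
{
    assert(t->held > 0 && g->outstanding > 0);
    t->held--;
    g->outstanding--;
}

/* Ask for a suspension; it takes effect once the locks drain. */
static void
gate_request_suspend(struct susp_gate *g)
{
    g->suspending = true;
}

/* The suspension has proceeded once no locks remain outstanding. */
static bool
gate_is_suspended(const struct susp_gate *g)
{
    return g->suspending && g->outstanding == 0;
}

/* End the suspension and let new callers through again. */
static void
gate_resume(struct susp_gate *g)
{
    g->suspending = false;
}
```

In this model, a thread that took a lock before gate_request_suspend() can still take a second one afterwards, a thread holding nothing gets GATE_WOULD_BLOCK, and gate_is_suspended() turns true only after the first thread has released everything. It also makes the cost visible: the gate cannot tell readers from writers, so read-only callers wait too.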

Comments? Other proposals? I'd like to get this sorted out for 6.0. I may come up with some interim solution for RELENG_5 because the vrele problem has caused deadlocks there.

Thanks,
Jeff