From: Robert Watson <rwatson@FreeBSD.org>
Date: Fri, 5 Jan 2007 14:14:26 +0000 (GMT)
To: lulf@stud.ntnu.no
Cc: freebsd-current@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: Pluggable Disk Schedulers in GEOM
Message-ID: <20070105140941.B98541@fledge.watson.org>
In-Reply-To: <20070105015800.s3rqdzgm8k8owk4s@webmail.ntnu.no>

On Fri, 5 Jan 2007, lulf@stud.ntnu.no wrote:

> Anyway, I'd like to research a bit on this topic to just see how much it
> does matter with different I/O scheduling for different purposes.

I think working on this is interesting, but the one caution I'd have is that it's possible to introduce serious priority inversions through any complex scheduling scheme for I/O. In our VFS, I/O is frequently performed while holding locks or things that act like locks -- for example, during a directory lookup, while pulling an inode off the disk, etc.
The I/O will be initiated by one thread, but then other threads will end up waiting for it as well. If there is a naive mapping from initiating-thread priority to I/O request priority, you can end up with high-priority threads blocked on a low-priority task, leading to nasty starvation effects, especially if the scheduler allows indefinite waiting for I/O at a low priority.

This, to a rough approximation, is the problem Kirk ran into when trying to rate-limit bgfsck I/O in the kernel: key vnode locks, such as directory vnode locks, would be held across de-prioritized I/O, and high-priority processes would then block on the vnode locks.

There are various ways to address this, not least priority propagation (in which I/O priority is increased to match the priority of the highest-priority thread waiting on the I/O request), but I wanted to make sure you had it on the list of design concerns.

Robert N M Watson
Computer Laboratory
University of Cambridge
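P.S. The priority-propagation idea above could be sketched roughly as follows. This is purely illustrative -- the struct and function names (io_request, io_wait_on) are made up for the example and are not FreeBSD kernel APIs; a real implementation would also have to re-sort the request in the scheduler's queue when its priority changes.

```c
#include <assert.h>

/*
 * Hypothetical sketch of priority propagation for I/O requests.
 * An in-flight request remembers the priority it was issued with,
 * plus an "effective" priority that the disk scheduler actually
 * uses when ordering requests.
 */
struct io_request {
	int issuer_prio;	/* priority of the initiating thread */
	int effective_prio;	/* priority the scheduler sorts on */
};

/*
 * Called when a thread of priority waiter_prio must sleep on an
 * in-flight request (e.g., because it needs a vnode lock held across
 * the I/O): boost the request to the highest waiter's priority so a
 * low-priority issuer cannot indefinitely starve high-priority waiters.
 */
static void
io_wait_on(struct io_request *req, int waiter_prio)
{
	if (waiter_prio > req->effective_prio)
		req->effective_prio = waiter_prio;
}
```

The point is only that the boost happens at wait time, driven by whoever blocks on the request, rather than being fixed at issue time from the initiating thread alone.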