From: Robert Watson <rwatson@FreeBSD.org>
Date: Fri, 5 Jan 2007 14:14:26 +0000 (GMT)
To: lulf@stud.ntnu.no
Cc: freebsd-current@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: Pluggable Disk Schedulers in GEOM
Message-ID: <20070105140941.B98541@fledge.watson.org>
In-Reply-To: <20070105015800.s3rqdzgm8k8owk4s@webmail.ntnu.no>

On Fri, 5 Jan 2007, lulf@stud.ntnu.no wrote:

> Anyway, I'd like to research a bit on this topic to just see how much it
> does matter with different I/O scheduling for different purposes.

I think working on this is interesting, but the one caution I'd have is that it's possible to introduce serious priority inversions through any complex scheduling scheme for I/O. In our VFS, I/O is frequently performed while holding locks or things that act like locks -- for example, during a directory lookup, while pulling an inode off the disk, etc.
The I/O will be initiated by one thread, but then other threads will end up waiting for it as well. If there is a naive mapping from initiating-thread priority to I/O request priority, you can end up with high-priority threads blocked on a low-priority task, leading to nasty starvation effects, especially if the scheduler allows indefinite waiting for I/O at a low priority.

This, to a rough approximation, is the problem Kirk ran into when trying to rate-limit bgfsck I/O in the kernel: key vnode locks, such as directory vnode locks, would be held across de-prioritized I/O, and high-priority processes would then block on the vnode locks.

There are various ways to address this, not least priority propagation (in which I/O priority is increased to match the priority of the highest-priority thread waiting on the I/O request), but I wanted to make sure you had it on the list of design concerns.

Robert N M Watson
Computer Laboratory
University of Cambridge
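P.S. The priority-propagation idea above could be sketched roughly as follows. This is purely illustrative -- the struct and function names (io_request, io_wait_on) are made up for the example and are not FreeBSD kernel APIs; a real implementation would also have to re-sort the request in the scheduler's queue when its priority changes.

```c
#include <assert.h>

/*
 * Hypothetical sketch of priority propagation for I/O requests.
 * An in-flight request remembers the priority it was issued with,
 * plus an "effective" priority that the disk scheduler actually
 * uses when ordering requests.
 */
struct io_request {
	int issuer_prio;	/* priority of the initiating thread */
	int effective_prio;	/* priority the scheduler sorts on */
};

/*
 * Called when a thread of priority waiter_prio must sleep on an
 * in-flight request (e.g., because it needs a vnode lock held across
 * the I/O): boost the request to the highest waiter's priority so a
 * low-priority issuer cannot indefinitely starve high-priority waiters.
 */
static void
io_wait_on(struct io_request *req, int waiter_prio)
{
	if (waiter_prio > req->effective_prio)
		req->effective_prio = waiter_prio;
}
```

The point is only that the boost happens at wait time, driven by whoever blocks on the request, rather than being fixed at issue time from the initiating thread alone.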