Date: Thu, 11 Oct 2007 04:20:01 +0200
From: Fabio Checconi <fabio@freebsd.org>
To: hackers@freebsd.org
Message-ID: <20071011022001.GC13480@gandalf.sssup.it>
Cc: s223560@studenti.ing.unipi.it, netchild@freebsd.org, joel@freebsd.org
Subject: Pluggable Disk Scheduler Project

Hi,
    is anybody working on the `Pluggable Disk Scheduler Project' from
the ideas page?

To better understand how GEOM works, and how a (non-work-conserving)
disk scheduler can fit into it, I've written a very simple, yet
working, prototype:

    http://feanor.sssup.it/~fabio/freebsd/g_sched/geom-sched-class.patch


I'd like to take a better look at the problem and work on it, and
I'd like to know what you think about it.

After reading [1], [2] and its follow-ups, the main questions that
need to be addressed seem to be:

    o Is working on disk scheduling worth it at all?
    o Where is the right place (in GEOM) for a disk scheduler?
    o How can anticipation be introduced into the GEOM framework?
    o What would be a suitable interface for disk schedulers?
    o How to deal with devices that handle multiple requests at a time?
    o How to deal with metadata requests and other VFS issues?

I think that some of the answers require a bit of experimenting with
real code and real hardware, hence this attempt.  The interface used
for the scheduler in this toy prototype looks something like this:

    typedef void *gs_init_t (struct g_geom *geom);
    typedef void gs_fini_t (void *data);
    typedef void gs_start_t (void *data, struct bio *bio);
    typedef void gs_done_t (void *data, struct bio *bio);

    struct g_gsched {
	    const char	*gs_name;	/* Scheduler name. */
	    int		gs_refs;	/* Refcount, internal use. */

	    gs_init_t	*gs_init;	/* Called on geom creation. */
	    gs_fini_t	*gs_fini;	/* Called on geom destruction. */
	    gs_start_t	*gs_start;	/* Called on geom start. */
	    gs_done_t	*gs_done;	/* Called on geom done. */

	    LIST_ENTRY(g_gsched) glist;	/* List of schedulers, internal use. */
    };
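
To make the calling convention concrete, here is a minimal userspace
sketch of a do-nothing scheduler filling in these hooks.  It is only an
illustration under a few assumptions: struct g_geom and struct bio are
stand-ins so that the fragment compiles outside the kernel, the glist
linkage is left out, and the noop_* names are hypothetical (they are not
taken from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the kernel types, just enough to compile in userspace. */
struct g_geom { const char *name; };
struct bio { long bio_offset; };

typedef void *gs_init_t (struct g_geom *geom);
typedef void gs_fini_t (void *data);
typedef void gs_start_t (void *data, struct bio *bio);
typedef void gs_done_t (void *data, struct bio *bio);

struct g_gsched {
	const char	*gs_name;	/* Scheduler name. */
	int		gs_refs;	/* Refcount, internal use. */
	gs_init_t	*gs_init;
	gs_fini_t	*gs_fini;
	gs_start_t	*gs_start;
	gs_done_t	*gs_done;
	/* The glist linkage is omitted in this sketch. */
};

/* Hypothetical per-geom private data: it only counts requests. */
struct noop_data {
	struct g_geom	*nd_geom;
	int		nd_started;	/* Requests seen so far. */
	int		nd_pending;	/* Requests not yet completed. */
};

static void *
noop_init(struct g_geom *geom)
{
	struct noop_data *nd = calloc(1, sizeof(*nd));

	if (nd != NULL)
		nd->nd_geom = geom;
	return (nd);	/* Handed back as `data' to the other hooks. */
}

static void
noop_fini(void *data)
{
	free(data);
}

static void
noop_start(void *data, struct bio *bio)
{
	struct noop_data *nd = data;

	(void)bio;	/* A real scheduler would queue or dispatch it. */
	nd->nd_started++;
	nd->nd_pending++;
}

static void
noop_done(void *data, struct bio *bio)
{
	struct noop_data *nd = data;

	(void)bio;
	assert(nd->nd_pending > 0);
	nd->nd_pending--;
}

static struct g_gsched noop_gsched = {
	.gs_name = "noop",
	.gs_init = noop_init,
	.gs_fini = noop_fini,
	.gs_start = noop_start,
	.gs_done = noop_done,
};
```

The point of the void * returned by gs_init is that the class never
looks inside it: each scheduler keeps whatever per-geom state it needs
and gets it back on every gs_start/gs_done call.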

The main idea is to let the scheduler queue incoming requests while
keeping only one request outstanding at its provider (a small fixed
number greater than one may work better on some hardware), and to pass
a new request down only after the previous one has completed.
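
The one-outstanding-request rule can be sketched in plain userspace C
as follows.  This is a simplified model, not the code in the patch:
struct bio is a stand-in with a hand-rolled FIFO link instead of a
bioq, the one_* names are hypothetical, and instead of calling into
GEOM each function simply returns the request that should be passed
down now (or NULL):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct bio; only what the sketch needs. */
struct bio {
	int		 id;
	struct bio	*next;	/* FIFO linkage (a bioq in the kernel). */
};

/* Hypothetical scheduler state: a FIFO plus the request under service. */
struct one_sched {
	struct bio	*head;
	struct bio	*tail;
	struct bio	*in_flight;
};

/* Remove and return the oldest queued request, or NULL if none. */
static struct bio *
one_dequeue(struct one_sched *s)
{
	struct bio *bp = s->head;

	if (bp != NULL) {
		s->head = bp->next;
		if (s->head == NULL)
			s->tail = NULL;
		bp->next = NULL;
	}
	return (bp);
}

/*
 * New request from a consumer.  Returns the request to pass down to
 * the provider now, or NULL if it had to be queued.
 */
static struct bio *
one_start(struct one_sched *s, struct bio *bp)
{
	bp->next = NULL;
	if (s->in_flight == NULL) {
		s->in_flight = bp;	/* Disk idle: pass down at once. */
		return (bp);
	}
	if (s->tail != NULL)
		s->tail->next = bp;
	else
		s->head = bp;
	s->tail = bp;
	return (NULL);
}

/*
 * The request under service completed.  Returns the next request to
 * pass down, or NULL if the queue is empty.
 */
static struct bio *
one_done(struct one_sched *s, struct bio *bp)
{
	assert(s->in_flight == bp);
	s->in_flight = one_dequeue(s);
	return (s->in_flight);
}
```

Allowing a small fixed number of outstanding requests instead of one
would amount to replacing the in_flight pointer with a counter and a
limit.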

The example scheduler in the draft takes the following approach:

    o A scheduling GEOM class is introduced.  It can be stacked on
      top of disk geoms, and schedules all the requests coming
      from its consumers.  I'm not absolutely sure that a new class
      is really needed, but I think it can simplify testing and
      experimenting with various solutions for the scheduler placement.
    o Requests coming from consumers are passed down immediately
      if there is no other request under service; otherwise they
      are queued in a bioq.
    o When a request has been served the scheduler is notified; it
      can then pass down a new request or, as in this toy
      anticipatory[3] scheduler, wait for a new request from the same
      process or for a timeout to expire, and make the next
      scheduling decision only after one of those events.
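
The anticipation step in the last item can be modelled roughly like
this.  Again a userspace sketch under stated assumptions, not the logic
of the patch: struct req stands in for a bio tagged with the pid of its
issuer, time is an explicit tick count passed in by the caller, and the
as_* names and the three-way result are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a bio tagged with the process that issued it. */
struct req {
	int		 pid;
	struct req	*next;
};

enum as_action { AS_DISPATCH, AS_WAIT, AS_IDLE };

struct as_sched {
	struct req	*head;		/* Pending requests, FIFO order. */
	int		 last_pid;	/* Issuer of the last served request. */
	long		 deadline;	/* End of the anticipation window. */
};

/* Append a request to the tail of the pending queue. */
static void
as_enqueue(struct as_sched *s, struct req *rp)
{
	struct req **pp;

	rp->next = NULL;
	for (pp = &s->head; *pp != NULL; pp = &(*pp)->next)
		;
	*pp = rp;
}

/* A request completed at time `now': open an anticipation window. */
static void
as_done(struct as_sched *s, const struct req *rp, long now, long window)
{
	s->last_pid = rp->pid;
	s->deadline = now + window;
}

/*
 * Decide what to do while nothing is under service.  Meant to be
 * called after a completion, on each new arrival and on timeout.
 */
static enum as_action
as_next(struct as_sched *s, long now, struct req **out)
{
	struct req **pp, *rp;

	*out = NULL;
	if (s->head == NULL)
		return (AS_IDLE);
	/* Prefer a request from the process that was served last. */
	for (pp = &s->head; (rp = *pp) != NULL; pp = &rp->next)
		if (rp->pid == s->last_pid)
			break;
	if (rp == NULL) {
		if (now < s->deadline)
			return (AS_WAIT);	/* Keep the disk idle a bit longer. */
		pp = &s->head;		/* Timeout: fall back to FIFO order. */
		rp = *pp;
	}
	*pp = rp->next;			/* Unlink the chosen request. */
	rp->next = NULL;
	*out = rp;
	return (AS_DISPATCH);
}
```

AS_WAIT is what makes the scheduler non-work-conserving: requests are
pending, but the disk is deliberately left idle in the hope that the
last process issues a nearby request before the deadline.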

So, as I've said, I'd like to know what you think about the subject,
whether I'm missing something, whether there is any interest in this,
and if and how this work can proceed.

Thanks in advance,
fabio


[1]  http://wiki.freebsd.org/Hybrid

[2]  http://lists.freebsd.org/pipermail/freebsd-geom/2007-January/001854.html

[3]  The details of the anticipation are really not interesting, as it
    is extremely simplified on purpose.

[4]  http://feanor.sssup.it/~fabio/freebsd/g_sched/ also contains a
    userspace client to experiment with the GEOM class.