Date: Thu, 11 Oct 2007 04:20:01 +0200
From: Fabio Checconi
To: hackers@freebsd.org
Cc: s223560@studenti.ing.unipi.it, netchild@freebsd.org, joel@freebsd.org
Subject: Pluggable Disk Scheduler Project

Hi,

is anybody working on the `Pluggable Disk Scheduler Project' from the ideas page?

To better understand how GEOM works, and how a (non-work-conserving) disk scheduler can fit into it, I've written a very simple, yet working, prototype:

  http://feanor.sssup.it/~fabio/freebsd/g_sched/geom-sched-class.patch

I'd like to take a closer look at the problem and work on it, and I'd like to know what you think about it.

After reading [1], [2] and its follow-ups, the main problems that need to be addressed seem to be:

  o Is working on disk scheduling worth it at all?
  o Where is the right place (in GEOM) for a disk scheduler?
  o How can anticipation be introduced into the GEOM framework?
  o What could an interface for disk schedulers look like?
  o How should we deal with devices that handle multiple requests at a time?
  o How should we deal with metadata requests and other VFS issues?

I think that some answers need a bit of experimenting with real code and real hardware, so here is this attempt. The interface used in this toy prototype for the scheduler is something like this:

  typedef void *gs_init_t (struct g_geom *geom);
  typedef void gs_fini_t (void *data);
  typedef void gs_start_t (void *data, struct bio *bio);
  typedef void gs_done_t (void *data, struct bio *bio);

  struct g_gsched {
          const char      *gs_name;       /* Scheduler name. */
          int             gs_refs;        /* Refcount, internal use. */
          gs_init_t       *gs_init;       /* Called on geom creation. */
          gs_fini_t       *gs_fini;       /* Called on geom destruction. */
          gs_start_t      *gs_start;      /* Called on geom start, once per bio. */
          gs_done_t       *gs_done;       /* Called on geom done, once per bio. */
          LIST_ENTRY(g_gsched) glist;     /* List of schedulers, internal use. */
  };

The main idea is to let the scheduler queue incoming requests while keeping only one request outstanding (a small fixed number greater than one may work better on some hardware), and to pass a new request down to its provider only after the previous one has been served.

The example scheduler in the draft takes the following approach:

  o A scheduling GEOM class is introduced. It can be stacked on top of disk geoms, and it schedules all the requests coming from its consumers.
    I'm not absolutely sure that a new class is really needed, but I think it can simplify testing and experimenting with various solutions for the scheduler placement.
  o Requests coming from consumers are passed down immediately if there is no other request under service; otherwise they are queued in a bioq.
  o When a request has been served the scheduler is notified, and it can then pass down a new request or, as in this toy anticipatory [3] scheduler, wait for a new request from the same process or for a timeout to expire, making the next scheduling decision only after one of those events.

So, as I've said, I'd like to know what you think about the subject, whether I'm missing something, whether there is any interest in this, and if/how this work can proceed.

Thanks in advance,
fabio

[1] http://wiki.freebsd.org/Hybrid
[2] http://lists.freebsd.org/pipermail/freebsd-geom/2007-January/001854.html
[3] The details of the anticipation are really not interesting, as it is extremely simplified on purpose.
[4] http://feanor.sssup.it/~fabio/freebsd/g_sched/ also contains a userspace client to experiment with the GEOM class.
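
The dispatch policy described in the approach above (one outstanding request, a FIFO queue for the rest, and a simplified anticipation step) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code in the patch: `struct req`, `struct sched`, `sched_start()`, `sched_done()` and `sched_timeout()` are hypothetical stand-ins for a bio, the scheduler's private `data`, and the gs_start/gs_done hooks. The anticipation timeout is modeled as an explicit call; in the kernel it would presumably come from a timer such as callout(9).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: only the fields the policy needs. */
struct req {
	int id;			/* Request identifier, for tracing. */
	int pid;		/* Issuing process. */
	struct req *next;	/* FIFO linkage while queued. */
};

/* Hypothetical scheduler state, standing in for the void *data handle. */
struct sched {
	struct req *head, *tail;	/* Queued, not yet dispatched. */
	struct req *in_flight;		/* The single outstanding request. */
	int anticipating;		/* Waiting for last_pid or a timeout. */
	int last_pid;			/* Process of the last completed request. */
	int log[16];			/* Dispatch order, for tracing. */
	int nlog;
};

static void
dispatch(struct sched *s, struct req *r)
{
	/* In GEOM this would be where the bio is passed to the provider. */
	s->in_flight = r;
	s->anticipating = 0;
	if (s->nlog < 16)
		s->log[s->nlog++] = r->id;
}

static void
enqueue(struct sched *s, struct req *r)
{
	r->next = NULL;
	if (s->tail != NULL)
		s->tail->next = r;
	else
		s->head = r;
	s->tail = r;
}

static struct req *
dequeue(struct sched *s)
{
	struct req *r = s->head;

	if (r != NULL) {
		s->head = r->next;
		if (s->head == NULL)
			s->tail = NULL;
	}
	return (r);
}

/* Hypothetical counterpart of gs_start: a consumer issues a request. */
static void
sched_start(struct sched *s, struct req *r)
{
	if (s->in_flight == NULL && !s->anticipating)
		dispatch(s, r);		/* Idle: pass it down at once. */
	else if (s->anticipating && r->pid == s->last_pid)
		dispatch(s, r);		/* The awaited process came back. */
	else
		enqueue(s, r);		/* Busy or waiting: queue it. */
}

/* Hypothetical counterpart of gs_done: the outstanding request completed. */
static void
sched_done(struct sched *s)
{
	assert(s->in_flight != NULL);
	s->last_pid = s->in_flight->pid;
	s->in_flight = NULL;
	s->anticipating = 1;	/* Idle the disk briefly instead of dispatching. */
}

/* Anticipation timer expired: give up waiting and serve the queue head. */
static void
sched_timeout(struct sched *s)
{
	struct req *r;

	if (!s->anticipating)
		return;		/* Spurious timeout: nothing to do. */
	s->anticipating = 0;
	if ((r = dequeue(s)) != NULL)
		dispatch(s, r);
}
```

As a usage example, with two processes 100 and 200, the sequence start(a1 from 100), start(b1 from 200), done, start(a2 from 100), done, timeout dispatches a1, a2, b1: b1 is held in the queue while the disk idles waiting for process 100 to issue its next request. Keeping the disk idle although b1 is pending is what makes this policy non-work-conserving; like the toy scheduler, the model only waits for a *new* request from the same process and does not promote one that is already queued.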