From owner-freebsd-hackers Mon Feb 7 12:56:10 2000
Date: Mon, 7 Feb 2000 12:56:52 -0800 (PST)
From: Matthew Dillon
Message-Id: <200002072056.MAA50540@apollo.backplane.com>
To: Alfred Perlstein
Cc: hackers@FreeBSD.ORG
Subject: Re: Syncing a vector of fileoffsets and lengths?
References: <20000207114042.E25520@fw.wintelcom.net> <200002071938.LAA50114@apollo.backplane.com> <20000207125636.G25520@fw.wintelcom.net>

:I think two kinds of behavior are needed, ordered range fsync and
:unordered async fsync.
:
:The ordered range could be taken care of easily by your implementation,
:however for maximum effectiveness you'd want to allow for unordered
:async fsync and notification.
:
:The simplest way I can think of doing this is keeping a per-process count
:of how many buffers were scheduled for async IO and allowing as many
:async ops to happen, incrementing the count, as each io completes it
:decrements the count and calls wakeup_one once it reaches 0 again.

    "Eeek".

    First, keep in mind that it is not possible to guarantee write ordering
    after the fact, even if we had an interface for it.  There are too many
    other subsystems which might flush a buffer out of order: the page
    daemon, update daemon, buf daemon, and clustering code, for example.
    Once the data has been thrown into a filesystem buffer, the game is
    over (see the last paragraph for more on this).

    You can guarantee write ordering in only one place: when you actually
    issue the write.
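
    As a rough illustration of ordering at issue time, here is a minimal
    sketch that uses only the existing pwrite(2) and fsync(2) calls, not
    the proposed fsync2() interface.  The helper names write_durable() and
    ordered_commit() are hypothetical and exist only for this example.  The
    second write is not issued until the first one has been flushed, so
    nothing else in the system gets a chance to reorder them.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose pwrite() and fsync() under strict modes */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset off and flush the file
 * before returning.  Returns 0 on success, -1 on error.
 */
static int
write_durable(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0)
            return -1;          /* treat a zero-length write as an error */
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return fsync(fd);           /* block until the data has been flushed */
}

/*
 * Hypothetical two-step commit: the marker is only issued after the data
 * record has been written and flushed, so the ordering is decided by the
 * caller at issue time rather than by the buffer cache afterwards.
 */
int
ordered_commit(const char *path, const char *data, const char *marker)
{
    size_t dlen = strlen(data);
    int fd, rc;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    rc = write_durable(fd, data, dlen, 0);
    if (rc == 0)
        rc = write_durable(fd, marker, strlen(marker), (off_t)dlen);

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

    A caller would use it as ordered_commit("journal.dat", "payload",
    "COMMIT").  The cost is one synchronous flush per ordering point, which
    is exactly the overhead a range-based or asynchronous fsync would be
    trying to reduce.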

    It should be possible to extend this with the AIO mechanism to handle
    the necessary synchronous and fsync cases by adding new opcodes, and we
    can certainly create an AIO call for fsync2, e.g. aio_fsync2(), to
    handle notification.  This would run on top of the fsync2() system call
    and VOP_FSYNC2() filesystem API.  We can add a link pointer dependency
    to the aiocb to guarantee commit ordering, or even to allow multiple
    aiocbs to be issued in a single system call (and run sequentially).
    You then issue multiple aio's chained together with dependencies and
    wait for the last one to complete, then wait for the previous ones to
    complete (which will not block, since you know they have already run
    once the last one returns).

    What we do not want to do is to create a whole new kernel notification
    mechanism *just* for fsync, nor do we want to pollute the argument
    space *just* to avoid making multiple system calls.

:I think there's enough fields in the struct buf to support this unordered,
:i'm not sure it will be possible to do this if the application wants
:FIFO async fsync.

    We aren't going to mess with struct buf.  The goal is to simplify
    struct buf, not complexify it.  Dealing with ordering dependencies
    properly is difficult at best; look at softupdates, for example.  The
    chance of getting it right without introducing new deadlock or runaway
    bugs in anything under a couple of months is low.  We would be letting
    ourselves in for a world of hurt.

                                        -Matt
                                        Matthew Dillon

:What do you think?
:
:-Alfred