Date:      Sat, 23 May 1998 00:44:24 +0200
From:      "IBS / Andre Oppermann" <andre@pipeline.ch>
To:        freebsd-chat@FreeBSD.ORG
Subject:   Linus finally got it (filesystem issue)
Message-ID:  <3565FFC8.1357A794@pipeline.ch>

Just FYI (to everyone who was involved in the huge Newsgroup thing some
weeks ago. Terry - do you remember?):

I brought the OMDU vs. UMDU issue up on the Qmail mailing list earlier
this week (with big hints to FreeBSD) and finally Linus got involved
and he had to admit that EXT2FS is broken.

In case you want to see the whole thing, go to
http://www.ornl.gov/its/archives/mailing-lists/qmail/1998/05/maillist.html
and scroll down to 'Large installation using NT clients'. Read this thread
and everything below it that has to do with 'filesystem reliability', 'async
metadata', 'ext2fs', 'how incompatibility destroys mail' or 'kernel
patch'. It's worth a look; Linus makes a clown of himself.

On Wed, 20 May 1998, Linus Torvalds wrote:
>
> On Wed, 20 May 1998, IBS / Andre Oppermann wrote:
> > > Adding something like "cached_open()" etc is _exactly_ the wrong thing to
> > > do, because that makes the default programs that don't care get the slow
> > > behaviour.
> > 
> > Not necessary, can you spell soft-updates or DOW? It's almost as fast
> > as async if not faster sometimes (and there are things like SGI's XFS).
> 
> I'd love to have an ordered filesystem.
> 
> THAT IS NOT THE ISSUE! The issue is that the fsync() is _always_ safe. 
> Even if the filesystem itself doesn't need it, there's nothing wrong with
> doing the fsync(). And it can help, and _will_ help on ext2. And for all I
> know, it might help on other systems too.
> 
> And it's just five lines of code - and it's not ugly code at that, it's
> perfectly straightforward code that works on everything from FFS to NFS to
> XFS to Ext2fs etc etc etc.
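
For readers following along: the "five lines" under discussion amount to
opening the directory that holds the newly created file and calling fsync()
on it, in addition to the usual fsync() on the file itself. Below is a
minimal sketch in C, assuming a POSIX system. The name durable_write() and
its parameters are hypothetical, chosen only for illustration; this is not
code from qmail or from anyone in the thread.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create dir/name, write buf into it, then force
 * both the file data and the directory entry to stable storage.
 * Returns 0 on success, -1 on any error. */
int durable_write(const char *dir, const char *name,
                  const void *buf, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    /* O_EXCL: fail rather than overwrite an existing file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            close(fd);
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }

    /* Flush the file's data (and inode) to disk. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    /* The step being argued about: flush the directory entry as well. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    if (close(dfd) < 0)
        rc = -1;
    return rc;
}
```

A caller would use it as durable_write("/var/spool/example", "msg.1", data,
len), where the spool path is again only a placeholder. On a filesystem that
already writes directory updates synchronously, the final fsync() should
have little left to do.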

--> later...

On Wed, 20 May 1998, Linus Torvalds wrote:
>
> On Wed, 20 May 1998, IBS / Andre Oppermann wrote:
> > > 
> > > If Dan had added:
> > > 
> > >   (3.5) do a fdirsync() (five lines of code)
> > > 
> > > then he could have added:
> > > 
> > >   This works with FFS. It works with NFS. It works with ext2.
> > > 
> > > In short, what I'm arguing for is to add a trivial code sequence that will
> > > make everybody happy.
> > 
> > Not quite, because all the other folks besides Linux have to pay a
> > not-so-small performance penalty. If you consider a usual mailspool
> > directory with thousands of files and lots of simultaneous
> > incoming/outgoing and delivery processes running, it's quite a heavy
> > burden for the box.
> 
> Actually, as all the others according to you do the directory operations
> synchronously, then the fsync() on the directory should be a no-op for
> them, and the only overhead is due to three extra system calls. 
> 
> But yes, make it #ifdef __linux__ if this actually shows up on any
> benchmarks. It might be good to have that even if it _doesn't_ show up on
> any benchmarks, because it is an extra hint to the reader ("oh, Linux does
> asynch directory writes, that's why it's there"). So sure, use the ifdef.
> 
> > Someone out there who applies the patch and can give us some numbers
> > (before/after)?
> 
> This will actually suck on Linux-2.0, because nobody has ever actually
> _used_ the feature, even though it's been there forever. The Linux 2.0
> version of fsync() on directories will actually fsync the whole device
> that the directory is on (so it does the right thing, it's just fairly
> slow about it ;).
> 
> But that's ok. It's due to an oversight, and is fixed in my tree (the only
> reason it is fixed is that I actually went back and looked as part of this
> discussion). And _that_ is not a qmail problem, that's my problem. 
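
The #ifdef variant suggested above could look roughly like the following
sketch. It assumes, as the thread does, that only Linux needs the explicit
directory flush; sync_dir_if_needed() is a hypothetical name, and the
non-Linux branch is an assumption that directory updates are already
synchronous there.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush a directory's entries to disk, but only on
 * platforms assumed to write directory updates asynchronously.
 * Returns 0 on success, -1 on error. */
int sync_dir_if_needed(const char *dir)
{
#ifdef __linux__
    /* Linux does asynchronous directory writes, hence the explicit
     * fsync(). Cost: three extra system calls (open, fsync, close). */
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
#else
    /* Assumption: directory operations are synchronous on this platform,
     * so there is nothing to flush. */
    (void)dir;
    return 0;
#endif
}
```

Keeping the guard inside one small function leaves the call site identical
on every platform, and the comment doubles as the "hint to the reader"
mentioned above.
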
[...]
> > > Linux is the OS for _everybody_. Yes, the clowns can play too. I very
> > > explicitly _want_ the clowns who don't do "serious" programming to choose
> > > Linux too - some of the clowning around is what gets you programs like
> > > Quake etc.
> > 
> > Well, I don't think that Quake was programmed by a clown. Clowns are
> > would-bes and me-too guys.
> 
> In this context "clown" is anybody who didn't strictly care about the
> total integrity of his data.
> 
> And yes, I'm personally a clown too when I do most programming. I've got
> my serious projects, but I don't think I've actually ever used "fsync()"
> in any user-space application I've written. And that's ok - it just
> re-inforces my feeling that for _most_ things you don't actually want to
> force the synchronization. So from personal experience I then judge that
> the few cases where you do care you can add the extra fsync()..
> 
>                 Linus

--> later...

On Wed, 20 May 1998, Linus Torvalds wrote:
>
> On Wed, 20 May 1998, IBS / Andre Oppermann wrote:
> > > This will actually suck on Linux-2.0, because nobody has ever actually
> > > _used_ the feature, even though it's been there forever. The Linux 2.0
> > > version of fsync() on directories will actually fsync the whole device
> > > that the directory is on (so it does the right thing, it's just fairly
> > > slow about it ;).
> > 
> > Ouch... that hurts. You mean we have to wait for 2.2 to get reliability
> > and performance (or switch to FFS if we need it)?
> 
> I don't know how big an issue this really will be. The "fix" is actually a
> two-liner, and works in 2.0.x too, but I don't think I'll make another
> 2.0.x release just for this. 
> 
> If you have your mail partition separate from most other work that is
> going on on the machine, it probably doesn't make all that much of a
> difference. There isn't going to be huge amounts of unsynch'ed data
> anyway, exactly because all the actual mail files have to be fsync'ed
> after being written.
> 
> If it does make a difference, I can send people the two-liner patch to the
> kernel to fix it up to do what it should have done anyway.

Have fun!
-- 
Andre Oppermann

CEO / Geschaeftsfuehrer
Internet Business Solutions Ltd. (AG)
Hardstrasse 235, 8005 Zurich, Switzerland
Fon +41 1 277 75 75 / Fax +41 1 277 75 77
http://www.pipeline.ch    ibs@pipeline.ch
