Date:      Sat, 23 May 1998 04:15:34 +0200
From:      Eivind Eklund <eivind@yes.no>
To:        IBS / Andre Oppermann <andre@pipeline.ch>, freebsd-chat@FreeBSD.ORG
Subject:   Re: Linus finally got it (filesystem issue)
Message-ID:  <19980523041534.53692@follo.net>
In-Reply-To: <3565FFC8.1357A794@pipeline.ch>; from IBS / Andre Oppermann on Sat, May 23, 1998 at 12:44:24AM +0200
References:  <3565FFC8.1357A794@pipeline.ch>

On Sat, May 23, 1998 at 12:44:24AM +0200, IBS / Andre Oppermann wrote:
> Just FYI (to everyone who was involved in the huge Newsgroup thing some
> weeks ago. Terry - do you remember?):
> 
> I brought the OMDU vs. UMDU issue up on the Qmail mailing list earlier
> this week (with big hints to FreeBSD) and finally Linus got involved
> and he had to admit that EXT2FS is broken.
> 
> In case you want to see the whole thing go to http://www.ornl.gov/its/
> archives/mailing-lists/qmail/1998/05/maillist.html and scroll down
> to 'Large installation using NT clients', read this thread and
> everything down that has to do with 'filesystem reliability', 'async
> metadata', 'ext2fs', 'how incompatibility destroys mail' or 'kernel
> patch'. It's worth a look, Linus makes himself a clown.

What's interesting is that he still doesn't make it easy to write
truly correct code (i.e., you have to special-case for Linux).  rename()
is broken: after a crash, it is not guaranteed to leave either of the
files on the disk.

The result: To get an atomic update of a file "object", you have to do
something like this:
1. Write a backup copy of your new data - "object.newdata-extra"
2. fsync("object.newdata-extra")
3. (if you don't have an fsync that guarantees correct metadata for the
   object, you must sync() here - i.e., for Linux until Linus releases
   his patches)
4. Write your new data - "object.newdata"
5. fsync("object.newdata")
6. (repeat point 3)
7. rename("object.newdata", "object")
8. sync()
9. create file "object.is-updated"
10. sync()
11. unlink("object.newdata-extra")
12. sync()
13. unlink("object.is-updated")
14. sync()
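
A minimal Python sketch of the fourteen steps above, offered as an illustration rather than a tested recipe. The names `paranoid_update` and `_write_and_fsync`, and the injectable `sync` parameter, are assumptions made for this example; only the file-name suffixes come from the list. `os.fsync` takes a file descriptor rather than a path, so each fsync step runs while the file is still open.

```python
import os


def _write_and_fsync(path, data):
    """Write data to path and flush it to disk before closing."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to push it to disk


def paranoid_update(obj, data, sync=os.sync):
    """Replace obj with data using the 14-step sequence (hypothetical helper).

    `sync` defaults to os.sync (Unix only); it is a parameter so the
    sequence can be exercised without forcing real global syncs.
    """
    extra = obj + ".newdata-extra"
    new = obj + ".newdata"
    marker = obj + ".is-updated"

    _write_and_fsync(extra, data)   # steps 1-2: backup copy of the new data
    sync()                          # step 3
    _write_and_fsync(new, data)     # steps 4-5: the new data itself
    sync()                          # step 6
    os.rename(new, obj)             # step 7
    sync()                          # step 8
    open(marker, "wb").close()      # step 9: create the marker file
    sync()                          # step 10
    os.unlink(extra)                # step 11
    sync()                          # step 12
    os.unlink(marker)               # step 13
    sync()                          # step 14
```

Calling `paranoid_update("object", b"...")` with the default `sync` issues six global syncs per update, which is the cost the rest of this message is complaining about.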

On a reboot, you have to handle the following cases:
1. Only "object" exists, or "object" and "object.newdata" exist, or
   "object.is-updated" exists.
1.1 unlink "object.newdata"
1.2 unlink "object.newdata-extra"
1.3 sync()
1.4 unlink "object.is-updated"
1.5 sync()
1.6 you're done
2. "object", "object.newdata-extra", and "object.newdata" exist
2.1 copy "object.newdata-extra" to "object.newdata"
2.2 fsync("object.newdata")
2.3 (repeat point 3)
2.4 rename("object.newdata", "object")
2.5 sync()
2.6 create("object.is-updated")
2.7 sync()
2.8 unlink("object.newdata-extra")
2.9 sync()
2.10 unlink("object.is-updated")
2.11 sync()
2.12 you're done.
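
The two reboot cases above could be sketched in Python roughly as follows. This is an assumption-laden illustration: `recover_paranoid`, its return labels, and the injectable `sync` parameter are invented for the example, and any combination of files that the two cases do not mention is reported as "unhandled" and left untouched rather than guessed at.

```python
import os
import shutil


def recover_paranoid(obj, sync=os.sync):
    """Apply the two reboot cases to obj's helper files (hypothetical helper)."""
    extra = obj + ".newdata-extra"
    new = obj + ".newdata"
    marker = obj + ".is-updated"
    have = {p: os.path.exists(p) for p in (obj, extra, new, marker)}

    def remove_if_present(path):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass

    # Case 1: only "object", or "object" plus "object.newdata",
    # or the marker file exists. Just clean up.
    if have[marker] or (have[obj] and not have[extra]):
        remove_if_present(new)       # 1.1
        remove_if_present(extra)     # 1.2
        sync()                       # 1.3
        remove_if_present(marker)    # 1.4
        sync()                       # 1.5
        return "cleaned"

    # Case 2: "object", the backup copy and "object.newdata" all exist.
    # Redo the update from the backup copy.
    if have[obj] and have[extra] and have[new]:
        with open(extra, "rb") as src, open(new, "wb") as dst:
            shutil.copyfileobj(src, dst)   # 2.1
            dst.flush()
            os.fsync(dst.fileno())         # 2.2
        sync()                             # 2.3
        os.rename(new, obj)                # 2.4
        sync()                             # 2.5
        open(marker, "wb").close()         # 2.6
        sync()                             # 2.7
        os.unlink(extra)                   # 2.8
        sync()                             # 2.9
        os.unlink(marker)                  # 2.10
        sync()                             # 2.11
        return "replayed"

    # Anything else is not covered by the two cases; leave it alone.
    return "unhandled"
```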

I _think_ that's correct.  I'm not certain.  I went through three
iterations to get at something I feel pseudo-confident about.

However, the case with ordered metadata updates is this simple (and
is, if I've understood POSIX correctly, how POSIX requires it to be):

1. Create "object.newdata"
2. fsync "object.newdata"
3. rename "object.newdata", "object"
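
For comparison, a minimal Python sketch of these three steps, assuming a POSIX system where rename() atomically replaces an existing target. The function name `atomic_update` is made up for the example.

```python
import os


def atomic_update(obj, data):
    """Replace obj with data via write, fsync, rename (hypothetical helper)."""
    new = obj + ".newdata"
    with open(new, "wb") as f:     # step 1: create "object.newdata"
        f.write(data)
        f.flush()                  # push Python's buffer to the kernel
        os.fsync(f.fileno())       # step 2: fsync it
    os.rename(new, obj)            # step 3: rename over "object" (POSIX semantics)
```

The fsync has to come before the rename: the point of the sequence is that the new name never refers to data that has not reached the disk.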

On reboot:
1. "object.newdata" doesn't exist
1.1 You're done.
2. "object.newdata" does exist
2.1 unlink "object.newdata"
2.2 You're done.
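
The matching reboot handling is correspondingly short. A sketch in Python, where `recover_simple` and its boolean return value are assumptions for illustration:

```python
import os


def recover_simple(obj):
    """Remove a leftover "object.newdata"; return True if one was found."""
    new = obj + ".newdata"
    try:
        os.unlink(new)     # case 2: a stale temporary exists, drop it
        return True
    except FileNotFoundError:
        return False       # case 1: nothing to do
```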

Eivind.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message


