From: Eivind Eklund
Date: Sat, 23 May 1998 04:15:34 +0200
To: IBS / Andre Oppermann, freebsd-chat@FreeBSD.ORG
Subject: Re: Linus finally got it (filesystem issue)
Message-ID: <19980523041534.53692@follo.net>

On Sat, May 23, 1998 at 12:44:24AM +0200, IBS / Andre Oppermann wrote:
> Just FYI (to everyone who was involved in the huge Newsgroup thing some
> weeks ago. Terry - do you remember?):
>
> I brought the OMDU vs. UMDU issue up on the Qmail mailing list earlier
> this week (with big hints to FreeBSD) and finally Linus got involved
> and he had to admit that EXT2FS is broken.
>
> In case you want to see the whole thing go to
> http://www.ornl.gov/its/archives/mailing-lists/qmail/1998/05/maillist.html
> and scroll down to 'Large installation using NT clients', read this
> thread and everything down that has to do with 'filesystem reliability',
> 'async metadata', 'ext2fs', 'how incompatibility destroys mail' or
> 'kernel patch'.
It's worth a look; Linus makes a clown of himself.

What's interesting is that he still doesn't make it easy to write truly
correct code (i.e., you have to special-case Linux). rename() is broken:
it is not guaranteed to leave either of the files on the disk.

The result: to get an atomic update of a file "object", you have to do
something like this:

 1. Write a backup copy of your new data to "object.newdata-extra".
 2. fsync("object.newdata-extra")
 3. (If you don't have an fsync() that guarantees correct metadata for
    the object, you must sync() here; i.e., on Linux until Linus
    releases his patches.)
 4. Write your new data to "object.newdata".
 5. fsync("object.newdata")
 6. (Repeat step 3.)
 7. rename("object.newdata", "object")
 8. sync()
 9. Create the file "object.is-updated".
10. sync()
11. unlink("object.newdata-extra")
12. sync()
13. unlink("object.is-updated")
14. sync()

On a reboot, you have to handle the following cases:

1. Only "object" exists, or "object" and "object.newdata" exist, or
   "object.is-updated" exists:
   1.1 unlink("object.newdata")
   1.2 unlink("object.newdata-extra")
   1.3 sync()
   1.4 unlink("object.is-updated")
   1.5 sync()
   1.6 You're done.
2. "object", "object.newdata-extra", and "object.newdata" exist:
   2.1 Copy "object.newdata-extra" to "object.newdata".
   2.2 fsync("object.newdata")
   2.3 (Repeat step 3 of the update.)
   2.4 rename("object.newdata", "object")
   2.5 sync()
   2.6 Create "object.is-updated".
   2.7 sync()
   2.8 unlink("object.newdata-extra")
   2.9 sync()
   2.10 unlink("object.is-updated")
   2.11 sync()
   2.12 You're done.

I _think_ that's correct, but I'm not certain. It took me three
iterations to get to something I feel pseudo-confident about.

However, with ordered metadata updates the case is this simple (and, if
I've understood POSIX correctly, this is how POSIX requires it to be):

1. Create "object.newdata".
2. fsync("object.newdata")
3. rename("object.newdata", "object")

On reboot:

1. "object.newdata" doesn't exist:
   1.1 You're done.
2. "object.newdata" does exist:
   2.1 unlink("object.newdata")
   2.2 You're done.

Eivind.
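As an illustration only, here is a minimal C sketch of the 14-step update sequence above; it is not code from the message. The helper names make_name(), write_synced(), create_empty() and careful_update(), and the PATH_BUF limit, are hypothetical. The sketch always calls sync() at steps 3 and 6, i.e. it assumes an fsync() that may not commit the file's metadata, and on any error it just returns -1 and leaves the side files for the reboot procedure to sort out.

```c
#define _XOPEN_SOURCE 700 /* fsync(), sync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PATH_BUF 4096 /* hypothetical limit for the derived file names */

/* Build "<base><suffix>" into out (PATH_BUF bytes). */
static int make_name(char *out, const char *base, const char *suffix)
{
    size_t b = strlen(base), s = strlen(suffix);
    if (b + s + 1 > PATH_BUF) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(out, base, b);
    memcpy(out + b, suffix, s + 1); /* copies the terminating NUL too */
    return 0;
}

/* Write, fsync() and close a file, then sync(): steps 1-3 and 4-6. */
static int write_synced(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        data += n;
        len -= (size_t)n;
    }
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    sync(); /* step 3/6: assume fsync() did not commit the metadata */
    return 0;
}

/* Create an empty marker file such as "object.is-updated". */
static int create_empty(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    return fd < 0 ? -1 : close(fd);
}

/* Steps 1-14: replace the contents of obj with data[0..len). */
int careful_update(const char *obj, const char *data, size_t len)
{
    char extra[PATH_BUF], newdata[PATH_BUF], updated[PATH_BUF];

    assert(obj != NULL && (data != NULL || len == 0));
    if (make_name(extra, obj, ".newdata-extra") != 0 ||
        make_name(newdata, obj, ".newdata") != 0 ||
        make_name(updated, obj, ".is-updated") != 0)
        return -1;

    if (write_synced(extra, data, len) != 0)   /* steps 1-3 */
        return -1;
    if (write_synced(newdata, data, len) != 0) /* steps 4-6 */
        return -1;
    if (rename(newdata, obj) != 0)             /* step 7 */
        return -1;
    sync();                                    /* step 8 */
    if (create_empty(updated) != 0)            /* step 9 */
        return -1;
    sync();                                    /* step 10 */
    if (unlink(extra) != 0)                    /* step 11 */
        return -1;
    sync();                                    /* step 12 */
    if (unlink(updated) != 0)                  /* step 13 */
        return -1;
    sync();                                    /* step 14 */
    return 0;
}
```

Note how much of the function is sync() calls: that cost is the point of the comparison with the ordered-metadata case.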
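The reboot handling for that sequence can be sketched the same way, again as a hypothetical C illustration rather than the author's code. It defines its own hypothetical helpers (make_name(), unlink_if_present(), create_empty(), copy_synced()) and treats unlinking an already-missing file in case 1 as success, which is an interpretation of the steps. The two listed cases do not cover every combination a crash can leave behind (for instance "object" plus "object.newdata-extra" with no "object.newdata", which is what a crash between steps 3 and 4 leaves), so careful_recover() returns 1 for those instead of guessing.

```c
#define _XOPEN_SOURCE 700 /* fsync(), sync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PATH_BUF 4096 /* hypothetical limit for the derived file names */

static int make_name(char *out, const char *base, const char *suffix)
{
    size_t b = strlen(base), s = strlen(suffix);
    if (b + s + 1 > PATH_BUF) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(out, base, b);
    memcpy(out + b, suffix, s + 1); /* copies the terminating NUL too */
    return 0;
}

/* Case 1 unlinks files that may already be gone; treat that as success. */
static int unlink_if_present(const char *path)
{
    return (unlink(path) == 0 || errno == ENOENT) ? 0 : -1;
}

static int create_empty(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    return fd < 0 ? -1 : close(fd);
}

/* Steps 2.1-2.2: copy "from" over "to", then fsync() the copy. */
static int copy_synced(const char *from, const char *to)
{
    char buf[8192];
    ssize_t n;
    int in = open(from, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                goto fail;
            off += w;
        }
    }
    if (n < 0 || fsync(out) != 0)
        goto fail;
    close(in);
    return close(out);
fail:
    close(in);
    close(out);
    return -1;
}

/* 0 once recovered, -1 on an I/O error, 1 for a state neither case covers. */
int careful_recover(const char *obj)
{
    char extra[PATH_BUF], newdata[PATH_BUF], updated[PATH_BUF];
    assert(obj != NULL);
    if (make_name(extra, obj, ".newdata-extra") != 0 ||
        make_name(newdata, obj, ".newdata") != 0 ||
        make_name(updated, obj, ".is-updated") != 0)
        return -1;
    int has_obj = access(obj, F_OK) == 0, has_extra = access(extra, F_OK) == 0;
    int has_new = access(newdata, F_OK) == 0, has_upd = access(updated, F_OK) == 0;

    if (has_upd || (has_obj && !has_extra)) {   /* case 1 */
        if (unlink_if_present(newdata) != 0 ||  /* 1.1 */
            unlink_if_present(extra) != 0)      /* 1.2 */
            return -1;
        sync();                                 /* 1.3 */
        if (unlink_if_present(updated) != 0)    /* 1.4 */
            return -1;
        sync();                                 /* 1.5 */
        return 0;                               /* 1.6 */
    }
    if (has_obj && has_extra && has_new) {      /* case 2 */
        if (copy_synced(extra, newdata) != 0)   /* 2.1-2.2 */
            return -1;
        sync();                                 /* 2.3 */
        if (rename(newdata, obj) != 0)          /* 2.4 */
            return -1;
        sync();                                 /* 2.5 */
        if (create_empty(updated) != 0)         /* 2.6 */
            return -1;
        sync();                                 /* 2.7 */
        if (unlink(extra) != 0)                 /* 2.8 */
            return -1;
        sync();                                 /* 2.9 */
        if (unlink(updated) != 0)               /* 2.10 */
            return -1;
        sync();                                 /* 2.11 */
        return 0;                               /* 2.12 */
    }
    return 1; /* not covered by the message: leave it to a human */
}
```

Returning 1 rather than picking a case is deliberate: the message itself is hedged about whether the case list is complete.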
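For comparison, here is a minimal C sketch of the three-step ordered-metadata update and its reboot handling, under the assumption the message makes: a filesystem on which the rename leaves either the old "object" or the new one. The names newdata_name(), ordered_update() and ordered_recover() and the PATH_BUF limit are hypothetical. The sketch relies only on that old-or-new property; it does nothing extra to force the rename itself to disk.

```c
#define _XOPEN_SOURCE 700 /* fsync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PATH_BUF 4096 /* hypothetical limit for "<obj>.newdata" */

/* Build "<obj>.newdata" into out (PATH_BUF bytes). */
static int newdata_name(char *out, const char *obj)
{
    static const char suffix[] = ".newdata";
    size_t b = strlen(obj);
    if (b + sizeof suffix > PATH_BUF) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(out, obj, b);
    memcpy(out + b, suffix, sizeof suffix); /* includes the NUL */
    return 0;
}

/* Update with ordered metadata: create, fsync, rename. */
int ordered_update(const char *obj, const char *data, size_t len)
{
    char tmp[PATH_BUF];
    assert(obj != NULL && (data != NULL || len == 0));
    if (newdata_name(tmp, obj) != 0)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644); /* 1. create */
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0)
            goto fail;
        data += n;
        len -= (size_t)n;
    }
    if (fsync(fd) != 0)                                      /* 2. fsync */
        goto fail;
    if (close(fd) != 0)
        return -1;
    return rename(tmp, obj);                                 /* 3. rename */
fail:
    close(fd);
    return -1;
}

/* Reboot handling: a leftover "<obj>.newdata" is simply unlinked. */
int ordered_recover(const char *obj)
{
    char tmp[PATH_BUF];
    if (newdata_name(tmp, obj) != 0)
        return -1;
    if (unlink(tmp) != 0 && errno != ENOENT) /* case 2: remove leftover */
        return -1;
    return 0;                                /* case 1 or 2: done */
}
```

Any failure before the rename leaves at most a stale "object.newdata" behind, which is exactly the one extra state ordered_recover() has to handle.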