Date: Fri, 15 Mar 2002 12:03:19 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Christoph Hellwig <hch@caldera.de>
Cc: Josh MacDonald <jmacd@CS.Berkeley.EDU>, Parity Error <bootup@mail.ru>,
    freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <3C925387.2DC4F2C0@mindspring.com>
References: <E16lReK-000C3T-00@f10.mail.ru>
    <3C910C57.71C2D823@mindspring.com>
    <20020315065651.02637@helen.CS.Berkeley.EDU>
    <3C923C91.454D7710@mindspring.com>
    <20020315193844.A26441@caldera.de>

Christoph Hellwig wrote:
> On Fri, Mar 15, 2002 at 10:25:21AM -0800, Terry Lambert wrote:
> > > - The file system has never made any guarantees.
> >
> > Yes it has.  If you look at the atime/mtime/ctime update
> > requirements for the OS, they are pretty blatant.  They
> > just aren't enough to be able to blindly use them.
>
> These requirements are only there for O_SYNC.

POSIX 1003.1, clauses 2.3.5 and 5.6.6.2 distinguish between "SHALL be
marked for update" and "SHALL be updated" with regard to the ctime,
mtime, and atime values for a file, which are FS metadata.  See also
5.5.3.2.  The relevant phrases are:

    2.3.5    [ ... ] All fields that are marked for update SHALL be
             updated when the file is no longer open by any process,
             or when a stat() or fstat() is performed on the file.
             Other times at which updates are done are unspecified.

    5.6.6.2  [ ... ] The utime() function sets the access and
             modification times of the named file.

    5.5.3.2  [ ... ] Upon successful completion, the rename() function
             SHALL mark for update the st_ctime and st_mtime fields of
             the parent directory of each file.

The getdirentries update semantics (SHALL update) and the metadata
modifications (SHALL update) are pretty unambiguous, as well.

The Single UNIX Specification has similar controls on the marking
for update in write, mmap, and other cases.

The POSIX requirements are stiffer because of VMS, where directories
were not implemented as files.  I used to dislike it, but way back
then, I was just starting out as a student, and didn't realize the
transactional implications.

The Single UNIX Specification also fails to specify things like the
underlying system call(s) used to implement directory traversal.
POSIX, however, specifies that the atime "SHALL be updated" (as
opposed to merely marked for update).
We got around this requirement on one project I was on by not using
the system call interface with the specified behaviour to read the
directory contents, and by declaring that directories were not
regular files for the FS in question.

> > > - You can use fsync() to stabilize a single file and its metadata
> > >   dependencies.
> >
> > Metadata stabilization should be automatic.  What an fsync
> > there does is really enforce ordering on metadata writes,
> > by acting as a barrier.
>
> Why do you think there is fdatasync() (and O_DSYNC)?

Linux?

It used to be called "O_WRITESYNC" back in the mid 1980's.  The idea
that an FS would not order your metadata for you, yet you would still
have integrity requirements in such an environment, was simply
unthinkable.

The O_DSYNC came about because people invented the concept of
unsynchronized metadata, which led to the idea that it should be
possible to separately cause data and metadata synchronization.

IMO, there's really no excuse for unsynchronized metadata, and
synchronous data writes exist only to avoid the system call overhead
of separately calling fsync(), and the OS overhead of having to
synchronize all dirty pages instead of a region, based on the
descriptor being used for the operation.

You can make the same argument in FreeBSD actually: msync() doesn't
limit itself to the range specified for the backing object, because
it can't tell (there are no reverse maps); last time I looked at
msync() in Linux and Solaris, it was true in those places, too.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
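
[Editorial sketch, not part of the original message.]  The trade-off
discussed above, a buffered write followed by a separate fsync() or
fdatasync() call versus a descriptor opened for synchronous data
writes, can be illustrated roughly as follows.  This is a minimal
sketch in Python, whose os module wraps the POSIX calls of the same
names.  The helper names write_all, write_then_sync and
write_synchronously are hypothetical and exist only for this
illustration.  It assumes a Unix-like system, and it says nothing
about what ordering a particular file system actually enforces
underneath these calls.

```python
import os
import tempfile


def write_all(fd, payload):
    """Write every byte of payload, looping over short writes."""
    view = memoryview(payload)
    while len(view) > 0:
        view = view[os.write(fd, view):]


def write_then_sync(path, payload, data_only=False):
    """Buffered write, then one explicit flush of this file.

    data_only=True requests fdatasync(), which may skip metadata that
    is not needed to read the data back (timestamps, for example).
    Assumption: fall back to fsync() where Python lacks os.fdatasync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, payload)
        if data_only and hasattr(os, "fdatasync"):
            os.fdatasync(fd)
        else:
            os.fsync(fd)
    finally:
        os.close(fd)


def write_synchronously(path, payload):
    """Open with O_DSYNC so each write() returns after the data is stable.

    This avoids the separate fsync() system call.  Assumption: use the
    stricter O_SYNC where the platform does not define O_DSYNC.
    """
    sync_flag = getattr(os, "O_DSYNC", os.O_SYNC)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag
    fd = os.open(path, flags, 0o600)
    try:
        write_all(fd, payload)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "example.dat")
        write_then_sync(target, b"data plus metadata")
        write_then_sync(target, b"data only", data_only=True)
        write_synchronously(target, b"synchronous data write")
        print(os.stat(target).st_size)
```

Both paths leave the same bytes in the file.  They differ in when the
caller blocks and in how much metadata is forced out along with the
data, which is the distinction the message argues should not have
been needed in the first place.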