Date: Fri, 15 Mar 2002 12:03:19 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Christoph Hellwig <hch@caldera.de>
Cc: Josh MacDonald <jmacd@CS.Berkeley.EDU>, Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <3C925387.2DC4F2C0@mindspring.com>
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <20020315193844.A26441@caldera.de>
Christoph Hellwig wrote:
> On Fri, Mar 15, 2002 at 10:25:21AM -0800, Terry Lambert wrote:
> > > - The file system has never made any guarantees.
> >
> > Yes it has. If you look at the atime/mtime/ctime update
> > requirements for the OS, they are pretty blatant. They
> > just aren't enough to be able to blindly use them.
>
> These requirements are only there for O_SYNC.
POSIX 1003.1, clauses 2.3.5 and 5.6.6.2 distinguish between
"SHALL be marked for update" and "SHALL be updated" with
regard to the ctime, mtime, and atime values for a file,
which are FS metadata. See also 5.5.3.2. The relevant
phrases are:
2.3.5 [ ... ] All fields that are marked for update
SHALL be updated when the file is no longer open by
any process, or when a stat() or fstat() is performed
on the file. Other times at which updates are done
are unspecified.
5.6.6.2 [ ... ] The utime() function sets the access
and modification times of the named file.
5.5.3.2 [ ... ] Upon successful completion, the
rename() function SHALL mark for update the st_ctime
and st_mtime fields of the parent directory of each
file.
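To make the 2.3.5 wording concrete, here is a minimal C sketch (mine, not
from the standard): it backdates a scratch file's timestamps with
futimens(), writes one byte, and then uses fstat() to check that st_mtime
has moved forward. The helper name mtime_advances_after_write, the /tmp
template, and the backdated value are all made up for illustration, and a
POSIX.1-2008 system is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if fstat() reports a newer st_mtime after a write(),
 * 0 if it does not, and -1 on any system call error. */
int mtime_advances_after_write(void)
{
    char path[] = "/tmp/mtime_demo_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                  /* file stays alive until fd is closed */

    /* Backdate atime and mtime to an arbitrary old instant (Sep 2001). */
    const time_t old_time = 1000000000;
    struct timespec old[2] = {
        { .tv_sec = old_time, .tv_nsec = 0 },
        { .tv_sec = old_time, .tv_nsec = 0 },
    };
    if (futimens(fd, old) != 0) {
        close(fd);
        return -1;
    }

    /* write() marks st_mtime and st_ctime for update ... */
    if (write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }

    /* ... and fstat() is one of the points where marked fields
     * must have been updated. */
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return st.st_mtime > old_time ? 1 : 0;
}
```

A return of 1 only shows that the field marked for update by write() was
seen as updated by the time fstat() returned, which is all 2.3.5 promises.
It says nothing about when the inode actually reaches the disk.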
The getdirentries update semantics (SHALL update) and the metadata
modifications (SHALL update) are pretty unambiguous, as well.
The Single UNIX Specification has similar controls on the marking
for update in write, mmap, and other cases. The POSIX requirements
are stiffer because of VMS, where directories were not implemented
as files. I used to dislike that, but way back then I was just
starting out as a student, and didn't realize the transactional
implications. The Single UNIX Specification also fails to specify
things like the underlying system call(s) used to implement
directory traversal. POSIX, however, specifies that the atime
"SHALL be updated" (as opposed to merely marked for update). We got
around this requirement on one project I was on by not using the
system call interface whose behaviour is specified to read the
directory contents, and by declaring that directories were not
regular files for the FS in question.
> > > - You can use fsync() to stabilize a single file and its metadata
> > > dependencies.
> >
> > Metadata stabilization should be automatic. What an fsync
> > there does is really enforce ordering on metadata writes,
> > by acting as a barrier.
>
> Why do you think there is fdatasync() (and O_DSYNC)?
Linux? It used to be called "O_WRITESYNC" back in the mid
1980's. The idea that an FS would not order your metadata
for you, yet you would still have integrity requirements in
such an environment, was simply unthinkable.
O_DSYNC came about because people invented the concept
of unsynchronized metadata, which led to the idea that it
should be possible to separately cause data and metadata
synchronization.
IMO, there's really no excuse for unsynchronized metadata,
and synchronous data writes exist only to avoid the system
call overhead of separately calling fsync(), and the OS
overhead of having to synchronize all dirty pages instead
of a region, based on the descriptor being used for the
operation.
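As a rough illustration of the two styles being compared, the C sketch
below contrasts write() followed by a separate fsync() with a descriptor
opened O_DSYNC, where each write() is itself the synchronization point.
The function names write_then_fsync and write_dsync are hypothetical, and
the fallback from O_DSYNC to O_SYNC is my assumption for systems that do
not define O_DSYNC.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DSYNC
#define O_DSYNC O_SYNC   /* assumption: fall back where O_DSYNC is missing */
#endif

/* Hypothetical helper: buffered write, then one explicit fsync().
 * Two system calls; fsync() covers every dirty page of the file.
 * Returns 0 on success, -1 on error. */
int write_then_fsync(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    int ok = write(fd, msg, len) == (ssize_t)len && fsync(fd) == 0;
    close(fd);
    return ok ? 0 : -1;
}

/* Hypothetical helper: descriptor opened O_DSYNC, so write() does not
 * return until the data it wrote is on stable storage.
 * One system call per write; no separate fsync() is issued.
 * Returns 0 on success, -1 on error. */
int write_dsync(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    int ok = write(fd, msg, len) == (ssize_t)len;
    close(fd);
    return ok ? 0 : -1;
}
```

Both helpers return 0 on success, so a caller can swap one for the other
and compare the cost. Neither one says anything about how the FS orders
its own metadata writes underneath.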
You can make the same argument in FreeBSD, actually: msync()
doesn't limit itself to the range specified for the backing
object, because it can't tell (there are no reverse maps);
last time I looked at msync() in Linux and Solaris, it was
true in those places, too.
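For reference, this is what asking msync() to flush only part of a mapping
looks like from the caller's side. It is a minimal sketch, assuming a POSIX
system with a writable /tmp; the helper name msync_first_page is
hypothetical. It only shows the shape of the call: whether the kernel
really restricts the flush to the requested range is exactly the
implementation detail in question, and this code cannot observe it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps a two-page scratch file, dirties both pages,
 * and asks msync() to synchronize only the first page.
 * Returns 0 if every call succeeded, -1 otherwise. */
int msync_first_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    char path[] = "/tmp/msync_demo_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                  /* file stays alive until fd is closed */

    size_t len = 2 * (size_t)page;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = 'A';        /* dirty the first page  */
    p[page] = 'B';     /* dirty the second page */

    /* Request a synchronous flush of the first page only. The
     * implementation may write back more than was asked for. */
    int rc = msync(p, (size_t)page, MS_SYNC);

    munmap(p, len);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```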
-- Terry
