Date:      Fri, 15 Mar 2002 12:03:19 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Christoph Hellwig <hch@caldera.de>
Cc:        Josh MacDonald <jmacd@CS.Berkeley.EDU>, Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com
Subject:   Re: metadata update durability ordering/soft updates
Message-ID:  <3C925387.2DC4F2C0@mindspring.com>
References:  <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <20020315193844.A26441@caldera.de>

Christoph Hellwig wrote:
> On Fri, Mar 15, 2002 at 10:25:21AM -0800, Terry Lambert wrote:
> > > - The file system has never made any guarantees.
> >
> > Yes it has.  If you look at the atime/mtime/ctime update
> > requirements for the OS, they are pretty blatant.  They
> > just aren't enough to be able to blindly use them.
> 
> These requirements are only there for O_SYNC.

POSIX 1003.1, clauses 2.3.5 and 5.6.6.2 distinguish between
"SHALL be marked for update" and "SHALL be updated" with
regard to the ctime, mtime, and atime values for a file,
which are FS metadata.  See also 5.5.3.2.  The relevant
phrases are:

	2.3.5 [ ... ] All fields that are marked for update
	SHALL be updated when the file is no longer open by
	any process, or when a stat() or fstat() is performed
	on the file.  Other times at which updates are done
	are unspecified.

	5.6.6.2	[ ... ] The utime() function sets the access
	and modification times of the named file.

	5.5.3.2 [ ... ]	Upon successful completion, the
	rename() function SHALL mark for update the st_ctime
	and st_mtime fields of the parent directory of each
	file.

The getdirentries update semantics (SHALL update) and the metadata
modifications (SHALL update) are pretty unambiguous, as well.
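
As a rough illustration of the rename() clause quoted above, here is a
minimal C sketch.  The helper name rename_marks_parent() and the
/tmp path are made up for the example, and it assumes a POSIX.1-2008
system with mkdtemp().  It only checks that the parent directory's
st_mtime and st_ctime have not gone backwards once stat() forces the
marked fields to be updated; with one-second timestamps it cannot
show more than that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any standard): rename a file inside a
 * scratch directory and compare the directory's times before and after.
 * Returns 0 if st_mtime and st_ctime did not move backwards, -1 on error.
 */
int rename_marks_parent(void)
{
    char dir[] = "/tmp/posix_times_XXXXXX";   /* assumed scratch location */
    char a[64], b[64];
    struct stat before, after;
    int fd, n, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;

    n = snprintf(a, sizeof a, "%s/a", dir);
    assert(n > 0 && (size_t)n < sizeof a);
    n = snprintf(b, sizeof b, "%s/b", dir);
    assert(n > 0 && (size_t)n < sizeof b);

    fd = open(a, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        goto out;
    close(fd);

    if (stat(dir, &before) != 0)
        goto out;

    /* rename() SHALL mark the parent's st_ctime and st_mtime for update. */
    if (rename(a, b) != 0)
        goto out;

    /* stat() is one of the points where marked fields SHALL be updated. */
    if (stat(dir, &after) != 0)
        goto out;

    if (after.st_mtime >= before.st_mtime &&
        after.st_ctime >= before.st_ctime)
        rc = 0;

out:
    /* Best-effort cleanup; only one of a/b exists at this point. */
    unlink(a);
    unlink(b);
    rmdir(dir);
    return rc;
}
```

Calling rename_marks_parent() from a small main() and checking for a
zero return is enough to try it.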

The Single UNIX Specification has similar controls on the marking
for update in write, mmap, and other cases.  The POSIX requirements
are stiffer because of VMS, where directories were not implemented
as files.  I used to dislike it, but way back then, I was just
starting out as a student, and didn't realize the transactional
implications.  The Single UNIX Specification also fails to specify
things like the underlying system call(s) used to implement
directory traversal.  POSIX, however, specifies that the atime
"SHALL be updated" (as opposed to merely marked for update).  We got
around this requirement on one project I was on by not using the
system call interface whose behaviour is specified to read the
directory contents, and by declaring that directories were not
regular files for the FS in question.


> > > - You can use fsync() to stabilize a single file and its metadata
> > > dependencies.
> >
> > Metadata stabilization should be automatic.  What an fsync
> > there does is really enforce ordering on metadata writes,
> > by acting as a barrier.
> 
> Why do you think there is fdatasync() (and O_DSYNC)?

Linux?  It used to be called "O_WRITESYNC" back in the
mid-1980s.  The idea that an FS would not order your metadata
for you, yet you would still have integrity requirements in
such an environment, was simply unthinkable.

The O_DSYNC came about because people invented the concept
of unsynchronized metadata, which led to the idea that it
should be possible to separately cause data and metadata
synchronization.
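
For concreteness, the three interfaces under discussion can be put
side by side.  This is only a sketch: the helper durable_write() and
the enum are invented for the example, and it assumes a system that
provides both fdatasync() and O_DSYNC (Linux does; other systems may
not).  Roughly, fsync() flushes data and metadata, fdatasync() flushes
data plus whatever metadata is needed to read it back, and O_DSYNC
gives fdatasync()-like behaviour on every write().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical selector for the example; not part of any standard API. */
enum sync_mode { SYNC_FSYNC, SYNC_FDATASYNC, SYNC_ODSYNC };

/*
 * Hypothetical helper: write buf to path and push it to stable storage
 * using the chosen mechanism.  Returns 0 on success, -1 on failure.
 */
int durable_write(const char *path, const char *buf, enum sync_mode mode)
{
    int flags = O_CREAT | O_WRONLY | O_TRUNC;
    size_t len;
    int fd, rc = -1;

    assert(path != NULL && buf != NULL);
    len = strlen(buf);

    /* With O_DSYNC each write() returns only after the data is stable. */
    if (mode == SYNC_ODSYNC)
        flags |= O_DSYNC;

    fd = open(path, flags, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t)len)
        goto out;

    /* Explicit barrier after the write for the two non-O_DSYNC modes. */
    if (mode == SYNC_FSYNC && fsync(fd) != 0)
        goto out;
    if (mode == SYNC_FDATASYNC && fdatasync(fd) != 0)
        goto out;

    rc = 0;
out:
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The O_DSYNC path saves the extra system call, which is the overhead
argument made in the next paragraph.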

IMO, there's really no excuse for unsynchronized metadata,
and synchronous data writes exist only to avoid the system
call overhead of separately calling fsync(), and the OS
overhead of having to synchronize all dirty pages instead
of a region, based on the descriptor being used for the
operation.

You can make the same argument in FreeBSD, actually: msync()
doesn't limit itself to the range specified for the backing
object, because it can't tell (there are no reverse maps);
last time I looked at msync() in Linux and Solaris, it was
true in those places, too.
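
To show what "the range specified" means at the call site, here is a
small C sketch.  The helper msync_second_page() is hypothetical, and
the sketch assumes a system with mmap(), msync(), and
sysconf(_SC_PAGESIZE).  It dirties only the second page of a two-page
mapping and asks msync() to flush just that page; whether the kernel
actually writes only that range is the implementation question
discussed above, and the code cannot observe it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a two-page file at path, map it shared,
 * dirty the second page, and msync() only that page.
 * Returns 0 if msync() succeeded, -1 on any error.
 */
int msync_second_page(const char *path)
{
    long pg = sysconf(_SC_PAGESIZE);
    size_t len;
    char *p;
    int fd, rc = -1;

    assert(path != NULL && pg > 0);
    len = (size_t)pg * 2;

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_fd;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_fd;

    /* Dirty the second page only. */
    memset(p + pg, 'x', (size_t)pg);

    /* The address passed to msync() must be page aligned; p + pg is. */
    if (msync(p + pg, (size_t)pg, MS_SYNC) == 0)
        rc = 0;

    munmap(p, len);
out_fd:
    close(fd);
    return rc;
}
```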

-- Terry




