Date:      Wed, 13 Mar 2002 11:00:52 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Parity Error <bootup@mail.ru>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: metadata update durability ordering/soft updates
Message-ID:  <3C8FA1E4.A89F52FF@mindspring.com>
References:  <E16l7YD-0001FG-00@f9.mail.ru>

Parity Error wrote:
> with soft-updates metadata updates are delayed write. I am
> wondering if, say there are two independent structural changes,
> one after another, and then a crash happens.
> 
> Is there a possibility that the latter structural change got
> written to disk before the former due to some memory replacement
> policy ?

Independent writes are independent, by definition.  They
are permitted to occur in either order.  Metadata updates
are only ordered by soft updates insofar as necessary to
satisfy dependencies.  Thus independent writes can occur
in any order, but will *usually* occur in order, because
a scheduled write cannot be reordered once it has been
given to the disk controller.

This is due to a locking issue on the disk operations queue
in the driver, and is arguably a bug.  It's likely that some
work currently in progress will proceed to the point that the
"likely ordering" of independent operations will go away in
the future, so you can't even safely depend on it being likely.

This is normally an issue only for updates that do things
like update both an index and a record file, and imply a
dependency order in the operation.  In other words, there
is implied metadata between the two files, and therefore an
implied dependency.

It's the application's responsibility to signal the dependency
to the OS, so that the updates are ordered.  The normal way to
do this is to use a two-stage commit operation (per standard
database theory, circa IBM, 1965).  In UNIX this is done by
requesting that the first operation be committed before making
the request to begin the second operation (i.e. a software
barrier).  To find out more about this, you should use
"man fsync" and "man open" (in the "open" page, look for
"O_FSYNC").

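As a rough illustration of the record-then-index case above,
here is a minimal sketch in C.  The function names, the file
names, and the choice of append-only files are assumptions made
for the example, not anything prescribed by soft updates.  The
point is only that fsync() on the record file has to return
successfully before the first write to the index file is issued.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes.  Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: append buf to path and do not return success
 * until fsync() has reported the data as committed.
 */
static int append_durably(const char *path, const void *buf, size_t len)
{
    int fd;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Hypothetical two-stage update: the record is committed first, and
 * the index entry that refers to it is only written afterwards.  The
 * fsync() inside the first call is what tells the OS about the
 * ordering dependency between the two files.
 */
int store_record(const char *rec_path, const char *idx_path,
                 const void *rec, size_t rec_len,
                 const void *idx, size_t idx_len)
{
    if (append_durably(rec_path, rec, rec_len) != 0)
        return -1;      /* record not committed: leave the index alone */
    return append_durably(idx_path, idx, idx_len);
}
```

A caller would use something like
store_record("rec.dat", "idx.dat", rec, rec_len, idx, idx_len)
and treat -1 as "nothing may be assumed to be on disk".  Opening
the files with O_FSYNC instead should give the same ordering
without the explicit fsync() calls, at the cost of making every
write synchronous.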

As to misordering of dependent writes, even if you use
synchronous I/O properly...

Yes, this can happen due to the memory replacement policy
on many IDE hard drives, which lie about data having been
committed to stable storage when in fact it has only been
written to the disk's write cache.  That cache is far from
stable storage: it is not battery backed, and its contents
are not guaranteed to reach the disk after a power failure,
except on some IBM and Quantum drives which are no longer
manufactured.

You can ensure this doesn't happen to you by using only
disks which can correctly support cache flush primitives
and tagged command queues, or disabling write caching on
the device.  SCSI devices don't have this problem.

Another potential problem is that some IDE disks will
acknowledge disabling write caching, but will in fact not
disable it, no matter what commands you spit at them.  For
some of these disks, there are firmware updates available,
but if you are unlucky enough to own one of these disks,
then there is usually no option but to buy a good disk
instead.  May I recommend SCSI?


> could this affect the correctness of some applications ?

The disk caching issue could.  The implied metadata could
not.

If you have an application that uses implied metadata, but
does not take the steps UNIX requires to signal the implied
ordering dependency to the OS, then by definition your
application can't have its correctness affected... since it
has no correctness to lose.

8-).

-- Terry
