Date:      Thu, 27 Jan 2005 11:23:26 +0800
From:      Xin LI <delphij@frontfree.net>
To:        Arne Wörner <arne_woerner@yahoo.com>
Cc:        David Schultz <das@FreeBSD.ORG>
Subject:   Re: ufs+softupdates / consistency
Message-ID:  <1106796206.623.35.camel@spirit>
In-Reply-To: <20050127014250.57722.qmail@web41204.mail.yahoo.com>
References:  <20050127014250.57722.qmail@web41204.mail.yahoo.com>



Hi, Arne,

On Wed, 2005-01-26 at 17:42 -0800, Arne Wörner wrote:
[snip]
> Yes, I did. I enabled it in both test settings (KNOPPIX and
> FreeBSD). But I set hw.ata.wc to 0 in my everyday setting.
>
> I just tried "dd of=a if=/dev/zero bs=32k count=1000" on a file
> system with the hard disk write cache enabled and disabled, and I saw
> no difference (approx. 5 MB/sec, which is about 5 times slower than
> a read from that file after the kernel cache was cleared by other
> reads).
>
> Can somebody explain to me why hw.ata.wc does not change write
> speed?

That's because write caching does little to help sequential writes.  The
goal of a write cache is to let the drive reorder writes so that it can
reduce unnecessary head movement.  In a sequential write, no reordering
is needed, so the write cache gives only a small benefit at the
beginning; after that, it is flushed again and again, because the disk
writes data much more slowly than it arrives.
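
The reordering argument can be illustrated with a toy model.  This is a
minimal sketch, assuming a simplified disk where the cost of a seek is
just the distance between block numbers (real drives also have
rotational latency and their own firmware policies).  `seek_distance`
and `reordered` are hypothetical helpers, and the single elevator sweep
stands in for whatever reordering a real drive actually does.

```python
# Toy model, not a real disk scheduler: the head position is a block
# number (LBA) and seek cost is the absolute distance between LBAs.


def seek_distance(start, lbas):
    """Total head travel when servicing `lbas` in the given order."""
    total, pos = 0, start
    for lba in lbas:
        total += abs(lba - pos)
        pos = lba
    return total


def reordered(start, lbas):
    """One elevator (SCAN-like) pass: serve blocks at or above the head
    in ascending order, then sweep back down through the rest."""
    up = sorted(lba for lba in lbas if lba >= start)
    down = sorted((lba for lba in lbas if lba < start), reverse=True)
    return up + down


if __name__ == "__main__":
    head = 500
    scattered = [900, 100, 800, 200, 700, 300]
    sequential = list(range(500, 506))
    for name, lbas in (("scattered", scattered), ("sequential", sequential)):
        fifo = seek_distance(head, lbas)
        cached = seek_distance(head, reordered(head, lbas))
        print(f"{name:10s} FIFO={fifo:5d}  reordered={cached:5d}")
```

For these inputs, reordering cuts the scattered writes from 3400 to 1200
units of head travel, while the sequential run costs 5 either way: there
is nothing left to reorder.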

Unfortunately, however, the ATA write cache does not have the "tag"
feature that SCSI devices usually offer.  Instead of raising an
interrupt when a tagged request has actually been committed to the disk,
ATA disks simply tell the operating system "Yes, the data is already
written".  This causes problems for both SoftUpdates and journaling (the
latter REQUIRES that the journal be written before any of the metadata
updates it describes).  So it's wise to set hw.ata.wc=0 if the disk
stores important data that changes from time to time and you have
turned on SoftUpdates or journaling.
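
Why an early "Yes, it's written" breaks write ordering can be shown with
a small simulation.  This is a minimal sketch under stated assumptions:
`ToyDrive`, its lowest-LBA-first destage policy, and the two block
addresses are all hypothetical, and no real drive's firmware is being
modeled.  The file system writes the journal record first and the
metadata second, exactly as journaling requires, yet after a power
failure the metadata can be on the platter while the journal is not.

```python
# Hypothetical drive model: with the write cache on, write() acknowledges
# immediately and blocks are destaged later in an order the drive picks
# (here: lowest LBA first, an arbitrary assumption).


class ToyDrive:
    def __init__(self, write_cache: bool):
        self.write_cache = write_cache
        self.cache = {}    # volatile: lost on power failure
        self.platter = {}  # persistent

    def write(self, lba: int, data: str) -> None:
        """Returning from write() is the drive's 'it is written' answer."""
        if self.write_cache:
            self.cache[lba] = data    # ack now, destage later
        else:
            self.platter[lba] = data  # write-through: durable before ack

    def power_fail_after(self, destaged: int) -> None:
        """Destage `destaged` cached blocks, then lose the rest."""
        for lba in sorted(self.cache)[:destaged]:
            self.platter[lba] = self.cache[lba]
        self.cache.clear()


JOURNAL_LBA, METADATA_LBA = 9000, 100  # hypothetical block addresses


def journal_then_metadata(drive: ToyDrive) -> None:
    # The file system trusts the ack and moves on to the metadata.
    drive.write(JOURNAL_LBA, "intent: update inode 42")
    drive.write(METADATA_LBA, "inode 42 (new)")


def ordering_violated(drive: ToyDrive) -> bool:
    """Metadata is durable but the journal record describing it is not."""
    return METADATA_LBA in drive.platter and JOURNAL_LBA not in drive.platter


if __name__ == "__main__":
    for wc in (False, True):
        d = ToyDrive(write_cache=wc)
        journal_then_metadata(d)
        d.power_fail_after(destaged=1)
        print(f"write cache {'on ' if wc else 'off'}: violated={ordering_violated(d)}")
```

With the cache off, the ordering holds; with it on, the crash leaves
metadata that no journal record describes.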

> I would prefer hw.ata.wc=0, because it would be part of a damage
> avoidance system (I do not use a UPS, and I have some write
> accesses over quite long periods of time caused by my TV cards
> (sys/dev/bktr)).
>
> Can somebody explain to me why write speed is so much slower than
> read speed (even with the hard disk write cache)?

This is a common case, caused by physical constraints, I guess :-)

> I tried a UFS1 file system mounted async for another test, and
> the write speed was still about 5 MB/sec.

Well, I haven't benchmarked it myself; however, the author of
SoftUpdates claims that its performance should be about 95% of that of
an async-mounted UFS file system.  BTW, I think 5 MB/s is somewhat too
slow for an IDE device; you may want to check the cable, etc.
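
To put the quoted dd numbers in perspective, here is the arithmetic as a
short sketch.  It assumes 1 MB = 10**6 bytes and takes "about 5 times"
the write speed, i.e. roughly 25 MB/s, as the read speed; both figures
come from the quoted message, not from a new measurement.

```python
# Back-of-the-envelope numbers for "dd of=a if=/dev/zero bs=32k count=1000".
BS = 32 * 1024  # bs=32k -> 32 KiB per block
COUNT = 1000    # count=1000


def seconds_at(rate_mb_per_s: float, total_bytes: int = BS * COUNT) -> float:
    """Time to move `total_bytes` at `rate_mb_per_s` (1 MB = 10**6 bytes)."""
    return total_bytes / (rate_mb_per_s * 10**6)


if __name__ == "__main__":
    print(f"total written: {BS * COUNT} bytes")
    for rate in (5, 25):
        print(f"at {rate:2d} MB/s: {seconds_at(rate):.2f} s")
```

The run writes 32,768,000 bytes: about 6.55 s at 5 MB/s versus about
1.31 s at 25 MB/s.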

> Can somebody explain to me what async file system I/O is (somehow my
> English is not sufficient to find that out)?

Traditionally, file systems write metadata synchronously in order to
guarantee its consistency.  For best performance, however, writes should
be issued in an order that minimizes disk head movement.  An
asynchronously mounted file system does not write metadata synchronously
(N.B. "synchronously" means waiting for the write to complete, rather
than continuing and allowing subsequent data to be added to the write
queue), which makes it possible to write all data in the "best
performance" order.
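
The synchronous/asynchronous distinction has a familiar user-space
analogue, sketched below.  This is only an analogy, assuming POSIX
`write()`/`fsync()` semantics as exposed by Python's `os` module; the
`async` mount option concerns the kernel's own metadata writes, and
`write_sync`/`write_async` are hypothetical names.  Note that, depending
on the OS and drive, fsync() may only guarantee that the data reached
the drive, possibly just its volatile write cache, which is the
hw.ata.wc problem above.

```python
import os
import tempfile


def _write_all(fd: int, data: bytes) -> None:
    """os.write() may write fewer bytes than asked; loop until done."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_async(fd: int, data: bytes) -> None:
    """Return once the data is in the kernel's buffer cache; the kernel
    decides later when, and in what order, it goes to the disk."""
    _write_all(fd, data)


def write_sync(fd: int, data: bytes) -> None:
    """Return only after fsync() reports the data as written out."""
    _write_all(fd, data)
    os.fsync(fd)  # the caller waits here until the flush completes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            write_async(fd, b"queued, ")
            write_sync(fd, b"then flushed\n")
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            print(f.read())
```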

SoftUpdates and journaling are trade-offs between the traditional
(synchronous) scheme and the asynchronous one, which, as we can see, is
not safe if the system crashes: a crash can leave an async-mounted file
system arbitrarily inconsistent.  SoftUpdates guarantees that metadata
writes happen in the "right" order; that is, nothing is referenced
before it has been initialized.  This guarantee means that after a
crash, the file system has only "recoverable" inconsistencies, such as
leaked space.
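
The ordering rule can be sketched as a dependency graph.  This is a
minimal illustration, assuming a three-block "create a file" example
with hypothetical block names; real SoftUpdates tracks dependencies at a
much finer grain.  It uses Python's `graphlib` (3.9+) to derive a safe
write order and then checks what the disk looks like if a crash stops
the writes at any point.

```python
from graphlib import TopologicalSorter

# Each dirty block lists the blocks that must reach disk before it.
DEPENDS_ON = {
    "bitmap": set(),      # mark inode 42 as allocated
    "inode": {"bitmap"},  # initialize inode 42
    "dirent": {"inode"},  # directory entry "foo" -> inode 42
}


def safe_order(deps):
    """A write order in which no block precedes anything it depends on."""
    return list(TopologicalSorter(deps).static_order())


def state_after_crash(written):
    """Classify the on-disk state if only `written` blocks reached disk."""
    on_disk = set(written)
    if "dirent" in on_disk and not {"bitmap", "inode"} <= on_disk:
        return "dangling reference"  # a name points at an unready inode
    if "bitmap" in on_disk and "dirent" not in on_disk:
        return "leaked inode"        # recoverable: space can be reclaimed
    return "consistent"


if __name__ == "__main__":
    for label, order in (("ordered", safe_order(DEPENDS_ON)),
                         ("unordered", ["dirent", "inode", "bitmap"])):
        outcomes = [state_after_crash(order[:n]) for n in range(len(order) + 1)]
        print(f"{label:9s} {order} -> {outcomes}")
```

Every crash point in the ordered sequence yields either a consistent
disk or a leaked inode, never a dangling reference; the unordered
sequence does not have that property.
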
Journaling means that you first write a record describing what metadata
will be written, so that the metadata writes themselves can be
asynchronous.  After a system crash, something that checks the
transaction log must be run to roll back whatever is half-committed and
make the file system clean again.  Unfortunately, many journaling
implementations do not guarantee that the transaction log is written
before the metadata is actually updated, which renders journaling
useless.
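
One common way to implement this is a redo (write-ahead) journal,
sketched below.  The record formats and the names `commit`/`recover`
are hypothetical, and real journals differ in their recovery rules.
Here a half-committed transaction is simply discarded, because its
updates are applied only after the commit record.  The sketch assumes
each step is durable before the next one starts, which is exactly the
guarantee a lying write cache takes away.

```python
# Hypothetical records: ("log", txid, key, value) and ("commit", txid).


def commit(journal, metadata, txid, updates, crash_at=None):
    """Log every update, then a commit record, and only then apply the
    updates to the metadata.  `crash_at` stops after that many steps to
    simulate a crash."""
    steps = [("log", txid, key, value) for key, value in updates.items()]
    steps.append(("commit", txid))
    steps += [("apply", key, value) for key, value in updates.items()]
    for i, step in enumerate(steps):
        if crash_at is not None and i == crash_at:
            return  # crash: later steps never happen
        if step[0] == "apply":
            metadata[step[1]] = step[2]
        else:
            journal.append(step)


def recover(journal, metadata):
    """Replay transactions that have a commit record; drop the rest."""
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "log" and rec[1] in committed:
            metadata[rec[2]] = rec[3]  # replaying is idempotent
    journal.clear()


if __name__ == "__main__":
    updates = {"inode42.size": 8192, "bitmap.block7": 1}
    for crash_at in (1, 4, None):
        journal = []
        metadata = {"inode42.size": 0, "bitmap.block7": 0}
        commit(journal, metadata, txid=1, updates=updates, crash_at=crash_at)
        recover(journal, metadata)
        print(f"crash after step {crash_at}: {metadata}")
```

A crash before the commit record leaves the old metadata untouched; a
crash after it, even with the metadata half-applied, is completed by
replay.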

The current FreeBSD SoftUpdates implementation also has a drawback: on
large disks, checking the file system is still painful (even when the
check runs in the background).  A potential solution is to change the
file system layout to make the "dirty bit" local to each allocation
group, which may eventually lead to a new file system.
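
The per-group "dirty bit" idea can be sketched as follows.  This is a
hypothetical design, not FreeBSD's on-disk format: each allocation group
carries its own flag, set before the group is modified and cleared once
it is consistent again, so a post-crash check scans only the flagged
groups.

```python
from dataclasses import dataclass, field


@dataclass
class FileSystem:
    ngroups: int
    # One dirty flag per allocation group (assumed persisted per group).
    dirty: set = field(default_factory=set)

    def begin_update(self, group: int) -> None:
        self.dirty.add(group)      # must reach disk before the group changes

    def end_update(self, group: int) -> None:
        self.dirty.discard(group)  # the group is consistent again

    def groups_to_check(self) -> list:
        """After a crash, only these groups need scanning."""
        return sorted(self.dirty)


if __name__ == "__main__":
    fs = FileSystem(ngroups=10_000)
    fs.begin_update(17)
    fs.end_update(17)
    fs.begin_update(4242)  # the crash happens during this update
    print(f"check {len(fs.groups_to_check())} of {fs.ngroups} groups:",
          fs.groups_to_check())
```

The hard part, which this sketch ignores, is operations that span
several groups at once.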

Cheers,
-- 
Xin LI <delphij delphij net>  http://www.delphij.net/
