Date: Thu, 27 Jan 2005 11:23:26 +0800
From: Xin LI <delphij@frontfree.net>
To: Arne Wörner <arne_woerner@yahoo.com>
Cc: David Schultz <das@FreeBSD.ORG>
Subject: Re: ufs+softupdates / consistency
Message-ID: <1106796206.623.35.camel@spirit>
In-Reply-To: <20050127014250.57722.qmail@web41204.mail.yahoo.com>
References: <20050127014250.57722.qmail@web41204.mail.yahoo.com>

Hi, Arne,

On Wed, 2005-01-26 at 17:42 -0800, Arne Wörner wrote:
[snip]
> Yes, I did. I enabled it in both test settings (KNOPPIX and
> FreeBSD). But I set hw.ata.wc to 0 in my every day setting.
>
> I just tried "dd of=a if=/dev/zero bs=32k count=1000" on a file
> system with hard disc write cache enabled and disabled and I saw
> no difference (appr. 5 Mbyte/sec, which is about 5 times less than
> a read from that file after the kernel cache was clear by other
> reads).
>
> Can somebody explain me, why hw.ata.wc does not change write
> speed?

That's because write caching does little to help sequential writes. The goal of the write cache is to let the drive reorder writes so that it can reduce unnecessary head movement. In a sequential write this reordering is unnecessary, so the write cache gives only a small benefit at the beginning; after that it fills up and is flushed again and again, because writing to the platters is much slower.

Unfortunately, ATA write caching does not have a "tag" feature like the one SCSI devices usually offer. Instead of raising an interrupt when a tagged write has been committed to disk, ATA disks simply tell the operating system "Yes, the data is already written" once it is in the cache, and this causes problems for both SoftUpdates and journaling (which REQUIRES that the journal be written before any of the metadata updates it describes).

So it's wise to set hw.ata.wc=0 if the disk is supposed to store important data that changes from time to time and you have turned on SoftUpdates or journaling.

> I would prefer hw.ata.wc=0, because it would be part of a damage
> avoidance system (I do not use an UPS and I have some write
> accesses over quite long periods of time caused by my TV cards
> (sys/dev/bktr)).
>
> Can somebody explain me, why write speed is so much slower than
> read speed (even with hard disc write cache)?

This is a common case caused by physical constraints, I guess :-)

> I tried an UFS1 file system mounted async for another test. And
> the write speed was still about 5 Mbyte/sec.

Well, I haven't benchmarked it myself; however, the author of SoftUpdates claims that SoftUpdates should benchmark at 95% or so of what you get when you mount a UFS file system async.

BTW: I think a write speed of 5 MB/s is rather slow for an IDE device; you may want to check the cable, etc.

> Can somebody explain me, what async filesystem I/O is (somehow my
> english is not sufficient to find that out)?

Traditionally, file systems write metadata synchronously in order to guarantee that the metadata stays consistent. To get the best performance, however, data should be written in an order that minimizes disk head movement. An asynchronously mounted file system does not write metadata synchronously (N.B. "synchronously" means waiting for the write to complete, rather than continuing and allowing subsequent data to be added to the write queue), which makes it possible to write everything in the "best performance" order.

SoftUpdates and journaling are trade-offs between the traditional (synchronous) scenario and the asynchronous one (which, as we can see, is not safe if the system crashes, since a crash can lead to arbitrary inconsistency in your file system).

SoftUpdates guarantees that metadata writes happen in a "right" order, that is, nothing is referenced before it has been initialized. This guarantee means that after a crash the file system can only have "recoverable" inconsistencies, such as leaked space.

Journaling means that you first write a record describing which metadata will be written, so that the metadata writes themselves can be asynchronous. After a system crash, something that checks the transaction log must run and roll back whatever is half-committed in order to get the file system clean again.
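
To make the journaling description above concrete, here is a minimal sketch in Python. It is a toy model, not FreeBSD code and not how any particular file system implements its journal. The class `ToyJournaledStore`, the block names such as `inode_7.size`, and the record layout are all made up for illustration, and the sketch assumes that journal records reach the disk in the order they are appended (exactly the property a lying write cache can break). Values are assumed never to be `None`, because `None` is used to mean "this block did not exist before".

```python
class ToyJournaledStore:
    """Toy model: `disk` is metadata that survives a crash, `journal` is the log."""

    def __init__(self):
        self.disk = {}      # metadata "blocks", keyed by a hypothetical block name
        self.journal = []   # append-only log, assumed to hit the disk in order

    def begin(self, txid):
        self.journal.append(("begin", txid))

    def write(self, txid, key, new_value):
        # Write-ahead rule: the log record (carrying the old value) goes first ...
        self.journal.append(("update", txid, key, self.disk.get(key), new_value))
        # ... and only afterwards is the metadata block itself changed.
        self.disk[key] = new_value

    def commit(self, txid):
        self.journal.append(("commit", txid))

    def recover(self):
        """Roll back every update of a transaction that never committed."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        # Walk the log backwards so the oldest "old value" is restored last.
        for rec in reversed(self.journal):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old_value, _ = rec
                if old_value is None:
                    self.disk.pop(key, None)    # block did not exist before
                else:
                    self.disk[key] = old_value
        self.journal.clear()


def demo():
    store = ToyJournaledStore()

    store.begin(1)
    store.write(1, "inode_7.size", 4096)
    store.commit(1)

    store.begin(2)
    store.write(2, "inode_7.size", 8192)
    store.write(2, "bitmap.block_42", "allocated")
    # Simulated crash: transaction 2 never reaches commit().

    store.recover()
    return store.disk


if __name__ == "__main__":
    print(demo())
```

In this run the committed transaction 1 survives recovery, while both half-done updates of transaction 2 are undone. If the "update" record could reach the disk after the metadata change it describes, `recover()` would have nothing to roll back from, which is the failure mode described in the text.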

Unfortunately, there are many journaling implementations that do not guarantee that the transaction log is written before the metadata is actually updated, which renders the journaling useless. The current FreeBSD SoftUpdates implementation also has a flaw: on large disks it is still painful to check the file system (even when the check runs in the background). A potential solution is to change the file system layout to make the "dirty bit" local to allocation groups, which may eventually lead to a new file system.

Cheers,
-- 
Xin LI <delphij delphij net> http://www.delphij.net/