Date: Wed, 6 Mar 2013 12:41:39 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
To: Don Lewis <truckman@FreeBSD.org>
Cc: freebsd-fs@FreeBSD.org, ivoras@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject: Re: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!
Message-ID: <1198028260.20130306124139@serebryakov.spb.ru>
In-Reply-To: <201303060815.r268FIl5015220@gw.catspoiler.org>
References: <1644513757.20130306113250@serebryakov.spb.ru> <201303060815.r268FIl5015220@gw.catspoiler.org>
Hello, Don.
You wrote on 6 March 2013, at 12:15:18:

>> This scenario looks plausible, but it raises another question: will
>> barriers protect against it? It doesn't look so, as a barrier write
>> is currently issued only when a new inode BLOCK is allocated. And it
>> leads us to my other question: why not mark such vital writes with a
>> flag which forces the driver to treat them as "uncacheable" (and the
>> same for fsync()-induced writes)? Again, not BIO_FLUSH, which flushes
>> the whole cache, but a per-BIO flag. I was told by mav@ (the ahci
>> driver author) that ATA has such a capability, and I'm sure SCSI/SAS
>> drives should have one too.

DL> In the existing implementation, barriers wouldn't help since they
DL> aren't used in nearly enough places. UFS+SU currently expects the
DL> drive to tell it when the data actually hits the platter so that it
DL> can control the write ordering. In theory, barriers could be used
DL> instead, but performance would be terrible if they got turned into
DL> cache flushes.

Yep! So we need either stream-related (file/vnode/inode) barriers or a
simple per-request (bp/bio) flag.

DL> With NCQ or TCQ, the drive can have a sizeable number of writes
DL> internally queued and it is free to reorder them as it pleases even
DL> with write caching disabled, but if write caching is disabled it has
DL> to delay the notification of their completion until the data is on
DL> the platters so that UFS+SU can enforce the proper dependency
DL> ordering.

But, again, performance would be terrible :( I've checked it: with very
sparse multi-threaded write patterns (multiple torrents downloading over
a fast channel in my simple home case -- and I think things could be
even worse for a big file server in an organization) and "simple" SATA
drives, it is significantly worse in my experience :(

DL> I don't know enough about ATA to say if it supports marking
DL> individual writes as uncacheable. To support consistency on a drive
DL> with write caching enabled, UFS+SU would have to mark many of its
DL> writes as uncacheable. Even if this works, calls to fsync() would
DL> have to be

I don't see this as a big problem. About a year and a half ago I did
some experiments, adding counters all over the UFS/FFS code where it
writes metadata, and metadata came to about 1% of the writes on a busy
file system (torrents, csup updates, buildworld, all on one big FS).

DL> turned into cache flushes to force the file data (assuming that it
DL> was written with a cacheable write) to be written to the platters
DL> and only return to the userland program after the data is written.
DL> If drive write caching is off, then UFS+SU keeps track of the
DL> outstanding writes and an fsync() call won't return until the drive
DL> notifies UFS+SU that the data blocks for that file are actually
DL> written. In this case, the fsync() call doesn't need to get
DL> propagated down to the drive.

I see. But then we should turn off the drive's write cache by default,
and write a whitepaper about this situation. I really don't know which
is better for commodity SATA drives. And I'm not sure I understand the
UFS/FFS code well enough to do a proper experiment by adding such a flag
through our whole storage stack :(

And a second problem: SSDs. I know nothing about their caching
strategies, and SSDs have very large RAM buffers compared to commodity
HDDs (something like 512MiB vs 64MiB).

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>
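P.S. To make the cache-flush cost concrete, here is a minimal userland
sketch -- nothing UFS+SU-specific, and the file name "testfile", the
4KiB block size and the 512-write count are arbitrary choices of mine --
that contrasts letting the drive batch writes against forcing every
single write to stable storage with fsync(), the call which, as you
describe above, has to become a cache flush when the drive's write cache
is enabled:

/*
 * Rough illustration only: time NWRITES sequential writes when the OS
 * and drive are allowed to batch them (one fsync() at the end) vs. when
 * every write is forced out with its own fsync().
 */
#include <sys/time.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NWRITES 512
#define BLKSIZE 4096

static double
elapsed(const struct timeval *a, const struct timeval *b)
{
	return (b->tv_sec - a->tv_sec) + (b->tv_usec - a->tv_usec) / 1e6;
}

static double
run(const char *path, int sync_each)
{
	char buf[BLKSIZE];
	struct timeval t0, t1;
	int fd, i;

	memset(buf, 0xa5, sizeof(buf));
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		err(1, "open %s", path);
	gettimeofday(&t0, NULL);
	for (i = 0; i < NWRITES; i++) {
		if (write(fd, buf, BLKSIZE) != BLKSIZE)
			err(1, "write");
		if (sync_each && fsync(fd) != 0)   /* force it out right now */
			err(1, "fsync");
	}
	if (!sync_each && fsync(fd) != 0)          /* single flush at the end */
		err(1, "fsync");
	gettimeofday(&t1, NULL);
	close(fd);
	return (elapsed(&t0, &t1));
}

int
main(void)
{
	printf("batched, one fsync(): %.3f s\n", run("testfile", 0));
	printf("fsync() every write:  %.3f s\n", run("testfile", 1));
	unlink("testfile");
	return (0);
}

Compile it with "cc -O2 -o fsynctest fsynctest.c" and run it on the file
system under test; the gap between the two numbers is roughly the
per-write overhead that every metadata update would pay if it were
turned into a full cache flush.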