Date: Sat, 26 Nov 2011 12:04:48 +0400
From: Lev Serebryakov <lev@freebsd.org>
To: Kirk McKusick <mckusick@mckusick.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: Does UFS2 send BIO_FLUSH to GEOM when update metadata (with softupdates)?
Message-ID: <22120688.20111126120448@serebryakov.spb.ru>
In-Reply-To: <201111260725.pAQ7PDow056289@chez.mckusick.com>
References: <20111123194444.GE50300@deviant.kiev.zoral.com.ua> <201111260725.pAQ7PDow056289@chez.mckusick.com>

Hello, Kirk.

You wrote on 26 November 2011, 11:25:13:

> You are entirely correct when you say that the requirement for
> SU and SU+J is that it requires that notification of a disk-write
> complete mean that the data is on the disk (stable). The problem
> that arises is that (apparently) some tag-queue implementations
> report back that tags have been written when in fact they have
> not been written.

Or any GEOM implements a write cache. Please don't forget that the FS no longer asks a disk driver to write a block; it asks the GEOM stack, which can be composed of several nodes located on several physically independent computers (don't forget about geom_gate, iSCSI, etc.).

Or any hardware implements a big write cache, too. Every HDD or controller will report a non-queued write as complete after copying the data into its cache (if WC is enabled). And even if the cache is backed by a battery, nobody promises to flush it in the proper (from the SU point of view) order. It is even worse if the cache is not battery-backed (but the server itself IS), or if its flush depends on drivers (the GEOM case), and the system crashes.

> I believe that the only way to ensure that a tagged request is
> on stable store is to send a BIO_BARRIER request to the disk. The
> BIO_BARRIER request is not supposed to return until all I/O
> requests that were sent down prior to the BIO_BARRIER have been
> committed to stable store.

IMHO, the idea of a per-request flag, which the driver translates into appropriate device flags (maybe into a barrier, maybe not -- it depends on device capabilities), is much better. BIO_BARRIER will flush ALL of the write cache by design. It is a barrier, and it carries no reference to previous requests; it is a flush-them-all request. It could have a HUGE performance impact if you flush a controller's large write cache every 100ms.
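
To make the cost difference concrete, here is a rough toy model, not GEOM code. The `WriteCache` class, its `stable` flag, and the `run` helper are hypothetical names invented for illustration; the model only counts how many blocks reach the media when a single metadata write has to be stable, and assumes the cache can commit one block without touching the others.

```python
from dataclasses import dataclass, field


@dataclass
class WriteCache:
    """Toy write-back cache in front of a 'disk' (hypothetical, not a GEOM class)."""
    dirty: dict = field(default_factory=dict)  # block -> data held only in cache
    disk: dict = field(default_factory=dict)   # block -> data on stable store
    media_writes: int = 0                      # blocks actually pushed to the media

    def _commit(self, block):
        # Move one block from the cache to stable store.
        self.disk[block] = self.dirty.pop(block)
        self.media_writes += 1

    def write(self, block, data, stable=False):
        # Cache the write; with the (assumed) per-request flag,
        # commit only this block before reporting completion.
        self.dirty[block] = data
        if stable:
            self._commit(block)

    def barrier(self):
        # Flush-them-all: commit every dirty block, related or not.
        for block in sorted(self.dirty):
            self._commit(block)


def run(use_barrier):
    cache = WriteCache()
    # 1000 ordinary data writes are sitting in the cache.
    for block in range(1000):
        cache.write(block, b"data")
    # One metadata update must be on stable store before completion.
    if use_barrier:
        cache.write(5000, b"meta")
        cache.barrier()
    else:
        cache.write(5000, b"meta", stable=True)
    return cache.media_writes, len(cache.dirty)


if __name__ == "__main__":
    print("barrier:  (media writes, still cached) =", run(True))
    print("per-flag: (media writes, still cached) =", run(False))
```

In this model the barrier pushes all 1001 blocks to the media, while the flagged write pushes one block and leaves the other 1000 cached. Real hardware may or may not be able to do the latter; that is exactly the device-capability question.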
But if SU/fsync()/O_SYNC requests are marked with a special flag, the GEOM stack and the controller will be able to process these requests separately on the one hand, and will not need to flush the cache on a timer basis on the other, where that is possible. Maybe on some hardware it will have the same effect as a barrier, but I'm sure that there IS hardware which could handle such requests much more efficiently than a full cache flush.

And, yes, GEOM too. Again, I, as maintainer of geom_raid5, know how vital it is to have a good cache in this module (with some requests residing in it for tens of seconds!), and I don't see any way to implement a barrier other than flushing the cache on each barrier -- which effectively disables the cache altogether.

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?22120688.20111126120448>