Date: Thu, 24 Nov 2011 00:04:14 +0400
From: Lev Serebryakov <lev@freebsd.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: Does UFS2 send BIO_FLUSH to GEOM when update metadata (with softupdates)?
Message-ID: <1391930411.20111124000414@serebryakov.spb.ru>
In-Reply-To: <20111123194444.GE50300@deviant.kiev.zoral.com.ua>
References: <1957615267.20111123230026@serebryakov.spb.ru> <20111123194444.GE50300@deviant.kiev.zoral.com.ua>

Hello, Kostik.

You wrote on 23 November 2011, 23:44:44:

>> Does UFS2 with softupdates (without journal) issue BIO_FLUSH to the
>> GEOM layer when it needs to ensure consistency of on-disk metadata?
> No. Softupdates do not need flushes.

It does need flushes. WITHOUT flushes on modern storage architectures
there is no way to be sure that (I'm quoting your last sentence)
"writes reported as done by disk driver are indeed safely landed in
the involatile storage."

It is sad, but it is true. Disk controllers have caches, disks have
caches. In virtual environments and with NAS (iSCSI/FC/whatever)
everything is even worse. And every layer LIES about "landing"; this
was shown, for example, by Brad Fitzpatrick many years ago
(http://brad.livejournal.com/2116715.html).

If SU doesn't mark its writes in some special way as strictly
synchronous, SU cannot be sure that the data has really LANDED when
the bio is marked as complete. As far as I understand, there is no way
to mark a bio with the BIO_WRITE command as such a special case, and
the only way to ensure landing is to issue BIO_FLUSH after BIO_WRITE.

> You are making wrong conclusions from the false assumptions.
> The only requirement of the SU is that writes reported as done by disk
> driver are indeed safely landed in the involatile storage.

See above. Only BIO_FLUSH can give some guarantee (again, not 100%,
only "best effort") that a completed BIO_WRITE has really landed. Data
can be queued at many layers, and without an explicit FLUSH it may not
actually be written for seconds or even minutes (while being reported
as written).

For example, for decent RAID5 performance it is vital to have some
write cache. And when it is a software implementation, a UPS cannot
protect against system panics.

So it is very sad that SU and SU+J don't express their requirements in
code! It seems that even SU+J will not protect against crashes when
some GEOM does write caching.

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>
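
To illustrate the write-then-flush ordering argued for above, here is a
minimal userland sketch in C. It is only an analogy and not kernel or
GEOM code: write() stands in for BIO_WRITE and fsync() stands in for
the explicit flush request. The helper name write_durably() is
hypothetical, and how far down the storage stack fsync() actually
reaches depends on the OS, the filesystem, and the hardware, which is
exactly the concern raised in this message.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from UFS, softupdates, or GEOM):
 * write a buffer to a file and report success only after an explicit
 * flush request has also succeeded.
 * Returns 0 on success, -1 on any failure.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);

	/*
	 * Step 1: the write. A successful return only means the data
	 * was accepted by the layer below, not that it reached stable
	 * storage.
	 */
	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);
		if (n <= 0) {
			close(fd);
			return (-1);
		}
		done += (size_t)n;
	}

	/*
	 * Step 2: the explicit flush request. How far it propagates
	 * (filesystem, driver, controller cache, disk cache) is outside
	 * this code's control; it is a best-effort request.
	 */
	if (fsync(fd) != 0) {
		close(fd);
		return (-1);
	}

	return (close(fd));
}
```

A caller would treat the data as "landed" only when write_durably()
returns 0, for example write_durably("/tmp/example.bin", "abc", 3).
The design point is the ordering: completion of the write alone is
never taken as proof of durability.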