Date: Thu, 24 Nov 2011 02:21:45 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: freebsd-fs@freebsd.org, freebsd-geom@FreeBSD.org
Subject: Re: Does UFS2 send BIO_FLUSH to GEOM when update metadata (with softupdates)?
Message-ID: <337241442.20111124022145@serebryakov.spb.ru>
In-Reply-To: <20111123194444.GE50300@deviant.kiev.zoral.com.ua>
References: <1957615267.20111123230026@serebryakov.spb.ru> <20111123194444.GE50300@deviant.kiev.zoral.com.ua>

Hello, Kostik.

You wrote on 23 November 2011, 23:44:44:

> You are making wrong conclusions from the false assumptions.

It seems to me that FFS2/SU and SUJ are built on a wrong assumption: that a completed bwrite() without the ASYNC flag really means the data has landed on the physical device in every case. That is completely wrong and unreliable, and it prevents building reliable AND high-performance storage on FreeBSD :(

Or maybe I understand the code wrong. It is possible. See below.

> The only requirement of the SU is that writes reported as done by disk
> driver are indeed safely landed in the involatile storage.

I've traced the code and found the following call chain. Please correct me if I'm wrong.

Softupdates writes all data via bwrite() or bawrite(). bawrite() is OK: it is async, so it should not give any guarantees about immediate cache flush or ordering. bwrite(), in most cases, ends in the strategy routine, which is g_vfs_strategy() when GEOM is the backing store (GEOM uses the generic bufwrite(), which tweaks some flags, does some checks and sends the struct buf to bop_strategy, which is g_vfs_strategy() in this case). g_vfs_strategy() sends the request WITHOUT looking at the ASYNC flag on the "struct buf".

We have the BIO_ORDERED flag, but it is not used on this codepath! Maybe a cheap solution would be to set BIO_ORDERED on every struct bio that is created for a struct buf without the ASYNC flag? Or is that too strict?

Please note: as it stands, GEOM cannot guarantee even the ordering of SU write requests! Disk drivers, which at least send such requests to the hardware, could queue them too, or leave them in the drive's cache. It is COMPLETELY WRONG! With such a disconnect between the top-level logic (softupdates) and the whole driver stack (GEOM and disk drives), I am surprised that FFS2 can be repaired after a panic at all!

IMHO, it should be fixed ASAP, and FFS2 should notify the lower layers about writes which are required to be ordered & landed before bwrite() returns!
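
The cheap solution floated above (set BIO_ORDERED on every struct bio created for a struct buf without the ASYNC flag) could look roughly like the sketch below. It is a userspace model only, not a patch: the sk_* structs, the sk_buf_to_bio() helper and the flag values are hypothetical stand-ins, and the real g_vfs_strategy() does considerably more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in flag values for illustration only; these are NOT the
 * kernel's definitions of B_ASYNC or BIO_ORDERED.
 */
#define SK_B_ASYNC     0x01u  /* models the ASYNC flag on a struct buf */
#define SK_BIO_ORDERED 0x02u  /* models BIO_ORDERED on a struct bio */

/* Hypothetical, heavily reduced models of struct buf and struct bio. */
struct sk_buf { uint32_t b_flags; };
struct sk_bio { uint32_t bio_flags; };

/*
 * Models the point where a bio is created for a buf: a write that is
 * not marked async is treated as one whose ordering matters, so the
 * bio is tagged as ordered. Async writes are left untagged.
 */
void
sk_buf_to_bio(const struct sk_buf *bp, struct sk_bio *bip)
{
	assert(bp != NULL && bip != NULL);

	bip->bio_flags = 0;
	if ((bp->b_flags & SK_B_ASYNC) == 0)
		bip->bio_flags |= SK_BIO_ORDERED;
}
```

In this model a buf with no flags set yields a bio carrying SK_BIO_ORDERED, while a buf with SK_B_ASYNC set yields an untagged bio. Whether tagging every non-async write this way is too strict for performance is exactly the open question.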

We have the BIO_ORDERED flag, and it could be used; but if it is too strict, we could add a BIO_SYNC flag too. The ATA/SCSI subsystems already have proper support for BIO_ORDERED, and adding BIO_SYNC would not be a big deal at the low level. It could also be easily added to g_vfs_strategy(), but I'm not sure that it would not hurt performance too much: I'm not sure that every buf write without the ASYNC flag should be strict-SYNC. But I AM SURE that SU/SU+J writes MUST BE DONE STRICT SYNC.

P.S. I added geom@ to CC:, as it seems to be a UFS<->GEOM interaction problem.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>