Date: Thu, 28 Feb 2013 18:56:47 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
To: Ivan Voras <ivoras@freebsd.org>
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!
Message-ID: <1502041051.20130228185647@serebryakov.spb.ru>
In-Reply-To: <kgnp1n$9mc$1@ger.gmane.org>
References: <1796551389.20130228120630@serebryakov.spb.ru> <1238720635.20130228123325@serebryakov.spb.ru> <1158712592.20130228141323@serebryakov.spb.ru> <CAPJF9w=CZg_%2BK7NHTGUhRLaMJWWNOG7zMipGMJL6w6NoNZpSXA@mail.gmail.com> <583012022.20130228143129@serebryakov.spb.ru> <kgnp1n$9mc$1@ger.gmane.org>
Hello, Ivan.

You wrote on 28 February 2013 at 18:19:38:

>> Maybe it is subtle interference between the raid5 implementation and
>> SU+J, but in that case I want to understand what raid5 does wrong.

IV> You guessed correctly, I was going to blame geom_raid5 :)

It is not the first time :( But every such discussion ends without any practical result. Once, Kirk said that delayed writes are OK for SU as long as the bottom layer does not lie about operation completeness. geom_raid5 may delay writes (in the hope that subsequent writes will combine nicely and avoid a read-calculate-write cycle for a lone write), but it never marks a BIO complete until it really is completed (i.e. the layers below geom_raid5 have returned completion). So every BIO in its wait queue is still "in flight" from the GEOM/VFS point of view. Maybe that is fatal for the journal :(

What I really want to see is a "SYNC" flag for BIO, with all journal-related writes marked with it. All writes originating from fsync() MUST be marked the same way, really. Alexander Motin (the ahci driver author) assured me that he will add support for such a flag to the driver, flushing the drive cache too, if it is introduced. IMHO, the lack of this (or a similar) flag is a problem even without geom_raid5 and its optimistic behavior. There was commit r246876, but I don't understand exactly what it means, as no real FS or driver code was touched.

But I am writing about this idea for the 3rd or 4th time without any result :( I don't mean that it should be implemented ASAP by someone; I mean I have never seen any support from the FS guys (Kirk and somebody else -- I don't remember exactly who participated in those old threads, but it was not you) along the lines of "go ahead and send your patch". All those threads were very defensive on the FS-guru side: "we don't need it, fix your hardware, disable caches".

IV> Is this a production setup you have? Can you afford to destroy it and
IV> re-create it for the purpose of testing, this time with geom_raid3
IV> (which should be synchronous with respect to writes)?

Unfortunately, it is a production setup and I don't have any spare hardware for a second one :( I've posted a panic stack trace -- and it is FFS-related too -- and I am now preparing a setup with only one HDD under the same high load to try to reproduce the problem without geom_raid5. But I don't have enough hardware (3 spare HDDs at least!) to reproduce it with geom_raid3 or another copy of geom_raid5.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>