Date: Thu, 28 Feb 2013 18:56:47 +0400
From: Lev Serebryakov
Organization: FreeBSD
Reply-To: lev@FreeBSD.org
To: Ivan Voras
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!
Message-ID: <1502041051.20130228185647@serebryakov.spb.ru>
List-Id: Discussions about the use of FreeBSD-current

Hello, Ivan.

You wrote on 28 February 2013, 18:19:38:

>> Maybe it is a subtle interference between the raid5 implementation and
>> SU+J, but in that case I want to understand what raid5 does
>> wrong.

IV> You guessed correctly, I was going to blame geom_raid5 :)

It is not the first time :( But every time, such a discussion ends
without any practical result.

Kirk once said that delayed writes are OK for SU as long as the bottom
layer doesn't lie about operation completion. geom_raid5 can delay
writes (in the hope that subsequent writes will combine nicely and make
the read-calculate-write cycle unnecessary), but it never marks a BIO
complete until it really is completed (that is, until the layers below
geom_raid5 report completion). So every BIO in the wait queue is
"in flight" from the GEOM/VFS point of view. Maybe that is fatal for the
journal :(

What I really want to see is a "SYNC" flag for BIOs, with all
journal-related writes marked with it. All commits originating from
fsync() MUST be marked in the same way, too. Alexander Motin (the ahci
driver author) assured me that, if such a flag is introduced, he will
add support for it to the driver so that it also flushes the drive
cache.

IMHO, the lack of this (or a similar) flag is a bad thing even without
geom_raid5 and its optimistic behavior. There was commit r246876, but I
don't understand exactly what it means, as no real FS or driver code
was touched.

I'm writing about this idea for the 3rd or 4th time without any
result :( I don't mean that someone should implement it ASAP; I mean
that I never saw any support from the FS guys (Kirk and somebody else;
I don't remember exactly who took part in those old threads, but it was
not you) along the lines of "go ahead and send your patch". All those
threads were very defensive on the FS gurus' side, like "we don't need
it, fix the hardware, disable caches".

IV> Is this a production setup you have? Can you afford to destroy it and
IV> re-create it for the purpose of testing, this time with geom_raid3
IV> (which should be synchronous with respect to writes)?

Unfortunately, it is a production setup and I don't have any spare
hardware for a second one :(

I've posted the panic stack trace (it is FFS-related too), and I am now
preparing a setup with only one HDD and the same high load, to try to
reproduce it without geom_raid5. But I don't have enough hardware (at
least 3 spare HDDs!) to reproduce it with geom_raid3 or another copy of
geom_raid5.

-- 
// Black Lion AKA Lev Serebryakov
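
The completion semantics discussed in the message (writes that are delayed but stay "in flight" until the lower layer completes them, plus the proposed "SYNC" flag) can be illustrated with a toy model. This is only a sketch and not GEOM or geom_raid5 code: `Bio`, `LowerLayer`, `DelayingLayer` and the `sync` field are hypothetical names made up for the illustration, and the rule that a SYNC write pushes out everything queued before it, in order, is an assumption; the message does not specify how the flag would behave.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    """Toy stand-in for a write request (not the kernel's struct bio)."""
    offset: int
    data: bytes
    sync: bool = False   # hypothetical "SYNC" flag, as proposed in the message
    done: bool = False   # set only after the lower layer has taken the write


class LowerLayer:
    """Pretend disk: records offsets in the order writes reach it."""

    def __init__(self):
        self.log = []

    def write(self, bio):
        self.log.append(bio.offset)


class DelayingLayer:
    """Delays ordinary writes, but never reports completion early."""

    def __init__(self, lower):
        self.lower = lower
        self.queue = []

    def submit(self, bio):
        self.queue.append(bio)
        if bio.sync:
            # Assumption: a SYNC write forces out everything queued so far,
            # in submission order, so it is not reordered ahead of older writes.
            self.flush()

    def flush(self):
        for bio in self.queue:
            self.lower.write(bio)
            bio.done = True   # completion is reported only after the lower write
        self.queue.clear()
```

In this model an ordinary write submitted to `DelayingLayer` stays not-done (in flight) until something flushes the queue, while a write with `sync=True` completes, together with all earlier queued writes, before `submit()` returns.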