Date: Tue, 29 Oct 2013 00:57:34 +0400
From: Slawa Olhovchenkov
To: Allan Jude
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS txg implementation flaw

On Mon, Oct 28, 2013 at 04:51:02PM -0400, Allan Jude wrote:

> On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
> >
> >> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
> >>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
> >>>
> >>>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> >>>>
> >>>>> I may be wrong.
> >>>>> As far as I can see, ZFS creates a separate thread for each txg write,
> >>>>> and also for writes to L2ARC.
> >>>>> As a result, up to several thousand threads are created and destroyed
> >>>>> per second, with hundreds of thousands of page allocations, zeroings,
> >>>>> mappings, unmappings and frees per second. Very high overhead.
> >>>>>
> >>>>> In systat -vmstat I see totfr up to 600000, prcfr up to 200000.
> >>>>>
> >>>>> Estimated overhead: 30% of system time.
> >>>>>
> >>>>> Can anybody implement a thread and page pool for txg?
> >>>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
> >>> vfs.zfs.txg.timeout: 5
> >>>
> >>> That allows only a 5x lowering (less in practice with bursty writes),
> >>> and it brings more fragmentation on writes, etc.
> >> From my understanding, increasing the timeout, so that you are doing
> >> fewer transaction groups, would actually be the way to increase
> >> performance, at the cost of 'bursty' writing and the associated uneven
> >> latency.
> > Increasing the timeout dramatically decreases read performance because
> > of the very high IO bursts.
> It shouldn't affect read performance, except during the flush operations
> (every txg.timeout seconds)

Yes, that is the time I am talking about.

> If you watch with 'gstat' or 'gstat -f ada.$' you should see the cycle:
> reading quickly, then every txg.timeout seconds (and for maybe longer),
> it flushes the entire transaction group (may be 100s of MBs) to the
> disk; this high write load may make reads slow until it is finished.

Yes. And reads may be delayed for several seconds.
This is unacceptable in my case.
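
A rough sketch of the trade-off discussed above, as back-of-the-envelope
arithmetic. It assumes a steady write load and that one transaction group
is flushed every txg.timeout seconds. The function name and all the
numbers (40 MB/s incoming writes, 120 MB/s disk throughput) are
hypothetical and were not measured on any system in this thread. The model
ignores everything else ZFS does, such as early syncs when too much dirty
data accumulates, write throttling, and reads competing for the same disk,
so treat the result as an order-of-magnitude illustration only.

```python
def txg_burst_model(write_rate_mb_s, txg_timeout_s, disk_write_mb_s):
    """Estimate txg frequency, burst size and flush time for a steady load.

    Simplifying assumption (not ZFS's real behaviour): exactly one txg is
    synced every txg_timeout_s seconds, and it holds all the data written
    during that interval.
    """
    txgs_per_minute = 60.0 / txg_timeout_s          # fewer txgs with a longer timeout
    burst_mb = write_rate_mb_s * txg_timeout_s      # data accumulated per txg
    flush_seconds = burst_mb / disk_write_mb_s      # time the disk is busy flushing
    return txgs_per_minute, burst_mb, flush_seconds


if __name__ == "__main__":
    # Hypothetical workload: 40 MB/s of writes onto a disk that sustains 120 MB/s.
    for timeout in (1, 5, 30):
        n, burst, flush = txg_burst_model(40, timeout, 120)
        print(f"timeout={timeout:>2}s  txgs/min={n:5.1f}  "
              f"burst={burst:6.0f} MB  flush~{flush:4.1f}s")
```

Under these assumptions a longer timeout means fewer transaction groups
(which is where the thread and page churn would come from) but a
proportionally larger burst, and so a longer window during which reads
may stall.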