Date: Wed, 6 Mar 2013 16:53:37 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
To: Don Lewis
Cc: freebsd-fs@FreeBSD.org, ivoras@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject: Re: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!

Hello, Don.
You wrote on 6 March 2013, 14:01:08:

>> DL> With NCQ or TCQ, the drive can have a sizeable number of writes
>> DL> internally queued and it is free to reorder them as it pleases even with
>> DL> write caching disabled, but if write caching is disabled it has to delay
>> DL> the notification of their completion until the data is on the platters
>> DL> so that UFS+SU can enforce the proper dependency ordering.
>> But, again, performance would be terrible :( I've checked it. On very
>> sparse multi-threaded patterns (multiple torrent downloads on a fast
>> channel in my simple home case, and, I think, things could be worse in
>> the case of a big file server in an organization) and "simple" SATA
>> drives it is significantly worse in my experience :(

DL> I'm surprised that a typical drive would have enough onboard cache for
DL> write caching to help significantly in that situation.  Is the torrent

It is 5x64MiB in my case, oh, effectively, 4x64MiB :) Really, I could
repeat the experiment with some predictable and repeatable benchmark.
What in our ports could be used as a massively parallel (16+ files),
random (blocks around 64KiB, file sizes around 2+GiB), but "repeatable"
benchmark?
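Something like the following is what I have in mind -- just a rough sketch
I put together, not anything from ports (the file names, counts, and sizes
are made up): it pwrite()s 64KiB blocks at offsets drawn from a fixed seed
across 16 test files, so every run replays exactly the same pattern. fio
from ports can probably be configured to express the same job.

    /*
     * Hypothetical sketch of a repeatable parallel random-write load:
     * NFILES test files, 64 KiB pwrite()s at pseudorandom offsets drawn
     * from a fixed seed, so every run issues the same I/O pattern.
     * Run it against empty files and against pre-zeroed ones to compare
     * the block-allocation overhead.
     */
    #include <sys/types.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NFILES   16
    #define BLKSZ    (64 * 1024)
    #define FILESZ   (2LL * 1024 * 1024 * 1024)    /* 2 GiB per file */
    #define NWRITES  100000

    int
    main(void)
    {
            static char buf[BLKSZ];
            int fd[NFILES];
            char name[64];
            int i;

            memset(buf, 0xa5, sizeof(buf));
            srandom(12345);                 /* fixed seed => repeatable */

            for (i = 0; i < NFILES; i++) {
                    snprintf(name, sizeof(name), "testfile.%d", i);
                    fd[i] = open(name, O_RDWR | O_CREAT, 0644);
                    if (fd[i] < 0)
                            err(1, "open %s", name);
            }
            for (i = 0; i < NWRITES; i++) {
                    int f = random() % NFILES;
                    off_t blk = random() % (FILESZ / BLKSZ);

                    if (pwrite(fd[f], buf, BLKSZ, blk * BLKSZ) != BLKSZ)
                            err(1, "pwrite");
            }
            for (i = 0; i < NFILES; i++)
                    close(fd[i]);
            return (0);
    }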
DL> software doing a lot of fsync() calls?  Those would essentially turn

Nope. It tries to avoid fsync(), of course.

DL> Creating a file by writing it in random order is fairly expensive.  Each
DL> time a new block is written by the application, UFS+SU has to first find
DL> a free block by searching the block bitmaps, mark that block as
DL> allocated, wait for that write of the bitmap block to complete, write
DL> the data to that block, wait for that to complete, and then write the
DL> block pointer to the inode or an indirect block.  Because of the random
DL> write ordering, there is probably not enough locality to coalesce
DL> multiple updates to the bitmap and indirect blocks into one write before
DL> the syncer interval expires.  These operations all happen in the
DL> background after the write() call, but once you hit the I/O per second
DL> limit of the drive, eventually enough backlog builds to stall the
DL> application.  Also, if another update needs to be done to a block that
DL> the syncer has queued for writing, that may also cause a stall until the
DL> write completes.  If you hack the torrent software to create and
DL> pre-zero each file before it starts downloading it, then each bitmap and
DL> indirect block will probably only get written once during that operation
DL> and won't get written again during the actual download, and zeroing the
DL> data blocks will be sequential and fast.  During the download, the only
DL> writes will be to the data blocks, so you might see something like a 3x
DL> performance improvement.

My client (transmission, from ports) is configured to do "real
preallocation" (not a sparse one), but it doesn't help much. It is surely
limited by disk I/O :( But anyway, a torrent client is a bad benchmark if
we start to speak about real experiments to decide what could be improved
in the FFS/GEOM stack, as it is not very repeatable.

-- 
// Black Lion AKA Lev Serebryakov
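P.S. For completeness, a rough sketch of the up-front pre-zeroing Don
describes (hypothetical, the path and size are made up; posix_fallocate(2)
would be another way to get the blocks allocated ahead of time): write the
whole file sequentially once with zeroed blocks, so the bitmap and indirect
blocks are settled before any random writes happen.

    /*
     * Hypothetical sketch of "pre-zero the whole file first": one
     * sequential pass of zeroed blocks allocates the data, bitmap and
     * indirect blocks up front, so later random writes only touch
     * already-allocated data blocks.
     */
    #include <sys/types.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define BLKSZ   (64 * 1024)

    static void
    prezero(const char *path, off_t size)
    {
            static char zeros[BLKSZ];       /* zero-initialized */
            off_t off;
            int fd;

            fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0)
                    err(1, "open %s", path);
            for (off = 0; off < size; off += BLKSZ)
                    if (write(fd, zeros, BLKSZ) != BLKSZ)
                            err(1, "write %s", path);
            if (fsync(fd) != 0)             /* one fsync at the end, not per block */
                    err(1, "fsync %s", path);
            close(fd);
    }

    int
    main(void)
    {
            prezero("testfile.0", 2LL * 1024 * 1024 * 1024);  /* 2 GiB */
            return (0);
    }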