From owner-freebsd-bugs@FreeBSD.ORG Wed May 8 11:20:01 2013 Return-Path: Delivered-To: freebsd-bugs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A0CE32A4 for ; Wed, 8 May 2013 11:20:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 91DF185E for ; Wed, 8 May 2013 11:20:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r48BK0eX048780 for ; Wed, 8 May 2013 11:20:00 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r48BK0C4048779; Wed, 8 May 2013 11:20:00 GMT (envelope-from gnats) Date: Wed, 8 May 2013 11:20:00 GMT Message-Id: <201305081120.r48BK0C4048779@freefall.freebsd.org> To: freebsd-bugs@FreeBSD.org Cc: From: Matthew Rezny Subject: Re: kern/178388: [zfs] [patch] allow up to 8MB recordsize X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Matthew Rezny List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 May 2013 11:20:01 -0000 The following reply was made to PR kern/178388; it has been noted by GNATS. From: Matthew Rezny To: Adam Nowacki Cc: bug-followup@FreeBSD.org Subject: Re: kern/178388: [zfs] [patch] allow up to 8MB recordsize Date: Wed, 8 May 2013 13:09:44 +0200 On Tue, 07 May 2013 18:46:05 +0200 Adam Nowacki wrote: > On 2013-05-07 16:11, Matthew Rezny wrote: > > The proposed patch is rather ugly. Is there some reason to not > > simply change the definition of SPA_MAXBLOCKSIZE? > > Yes. Altering the value of SPA_MAXBLOCKSIZE will change the sizes of > certain metadata objects that will break compatibility with > non-patched systems. Just importing the pool on system with modified > SPA_MAXBLOCKSIZE would result in this pool being inaccessible in > non-patched systems - forever. It will also prevent booting from zfs > pools as there is not enough memory available in the bootloader to > support large block sizes for metadata or the loader files. > That is understandable and something I had thought about but not verified if it were the case. > > The point of defining a constant is it can then be changed in the > > place it's defined rather than in every place it's used. Having to > > go change every reference to it is error prone as missing a single > > reference could wreck havoc. > > SPA_MAXBLOCKSIZE is used for far more than just a limit - in many > places it is used as a default block size. I'm introducing > SPA_BIGBLOCKSIZE because of the above compatibility problems and > using it only in places that are essential to supporting large block > sizes for file or zvol data leaving default block sizes unmodified > (especially for pool metadata). The changed block size is only in > effect when recordsize dataset property is modified by explicit > action of the administrator. Existing and new datasets created post > patch default to backwards compatible 128k block size. > > SPA_BIGBLOCKSIZE is used for asserts on the size of read/written > block, ARC cache, recordsize property bounds checks and block size > calculation logic. > > The names of the constants could probably be changed: > current SPA_MAXBLOCKSIZE to SPA_DEFAULTBLOCKSIZE > and the new SPA_BIGBLOCKSIZE to SPA_MAXBLOCKSIZE. > Changing the value of SPA_MAXBLOCKSIZE while defining SPA_DEFAULTBLOCKSIZE (or SPA_MAXCOMPATBLOCKSIZE as I almost suggested) with the prior value would be clearer in terms of both naming and patch readability. I will venture so say that the number of references to the SPA_DEFAULTBLOCKSIZE will be fewer than references to SPA_MAXBLOCKSIZE. > > Specifically, I call into question the effect this has on the > > definition of SPA_BLOCKSIZES. The reference to SPA_MAXBLOCKSIZE was > > not replaced by SPA_BIGBLOCKSIZE and thus SPA_BLOCKSIZES is > > insufficiently sized to represent all the possible block sizes that > > could be used. > > The SPA_BLOCKSIZES define is never used in the code and should > probably be removed. > In that case, please kill it now. > > That one jumped out at me when I skimmed over the patch. I have not > > reviewed all the ZFS code to look for other unchanged references > > that are not part of the patch context. > > Keep in mind that I have been using this for two months now on 3 > systems, 5 zpools and a total of over 50TB data written post-patch > with varying record sizes (128k, 1MB, 4MB, 8MB). All systems boot > directly from the big pools using unmodified (128k limited) > bootloader. Thank you for your work in this area and the extensive testing you have done. Do you have any performance data from these tests that you can share? Do you have some reason for not going beyond 8MB record size (diminishing returns, etc)? I have put some thought to additional compression algorithms in ZFS, e.g. LZMA, which would require large record size to see significant gains over existing gzip support. High strength compression on large data chunks would be slow for data frequently written, but for an archival filesystem where data is written once and then read periodically it would be quite useful.