From owner-freebsd-geom@FreeBSD.ORG Sun Dec 5 14:29:14 2010 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C1EF106566C for ; Sun, 5 Dec 2010 14:29:14 +0000 (UTC) (envelope-from gcubfg-freebsd-geom@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 996C88FC1D for ; Sun, 5 Dec 2010 14:29:13 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1PPFaR-0005wb-Nd for freebsd-geom@freebsd.org; Sun, 05 Dec 2010 15:29:11 +0100 Received: from cpe-188-129-83-203.dynamic.amis.hr ([188.129.83.203]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Dec 2010 15:29:11 +0100 Received: from ivoras by cpe-188-129-83-203.dynamic.amis.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Dec 2010 15:29:11 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-geom@freebsd.org From: Ivan Voras Date: Sun, 05 Dec 2010 15:28:56 +0100 Lines: 63 Message-ID: References: <4CE4E2B2.7070702@delphij.net> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: cpe-188-129-83-203.dynamic.amis.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.12) Gecko/20101102 Thunderbird/3.1.6 In-Reply-To: <4CE4E2B2.7070702@delphij.net> Cc: freebsd-fs@freebsd.org Subject: Re: ZFS stripesize patch (in the context of 4k sector drives) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Dec 2010 14:29:14 -0000 On 11/18/10 09:24, Xin LI wrote: > On 11/12/10 10:09, Ivan Voras wrote: >> On 11/12/10 16:00, Ivan Voras wrote: >>> Hello, >>> >>> Any objections to me committing the following patch? >>> >>> The intention is to use stripesize info from GEOM in creating vdevs, in >>> the hope that the 4 KiB sector magic will work. > >> Or maybe not. I've grepped and other tools use stripesize in the way its >> name suggests - as RAID stripe size, not as logical sector size. > >> New idea on the menu: make the logical sector size a separate concept >> and a separate variable from stripe size. Would that be a better approach? > > Have you tested this booting from existing ZFS file system? No, but it will probably work because ashift is stored in ZFS metadata and compliant implementations should read it. I did tests with ZFS and combined ashift and sector sizes with gnop and here is what is possible and what isn't: * Pools created with ashift of 512 and imported while sectorsize in GEOM is 512 byte will work. * Pools created with ashift of 512 and imported while sectorsize in GEOM is 4096 will NOT work * Pools created with ashift of 4096 and imported while sectorsize in GEOM is 512 byte will work * Pools created with ashift of 4096 and imported while sectorsize in GEOM is 4096 byte will work. Basically, only increasing sectorsize (i.e. minimum IO alignment) will cause drives which had formerly been formatted with old (512 byte) sector size will not work. Personally, I'd still do it sooner rather than later to reduce the number of users which have problems with it, but after discussing it with mav I also understand the conservative side. Also from this discussion came the idea of capping ashift to some upper value. SPA_MAXBLOCKSIZE (128 KiB) looks reasonable for this so here's an updated patch. As the goal is to deal with current 4 KiB sector drives, the whole thing may need to be revisited in the future if there are other devices which fill in stripesize (probably by introducing a "physsectorsize" field). Comments? Ideas? --- vdev_geom.c.ori 2010-12-05 15:08:09.000000000 +0100 +++ vdev_geom.c 2010-12-05 15:10:50.000000000 +0100 @@ -496,7 +496,10 @@ /* * Determine the device's minimum transfer size. */ - *ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1; + if (pp->stripesize != 0 && pp->stripesize > pp->sectorsize) + *ashift = highbit(MIN(pp->stripesize, SPA_MAXBLOCKSIZE)) - 1; + else + *ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1; /* * Clear the nowritecache bit, so that on a vdev_reopen() we will