From owner-freebsd-bugs@FreeBSD.ORG Tue Jan 4 23:50:09 2011
Resent-Date: Tue, 4 Jan 2011 23:50:09 GMT
Resent-Message-Id: <201101042350.p04No9hY064041@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Emil Smolenski
Message-Id: <201101042349.p04NncaE046771@red.freebsd.org>
Date: Tue, 4 Jan 2011 23:49:38 GMT
From: Emil Smolenski
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: kern/153695: [PATCH] [ZFS] Booting from zpool created on 4k-sector drive doesn't work
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Tue, 04 Jan 2011 23:50:09 -0000

>Number:         153695
>Category:       kern
>Synopsis:       [PATCH] [ZFS] Booting from zpool created on 4k-sector drive doesn't work
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 04 23:50:09 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Emil Smolenski
>Release:        FreeBSD 8.2-PRERELEASE
>Organization:
>Environment:
FreeBSD 8.2-PRERELEASE amd64
>Description:
There is a hack to force zpool creation with a minimum sector size of 4k:

    # gnop create -S 4096 ${DEV0}
    # zpool create tank ${DEV0}.nop
    # zpool export tank
    # gnop destroy ${DEV0}.nop
    # zpool import tank

A zpool created this way is faster on problematic 4k-sector drives that lie
about their sector size (like the WD EARS). This hack works perfectly fine
when the system is running. The gnop layer is needed only for the
"zpool create" command, because ZFS stores the sector size in its metadata.
After creating the zpool one can export the pool, remove the gnop layer and
reimport the pool. The difference can be seen in the output of the zdb
command:

- on a 512-byte sector device (2**9 = 512):

    % zdb tank |grep ashift
    ashift=9

- on a 4096-byte sector device (2**12 = 4096):

    % zdb tank |grep ashift
    ashift=12

This change is permanent. The only way to change the value of ashift is to
destroy and re-create the zpool and restore the pool from backup.

But there is one problem: I cannot boot from such a pool. Error message:

    ZFS: i/o error - all block copies unavailable
    ZFS: can't read MOS
    ZFS: unexpected object set type 0

This is a standard configuration with the GPT scheme.

    # gpart show da0
    =>        34  2930211565  da0  GPT  (1.4T)
              34          30       - free -  (15K)
              64         128    1  freebsd-boot  (64K)
             192     4194304    2  freebsd-swap  (2.0G)
         4194496     8388608    3  freebsd-zfs  (4.0G)
        12583104  2917628495       - free -  (1.4T)

    # zpool status tank
      pool: tank
     state: ONLINE
     scrub: none requested
    config:

            NAME         STATE     READ WRITE CKSUM
            tank         ONLINE       0     0     0
              gpt/tank0  ONLINE       0     0     0

    # zdb -uuu tank
    Uberblock

            magic = 0000000000bab10c
            version = 15
            txg = 2838
            guid_sum = 12371721502612965633
            timestamp = 1292860198 UTC = Mon Dec 20 15:49:58 2010
            rootbp = [L0 DMU objset] 800L/200P DVA[0]=<0:2041000:1000> DVA[1]=<0:30062000:1000> DVA[2]=<0:ee0bd000:1000> fletcher4 lzjb LE contiguous birth=2838 fill=374 cksum=c9605617d:4e2cf0a8c94:f6decb77086a:210752c3aee4a8

This PR is similar to a message that I sent to the freebsd-fs@ mailing list:
http://lists.freebsd.org/pipermail/freebsd-fs/2010-December/010350.html
>How-To-Repeat:
Install FreeBSD 8.2-PRERELEASE as described here:
http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot
with the following exceptions:

1. Add "-b 32K" to the "gpart add -s 64K -t freebsd-boot ad0" command for
   proper 4k alignment.
2. Issue the "gnop create -S 4096 ad0" command before creating the zpool.
3. Use the .nop device to create the zpool.
4. Export the zpool, remove the gnop layer and reimport the zpool.
>Fix:
The attached patch is far from perfect. I am posting it here in the hope that
it helps someone skilled enough to track down this issue and provide a proper
solution.

With this patch applied:
- I can boot from single disk zpool,
- I can boot from mirrored zpool,
- I can't boot from raidz,
- I can't boot from mirrored zpool created on HP SmartArray P400 (it may be
  related to bug described in this PR: kern/151910)

Patch attached with submission follows:

diff -ruN sys.orig/boot/zfs/zfsimpl.c sys/boot/zfs/zfsimpl.c
--- sys.orig/boot/zfs/zfsimpl.c	2010-12-29 13:55:38.195215000 +0100
+++ sys/boot/zfs/zfsimpl.c	2010-12-29 13:24:39.825206000 +0100
@@ -770,7 +770,7 @@
 	const char *pool_name;
 	const unsigned char *vdevs;
 	int i, rc, is_newer;
-	char upbuf[1024];
+	vdev_phys_t upbuf;
 	const struct uberblock *up;
 
 	/*
@@ -921,21 +921,21 @@
 	 * the contents of the pool.
 	 */
 	for (i = 0;
-	     i < VDEV_UBERBLOCK_RING >> UBERBLOCK_SHIFT;
+	     i < VDEV_UBERBLOCK_COUNT(vdev);
 	     i++) {
-		off = offsetof(vdev_label_t, vl_uberblock);
-		off += i << UBERBLOCK_SHIFT;
+		off = VDEV_UBERBLOCK_OFFSET(vdev, i);
 		BP_ZERO(&bp);
 		DVA_SET_OFFSET(&bp.blk_dva[0], off);
-		BP_SET_LSIZE(&bp, 1 << UBERBLOCK_SHIFT);
-		BP_SET_PSIZE(&bp, 1 << UBERBLOCK_SHIFT);
+		DVA_SET_ASIZE(&bp.blk_dva[0], VDEV_UBERBLOCK_SIZE(vdev));
+		BP_SET_LSIZE(&bp, VDEV_UBERBLOCK_SIZE(vdev));
+		BP_SET_PSIZE(&bp, VDEV_UBERBLOCK_SIZE(vdev));
 		BP_SET_CHECKSUM(&bp, ZIO_CHECKSUM_LABEL);
 		BP_SET_COMPRESS(&bp, ZIO_COMPRESS_OFF);
 		ZIO_SET_CHECKSUM(&bp.blk_cksum, off, 0, 0, 0);
-		if (vdev_read_phys(vdev, &bp, upbuf, off, 0))
+		if (vdev_read_phys(vdev, &bp, &upbuf, off, 0))
 			continue;
 
-		up = (const struct uberblock *) upbuf;
+		up = (const struct uberblock *) &upbuf;
 		if (up->ub_magic != UBERBLOCK_MAGIC)
 			continue;
 		if (up->ub_txg < spa->spa_txg)
diff -ruN sys.orig/cddl/boot/zfs/zfsimpl.h sys/cddl/boot/zfs/zfsimpl.h
--- sys.orig/cddl/boot/zfs/zfsimpl.h	2010-12-29 13:55:58.870864000 +0100
+++ sys/cddl/boot/zfs/zfsimpl.h	2010-12-29 13:24:29.014796000 +0100
@@ -324,7 +324,7 @@
 #define	VDEV_UBERBLOCK_RING	(128 << 10)
 
 #define	VDEV_UBERBLOCK_SHIFT(vd)	\
-	MAX((vd)->vdev_top->vdev_ashift, UBERBLOCK_SHIFT)
+	MAX((vd)->v_ashift, UBERBLOCK_SHIFT)
 #define	VDEV_UBERBLOCK_COUNT(vd)	\
 	(VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT(vd))
 #define	VDEV_UBERBLOCK_OFFSET(vd, n)	\


>Release-Note:
>Audit-Trail:
>Unformatted:
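
For readers following the patch, the uberblock ring geometry that its macros depend on can be sketched in standalone C. This is only an illustration under stated assumptions, not the boot code itself: UBERBLOCK_SHIFT is taken to be 10 (matching the 1024-byte buffer the patch removes), the ring is assumed to start 128 KiB into a vdev label, and the names RING_START, ub_shift, ub_size, ub_count and ub_offset are hypothetical helpers, not identifiers from zfsimpl.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants; only VDEV_UBERBLOCK_RING appears verbatim in the patch. */
#define UBERBLOCK_SHIFT      10           /* smallest uberblock slot: 1 KiB */
#define VDEV_UBERBLOCK_RING  (128 << 10)  /* ring size in bytes */
#define RING_START           (128 << 10)  /* hypothetical: ring offset inside a label */

/* Slot shift: the larger of the pool's ashift and UBERBLOCK_SHIFT. */
static unsigned ub_shift(unsigned ashift)
{
	return ashift > UBERBLOCK_SHIFT ? ashift : UBERBLOCK_SHIFT;
}

/* Size in bytes of one uberblock slot. */
static uint64_t ub_size(unsigned ashift)
{
	return (uint64_t)1 << ub_shift(ashift);
}

/* Number of slots that fit in the ring. */
static unsigned ub_count(unsigned ashift)
{
	return VDEV_UBERBLOCK_RING >> ub_shift(ashift);
}

/* Byte offset of slot n, relative to the start of the label. */
static uint64_t ub_offset(unsigned ashift, unsigned n)
{
	assert(n < ub_count(ashift));
	return RING_START + ((uint64_t)n << ub_shift(ashift));
}
```

Under these assumptions an ashift=9 pool has 128 slots of 1024 bytes, while an ashift=12 pool has 32 slots of 4096 bytes, so a loop that always steps and reads in 1024-byte units no longer matches the on-disk layout.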