From: Richard Todd
To: Louis Mamakos
Cc: freebsd-current@freebsd.org
Date: Thu, 16 Jul 2009 23:08:57 -0500
Subject: Re: ZFS pool corrupted on upgrade of -current (probably sata renaming)

Louis Mamakos writes:

> On Wed, Jul 15, 2009 at 03:19:30PM -0700, Freddie Cash wrote:
>>
>> Hrm, you might need to do this from single-user mode, without the ZFS
>> filesystems mounted, or the drives in use. Or from a LiveFS CD, if
>> /usr is a ZFS filesystem.
>>
>> On our ZFS hosts, / and /usr are on UFS (gmirror).
>
> I don't understand why you'd expect you could take an existing
> container on a disk, like a FreeBSD slice with some sort of live data
> within it, and just decide you're going to take away one or more
> blocks at the end to create a new container within it.

Well, technically, I don't think they were recommending taking the slice
with live data on it and labeling it, but rather detaching that slice
from the mirror, labeling it, and reattaching it, causing ZFS to rewrite
all the data to that half of the mirror. It turns out that trying to
reattach a 1-sector-shorter chunk of disk will still usually work.

> If you look at page 7 of the ZFS on-disk format document that was
> recently mentioned, you'll see that ZFS stores 4 copies of its "vdev
> label": two at the front of the physical vdev and two at the end, each
> of them apparently 256KB in length. That's assuming that ZFS doesn't
> round down the size of the vdev to some convenient boundary. Is it
> going to get upset that the vdev just shrunk out from under it?

I've been investigating this a bit (testing the glabel procedure on some
mdconfig'ed disks to see that it does indeed work, and reading the ZFS
source). It turns out that ZFS *does* internally round down the size of
each device to the nearest multiple of sizeof(vdev_label_t), at this
line of vdev.c:

    osize = P2ALIGN(osize, (uint64_t)sizeof (vdev_label_t));

vdev_label_t is 256K long. So as long as your partitions are not an
*exact* multiple of 256K, you should be able to freely detach, label,
and reattach them; a quick check of the arithmetic is sketched below.
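To see the rounding concretely, here's a small standalone C sketch (not
from the ZFS sources; P2ALIGN is re-declared locally with its usual
power-of-two definition, VDEV_LABEL_SIZE is a local name for the 256K
label size, and the 517996544-byte figure is the mediasize from the
transcript below):

    #include <stdio.h>
    #include <stdint.h>

    /* P2ALIGN: round x down to a multiple of align, align a power of 2 */
    #define P2ALIGN(x, align)  ((x) & -(align))
    #define VDEV_LABEL_SIZE    (256ULL * 1024)  /* sizeof (vdev_label_t) */

    int
    main(void)
    {
            /* mediasize of md2s1a/md3s1a in the transcript below */
            uint64_t orig   = 517996544ULL;
            /* what's left after glabel takes the last 512-byte sector */
            uint64_t shrunk = orig - 512;

            printf("orig   %llu -> usable %llu\n",
                (unsigned long long)orig,
                (unsigned long long)P2ALIGN(orig, VDEV_LABEL_SIZE));
            printf("shrunk %llu -> usable %llu\n",
                (unsigned long long)shrunk,
                (unsigned long long)P2ALIGN(shrunk, VDEV_LABEL_SIZE));
            return (0);
    }

When the original size is not an exact multiple of 256K, both values
round down to the same usable size and the reattach succeeds; when it
is, the shrunken device loses a full 256K and the reattach is refused.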
If they *are* an exact multiple of 256K, the procedure should fail on
the "reattach" step, so you'll know you won't be able to proceed and
would have to un-label the disk chunk and put things back as before.
See below:

Script started on Thu Jul 16 23:03:11 2009
You have mail.
blo-rakane# diskinfo -v /dev/md2s1a /dev/md3s1a
/dev/md2s1a
        512             # sectorsize
        517996544       # mediasize in bytes (494M)
        1011712         # mediasize in sectors
        1003            # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.

/dev/md3s1a
        512             # sectorsize
        517996544       # mediasize in bytes (494M)
        1011712         # mediasize in sectors
        1003            # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.

blo-rakane# zpool create test mirror md2s1a md3s1a
blo-rakane# zpool status -v test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            md2s1a  ONLINE       0     0     0
            md3s1a  ONLINE       0     0     0

errors: No known data errors
blo-rakane# zpool detach test md3s1a
blo-rakane# glabel label -v testd3 /dev/md3s1a
Metadata value stored on /dev/md3s1a.
Done.
blo-rakane# zpool attach test md2s1a /dev/label/testd3
cannot attach /dev/label/testd3 to md2s1a: device is too small
blo-rakane# exit
exit

Script done on Thu Jul 16 23:07:13 2009
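Note the diskinfo output above: 517996544 is exactly 1976 * 262144, so
these partitions happen to be the unlucky case, an exact multiple of
256K. The relabeled provider is one 512-byte sector shorter, so ZFS
rounds it down a full 256K and refuses it as too small.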