From: Louis Mamakos <louie@transsys.com>
To: Dan Naumov
Cc: freebsd-stable@freebsd.org
Date: Sat, 30 May 2009 22:59:42 -0400
Subject: Re: ZFS NAS configuration question

The system that I built had 5 x 72GB SCA SCSI drives. Just to keep my
own sanity, I decided I'd configure the fdisk partitioning identically
across all of the drives, so that each one has a 1GB slice and a 71GB
slice. The drives all have identical capacity, so the second 71GB
slice ends up the same on all of the drives. I actually use glabel to
give each slice a named unit of storage, so that I don't have to worry
about getting the drives inserted into the right holes.

I figured 1GB was about right both for the swap partitions (3 of 'em)
and for a pair mirrored to boot from. I haven't really addressed
swapping in a replacement drive of a slightly different size, though I
have spares, and I could always put a larger drive in and create a
slice of the right size.

It looks like this, with all of the slices explicitly named with glabel:

root@droid[41] # glabel status
        Name  Status  Components
 label/boot0     N/A  da0s1
label/zpool0     N/A  da0s2
 label/boot1     N/A  da1s1
label/zpool1     N/A  da1s2
 label/swap2     N/A  da2s1
label/zpool2     N/A  da2s2
 label/swap3     N/A  da3s1
label/zpool3     N/A  da3s2
 label/swap4     N/A  da4s1
label/zpool4     N/A  da4s2

And the ZFS pool references the labeled slices:

root@droid[42] # zpool status
  pool: z
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        z                 ONLINE       0     0     0
          raidz2          ONLINE       0     0     0
            label/zpool0  ONLINE       0     0     0
            label/zpool1  ONLINE       0     0     0
            label/zpool2  ONLINE       0     0     0
            label/zpool3  ONLINE       0     0     0
            label/zpool4  ONLINE       0     0     0

errors: No known data errors

And swap on the other three drives:

root@droid[43] # swapinfo
Device            1024-blocks     Used    Avail Capacity
/dev/label/swap4      1044192        0  1044192     0%
/dev/label/swap3      1044192        0  1044192     0%
/dev/label/swap2      1044192        0  1044192     0%
Total                 3132576        0  3132576     0%
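If anyone wants to reproduce this, the labeling and pool creation boil
down to something like the following. This is a sketch from memory
rather than a transcript of my shell history, shown for one data drive
(da2); the slice and label names match the layout above:

  # write a GEOM label into the last sector of each slice
  # (repeat for da0..da4; da0s1 and da1s1 get boot0/boot1 instead)
  glabel label swap2 /dev/da2s1
  glabel label zpool2 /dev/da2s2

  # build the raidz2 pool over the five labeled slices
  zpool create z raidz2 label/zpool0 label/zpool1 label/zpool2 \
      label/zpool3 label/zpool4

  # enable swap on the three swap labels
  swapon /dev/label/swap2 /dev/label/swap3 /dev/label/swap4

The nice property of putting label/zpoolN rather than daNs2 into the
pool is that the names follow the disks around if they ever get
shuffled into different bays.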
This is the mirrored partition that the system actually boots from; it
maps physically to da0s1 and da1s1 (see the P.S. below for roughly how
it was put together). The normal boot0, boot1/boot2 and loader chain
operates on da0s1a as usual, which is really /dev/mirror/boota:

root@droid[45] # gmirror status
       Name    Status  Components
mirror/boot  COMPLETE  label/boot0
                       label/boot1

root@droid[47] # df -t ufs
Filesystem         1024-blocks    Used   Avail Capacity  Mounted on
/dev/mirror/boota      1008582  680708  247188    73%    /bootdir

The UFS partition eventually ends up getting mounted on /bootdir:

root@droid[51] # cat /etc/fstab
# Device             Mountpoint  FStype  Options     Dump  Pass#
zfs:z/root           /           zfs     rw          0     0
/dev/mirror/boota    /bootdir    ufs     rw,noatime  1     1
/dev/label/swap2     none        swap    sw          0     0
/dev/label/swap3     none        swap    sw          0     0
/dev/label/swap4     none        swap    sw          0     0
/dev/acd0            /cdrom      cd9660  ro,noauto   0     0

But when /boot/loader on the UFS partition reads what it thinks is
/etc/fstab (the file that ends up at /bootdir/etc/fstab once the
system is up), the root file system that gets mounted is the ZFS
filesystem at z/root:

root@droid[52] # head /bootdir/etc/fstab
# Device  Mountpoint  FStype  Options  Dump  Pass#
z/root    /           zfs     rw       0     0

And /boot on the ZFS root is symlinked into the UFS filesystem, so it
gets updated when a make installworld happens:

root@droid[53] # ls -l /boot
lrwxr-xr-x  1 root  wheel  12 May  3 23:00 /boot@ -> bootdir/boot

louie

On May 30, 2009, at 3:15 PM, Dan Naumov wrote:

> Is the idea behind leaving 1GB unused on each disk to work around the
> problem of potentially being unable to replace a failed device in a
> ZFS pool because a 1TB replacement you bought actually has a lower
> sector count than your previous 1TB drive (since the replacement
> device has to be either of exact same size or bigger than the old
> device)?
>
> - Dan Naumov
>
> On Sat, May 30, 2009 at 10:06 PM, Louis Mamakos wrote:
>> I built a system recently with 5 drives and ZFS. I'm not booting off
>> a ZFS root, though it does mount a ZFS file system once the system
>> has booted from a UFS file system. Rather than dedicate drives, I
>> simply partitioned each of the drives into a 1G partition
>
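P.S. As promised above, roughly how the boot mirror itself was
created. Again a sketch from memory rather than a transcript; the
default bsdlabel layout (where the 'a' partition covers the whole
mirror) is what I'd expect here, not something I've re-verified:

  # make sure geom_mirror is available now, and on subsequent boots
  # via geom_mirror_load="YES" in /boot/loader.conf
  gmirror load

  # mirror the two 1GB boot slices via their glabel names
  gmirror label boot label/boot0 label/boot1

  # write a default bsdlabel inside the mirror so the 'a' partition
  # shows up as /dev/mirror/boota, then newfs it
  # (installing the MBR and bootblocks with fdisk -B / bsdlabel -B
  # on the underlying slices is left out here)
  bsdlabel -w mirror/boot
  newfs /dev/mirror/boota

  # on the ZFS root, point /boot into the UFS filesystem so that
  # installworld/installkernel land in the right place
  ln -s bootdir/boot /boot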