From: Glen Barber
Date: Mon, 18 Jul 2011 23:38:00 -0400
To: Jeremy Chadwick
Cc: freebsd-stable@freebsd.org
Subject: Re: Status of support for 4KB disk sectors
Message-ID: <4E24FC18.3010605@gmail.com>
In-Reply-To: <20110718234124.GA5626@icarus.home.lan>

On 7/18/11 7:41 PM, Jeremy Chadwick wrote:
> On Mon, Jul 18, 2011 at 03:50:15PM -0700, Kevin Oberman wrote:
>> I just want to check on the status of 4K sector support in FreeBSD. I
>> read a long thread on the topic from a while back and it looks like I
>> might hit some issues if I'm not REALLY careful. Since I will be keeping
>> the existing Windows installation, I need to be sure that I can set up
>> the disk correctly without screwing up Windows 7.
>>
>> I was planning on just DDing the W7 slice over, but I am not sure how
>> well this would play with GPT. Or should I not try to use GPT at all?
>> I'd like to, as this laptop spreads Windows 7 over two slices and adds a
>> third for the recovery system, leaving only one for FreeBSD, and I'd
>> like to put my files in a separate slice. GPT would offer that fifth
>> slice.
>>
>> I have read the handbook and don't see any reference to 4K sectors and
>> only a one-liner about gpart(8) and GPT. Once I get this all figured
>> out, I'll see about writing an update about this, as GPT looks like the
>> way to go in the future.
>
> When you say "4KB sector support", what do you mean by this? All
> drives on the market as of this writing, that I've seen, all claim a
> physical/logical sector size of 512 bytes -- yes, even SSDs, and EARS
> drives which we know use 4KB sectors. They do this to guarantee full
> compatibility with existing software.
>
> Since you're talking about gpart and "4KB sector support", did you mean
> to ask "what's the state of FreeBSD and aligned partition support to
> ensure decent performance with 4KB-sector drives?"
>
> If so: there have been some commits in recent days to RELENG_8 to help
> try to address the shortcomings of the existing utilities and GEOM
> infrastructure.
> Read the most recent commit text carefully:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sbin/geom/class/part/geom_part.c
>
> But the currently "known method" is to use gnop(8). Here's an example:
>
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
>

Notice: I'm reading this as "how badly do 'green drives' suck?"

FWIW, I've recently done the gnop(8) trick to two "green" drives in one
of my machines because I was seeing horrifying performance problems with
what I consider to be basic stuff, like 'portsnap extract', but more
severely with copying large data (file-backed bacula files to be exact)
into said datasets.

I have yet to retry my read/write tests with drives I have not converted
with gnop(8). I have not conclusively tested all possible combinations
of configurations, nor reverted the changes to the drives to retest, but
if it is of any interest, here's what I'm seeing.

I have comparisons between WD "green" and "black" drives. Unfortunately,
the machines are not completely similar - one is a Core2Quad, the other
Core2Duo; one has 6GB RAM, the other 8GB RAM; also, 'orion' is running a
month-old 8-STABLE; 'kaos' is running a 2-week-old -CURRENT. Both
machines are using ZFSv28:

orion % sysctl -n hw.ncpu; sysctl -n hw.physmem
4
6353416192

kaos % sysctl -n hw.ncpu; sysctl -n hw.physmem
2
8534401024

The drives in 'orion' are 1TB WD green drives in a ZFS mirror; the
drives in 'kaos' are 1TB WD black drives in a raidz1 (3 drives).

First the read test:

kaos % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
       12.94 real         0.60 user        11.95 sys

orion % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
      118.02 real         0.46 user         8.74 sys

I guess no real surprise here. 'kaos' has more spindles to read from,
on top of faster seek times.

Next the write test:

The 'compressed' and 'dedup' datasets referenced below are 'lzjb' and
'sha256,verify', respectively.
I'd wait for the 'compressed+dedup' tests to finish, but I have to wake
up tomorrow morning.

orion# sh -c 'time portsnap extract -p /zstore/perftest >/dev/null'
      306.71 real        44.37 user       110.28 sys

orion# sh -c 'time portsnap extract -p /zstore/perftest_compress >/dev/null'
      166.62 real        43.87 user       109.49 sys

orion# sh -c 'time portsnap extract -p /zstore/perftest_dedup >/dev/null'
     3576.43 real        44.98 user       109.12 sys

kaos# sh -c 'time portsnap extract -p /perftest >/dev/null'
      311.31 real        51.23 user       193.37 sys

kaos# sh -c 'time portsnap extract -p /perftest_compress >/dev/null'
      269.85 real        49.55 user       191.56 sys

kaos# sh -c 'time portsnap extract -p /perftest_dedup >/dev/null'
     4655.73 real        51.86 user       196.22 sys

Like I said, I have not yet had the time to retest this on drives
without the gnop(8) fix (another similar zpool with 2 drives), so maybe
the data I'm providing isn't conclusive, but since the gnop(8) fix for
4K sector drives was mentioned, I thought it might be relevant to a
point.

> Now, that's for ZFS, but I'm under the impression the exact same is
> needed for FFS/UFS.
>
> Do I bother doing this with my SSDs? No. Am I suffering in
> performance? Probably. Why do I not care? Because the level of
> annoyance is extremely high -- remember, all of this has to be done from
> within the installer environment (referring to "Emergency Shell"), which
> on FreeBSD lacks an incredible amount of usability, and is even worse to
> deal with when doing a remote install via PXE/serial. Fixit is the only
> decent environment. Given that floppies are more or less gone, I don't
> understand why the Fixit environment doesn't replace the "Emergency
> Shell".
>

Not that it necessarily helps in a PXE environment, but a memstick of
9-CURRENT has helped me recover from minor "oops" situations a few times
over the past few months.

What are these "floppies" you speak of, again? :)

Regards,

--
Glen Barber
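
For reference, the gnop(8) trick mentioned above usually amounts to
something like the sequence below. This is a sketch only, not the exact
commands used on 'orion': the pool name "tank" and the disks "ada1" and
"ada2" are made up for illustration, and the script just prints the
commands instead of running them, since they wipe whatever is on the
disks.

```shell
#!/bin/sh
# Sketch only: prints the commonly described gnop(8) sequence for
# creating a ZFS mirror that uses 4 KB alignment. Nothing is executed
# here, because the real commands destroy existing data on the disks.
# "tank", "ada1" and "ada2" are hypothetical names, not from this thread.

gnop_4k_plan() {
    pool=$1
    disk1=$2
    disk2=$3
    cat <<EOF
gnop create -S 4096 ${disk1}
zpool create ${pool} mirror ${disk1}.nop ${disk2}
zpool export ${pool}
gnop destroy ${disk1}.nop
zpool import ${pool}
zdb -C ${pool} | grep ashift
EOF
}

# Print the plan for review; run the lines by hand once they look right.
gnop_4k_plan tank ada1 ada2
```

Only one disk needs the temporary .nop provider in this sketch, on the
assumption that the pool takes its alignment from the largest sector
size reported at creation time. If it worked, the last command should
report an ashift of 12 rather than 9.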