Date: Mon, 18 Jul 2011 23:38:00 -0400
From: Glen Barber <glen.j.barber@gmail.com>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Status of support for 4KB disk sectors
Message-ID: <4E24FC18.3010605@gmail.com>
In-Reply-To: <20110718234124.GA5626@icarus.home.lan>
References: <CAN6yY1uaUqk2ifiNViJyMFJWf60a4DmCiVs3Z=--_TjtzseABQ@mail.gmail.com> <20110718234124.GA5626@icarus.home.lan>
On 7/18/11 7:41 PM, Jeremy Chadwick wrote:
> On Mon, Jul 18, 2011 at 03:50:15PM -0700, Kevin Oberman wrote:
>> I just want to check on the status of 4K sector support in FreeBSD. I read
>> a long thread on the topic from a while back and it looks like I might hit some
>> issues if I'm not REALLY careful. Since I will be keeping the existing Windows
>> installation, I need to be sure that I can set up the disk correctly without
>> screwing up Windows 7.
>>
>> I was planning on just DDing the W7 slice over, but I am not sure how well this
>> would play with GPT. Or should I not try to use GPT at all? I'd like to, as this
>> laptop spreads Windows 7 over two slices and adds a third for the recovery
>> system, leaving only one for FreeBSD, and I'd like to put my files in a separate
>> slice. GPT would offer that fifth slice.
>>
>> I have read the handbook and don't see any reference to 4K sectors and only a
>> one-liner about gpart(8) and GPT. Once I get this all figured out, I'll see
>> about writing an update about this, as GPT looks like the way to go in the
>> future.
>
> When you say "4KB sector support", what do you mean by this? All
> drives on the market as of this writing that I've seen claim a
> physical/logical sector size of 512 bytes -- yes, even SSDs, and EARS
> drives which we know use 4KB sectors. They do this to guarantee full
> compatibility with existing software.
>
> Since you're talking about gpart and "4KB sector support", did you mean
> to ask "what's the state of FreeBSD and aligned partition support to
> ensure decent performance with 4KB-sector drives?"
>
> If so: there have been some commits in recent days to RELENG_8 to help
> address the shortcomings of the existing utilities and GEOM
> infrastructure. Read the most recent commit text carefully:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sbin/geom/class/part/geom_part.c
>
> But the current "known method" is to use gnop(8). Here's an example:
>
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/
>

Notice: I'm reading this as "how badly do 'green drives' suck?"

FWIW, I've recently applied the gnop(8) trick to two "green" drives in
one of my machines because I was seeing horrifying performance problems
with what I consider to be basic stuff, like 'portsnap extract', but
more severely with copying large amounts of data (file-backed bacula
files, to be exact) into said datasets. I have yet to retry my
read/write tests with drives I have not converted with gnop(8).

I have not conclusively tested all possible combinations of
configurations, nor reverted the changes to the drives to retest, but
in case it is of any interest, here's what I'm seeing. I have
comparisons between WD "green" and "black" drives. Unfortunately, the
machines are not identical: 'orion' is a Core2Quad with 6GB RAM running
a month-old 8-STABLE, and 'kaos' is a Core2Duo with 8GB RAM running a
2-week-old -CURRENT. Both machines are using ZFSv28:

orion % sysctl -n hw.ncpu; sysctl -n hw.physmem
4
6353416192

kaos % sysctl -n hw.ncpu; sysctl -n hw.physmem
2
8534401024

The drives in 'orion' are 1TB WD green drives in a ZFS mirror; the
drives in 'kaos' are 1TB WD black drives in a raidz1 (3 drives).

First, the read test:

kaos % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
       12.94 real         0.60 user        11.95 sys

orion % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
      118.02 real         0.46 user         8.74 sys

I guess no real surprise here: 'kaos' has more spindles to read from,
on top of faster seek times.

Next, the write test. The 'compressed' and 'dedup' datasets referenced
below use compression=lzjb and dedup=sha256,verify, respectively. I'd
wait for the 'compressed+dedup' tests to finish, but I have to wake up
tomorrow morning.
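The message gives the property values of the test datasets but not how they were created. The following is a hypothetical sketch of how those settings map onto `zfs create -o property=value` arguments. The helper name, the pool name 'zstore' (inferred from orion's /zstore mountpoints), and the 'perftest_compress_dedup' name are assumptions, not something the author states.

```python
import shlex

# Hypothetical dataset layout: the property values come from the message;
# the creation method and the combined dataset's name are assumptions.
TEST_DATASETS = {
    "perftest": {},
    "perftest_compress": {"compression": "lzjb"},
    "perftest_dedup": {"dedup": "sha256,verify"},
    # Assumed name for the unfinished 'compressed+dedup' run.
    "perftest_compress_dedup": {"compression": "lzjb", "dedup": "sha256,verify"},
}


def zfs_create_argv(pool, name, props):
    """Return an argv list for `zfs create`, one `-o key=value` per property."""
    argv = ["zfs", "create"]
    for key, value in props.items():
        argv += ["-o", f"{key}={value}"]
    argv.append(f"{pool}/{name}")
    return argv


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for name, props in TEST_DATASETS.items():
        print(shlex.join(zfs_create_argv("zstore", name, props)))
```

Running it only prints the four commands. The message does not name kaos's pool, so the pool argument there is unknown.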
orion# sh -c 'time portsnap extract -p /zstore/perftest >/dev/null'
      306.71 real        44.37 user       110.28 sys
orion# sh -c 'time portsnap extract -p /zstore/perftest_compress >/dev/null'
      166.62 real        43.87 user       109.49 sys
orion# sh -c 'time portsnap extract -p /zstore/perftest_dedup >/dev/null'
     3576.43 real        44.98 user       109.12 sys

kaos# sh -c 'time portsnap extract -p /perftest >/dev/null'
      311.31 real        51.23 user       193.37 sys
kaos# sh -c 'time portsnap extract -p /perftest_compress >/dev/null'
      269.85 real        49.55 user       191.56 sys
kaos# sh -c 'time portsnap extract -p /perftest_dedup >/dev/null'
     4655.73 real        51.86 user       196.22 sys

Like I said, I have not yet had the time to retest this on drives
without the gnop(8) fix (another similar zpool with 2 drives), so this
data may not say much about the fix itself, but since the gnop(8) fix
for 4K-sector drives was mentioned, I thought it might be of some use.

> Now, that's for ZFS, but I'm under the impression the exact same thing
> is needed for FFS/UFS.
>
> <rant> Do I bother doing this with my SSDs? No. Am I suffering in
> performance? Probably. Why do I not care? Because the level of
> annoyance is extremely high -- remember, all of this has to be done from
> within the installer environment (referring to "Emergency Shell"), which
> on FreeBSD lacks an incredible amount of usability, and is even worse to
> deal with when doing a remote install via PXE/serial. Fixit is the only
> decent environment. Given that floppies are more or less gone, I don't
> understand why the Fixit environment doesn't replace the "Emergency
> Shell". </rant>
>

Not that it necessarily helps in a PXE environment, but a 9-CURRENT
memstick has helped me recover from minor "oops" situations a few times
over the past few months.

What are these "floppies" you speak of, again? :)

Regards,

-- 
Glen Barber
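To put the timings in this message in relative terms, the following small sketch divides the wall-clock ("real") seconds as reported. The dictionary layout and the function name are illustrative only. No new measurements are implied.

```python
# Wall-clock ("real") seconds as reported in the message.
FIND_READ = {"kaos": 12.94, "orion": 118.02}
PORTSNAP_WRITE = {
    "orion": {"plain": 306.71, "compress": 166.62, "dedup": 3576.43},
    "kaos": {"plain": 311.31, "compress": 269.85, "dedup": 4655.73},
}


def relative_to_plain(host):
    """Return each dataset's extract time divided by the plain dataset's time.

    Values below 1.0 mean faster than the plain dataset; above 1.0, slower.
    """
    runs = PORTSNAP_WRITE[host]
    base = runs["plain"]
    return {name: round(seconds / base, 2) for name, seconds in runs.items()}


if __name__ == "__main__":
    slowdown = FIND_READ["orion"] / FIND_READ["kaos"]
    print(f"find on orion took {slowdown:.1f}x as long as on kaos")
    for host in PORTSNAP_WRITE:
        print(host, relative_to_plain(host))
```

On these numbers, lzjb nearly halves the extract time on orion's mirror but saves only about 13% on kaos's raidz1. Meanwhile, dedup=sha256,verify makes the extract roughly 12 to 15 times slower on both machines. As the message itself cautions, none of this isolates the effect of the gnop(8) change.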