Date: Fri, 23 Mar 2012 09:30:50 -0700
From: Taylor <j.freebsd-zfs@enone.net>
To: freebsd-fs@freebsd.org
Subject: ZFS extra space overhead for ashift=12 vs ashift=9 raidz2 pool?
Message-ID: <45654FDD-A20A-47C8-B3B5-F9B0B71CC38B@enone.net>
Hello,
I'm bringing up a new ZFS filesystem and have noticed something strange with respect to the overhead from ZFS. When I create a raidz2 pool with 512-byte sectors (ashift=9), I have an overhead of 2.59%, but when I create the zpool using 4K sectors (ashift=12), I have an overhead of 8.06%. This amounts to a difference of 2.79 TiB in my particular application, which I'd like to avoid. :)

(Assuming I haven't done anything wrong. :) ) Is the extra overhead for 4K sector (ashift=12) raidz2 pools expected? Is there any way to reduce it?

(In my very limited performance testing, 4K sectors do seem to perform slightly better and more consistently, so I'd like to use them if I can avoid the extra overhead.)
Details below.
Thanks in advance for your time,
-Taylor
I'm running:
FreeBSD host 9.0-RELEASE FreeBSD 9.0-RELEASE #0 amd64
I'm using Hitachi 4TB Deskstar 0S03364 drives, which are 4K sector devices.

In order to "future proof" the raidz2 pool against possible variations in replacement drive size, I've created a single partition on each drive, starting at sector 2048 and using 100MB less than the total available space on the disk.
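Roughly, the partitioning was done like this (reconstructed from the gpart listing below rather than from my shell history, so treat the exact -s value as illustrative; the same commands were repeated for da3 through da17):

$ sudo gpart create -s gpt da2
$ sudo gpart add -t freebsd-zfs -b 2048 -s 7813832368 da2

The resulting layout for one of the disks: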
$ sudo gpart list da2
Geom name: da2
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da2p1
Mediasize: 4000682172416 (3.7T)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 1048576
Mode: r1w1e1
rawuuid: 71ebbd49-7241-11e1-b2dd-00259055e634
rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
label: (null)
length: 4000682172416
offset: 1048576
type: freebsd-zfs
index: 1
end: 7813834415
start: 2048
Consumers:
1. Name: da2
Mediasize: 4000787030016 (3.7T)
Sectorsize: 512
Mode: r1w1e2
Each partition gives me 4000682172416 bytes (or 3.64 TiB). I'm using 16 drives. I create the zpool with 4K sectors as follows:
$ sudo gnop create -S 4096 /dev/da2p1
$ sudo zpool create zav raidz2 da2p1.nop da3p1 da4p1 da5p1 da6p1 da7p1 da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1
I confirm ashift=12:
$ sudo zdb zav | grep ashift
ashift: 12
ashift: 12
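(As an aside, and not really related to the space question: my understanding is that the .nop provider only matters at creation time, since ashift is recorded in the vdev labels. The cleanup I've seen suggested is something along the lines of

$ sudo zpool export zav
$ sudo gnop destroy da2p1.nop
$ sudo zpool import zav

though I haven't verified whether that step is strictly necessary.)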
"zpool list" approximately matches the expected raw capacity of =
16*4000682172416 =3D 64010914758656 bytes (58.28 TiB).=20
$ zpool list zav
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zav 58T 1.34M 58.0T 0% 1.00x ONLINE -
For raidz2, I'd expect to see 4000682172416*14 = 56009550413824 bytes (50.94 TiB). However, I only get:
$ zfs list zav
NAME USED AVAIL REFER MOUNTPOINT
zav 1.10M 46.8T 354K /zav
Or using df for greater accuracy:
$ df zav
Filesystem 1K-blocks Used Avail Capacity Mounted on
zav 50288393472 354 50288393117 0% /zav
A total of 51495314915328 bytes (46.83 TiB). (This is for a freshly created zpool, before any snapshots, etc. have been performed.)
I measure overhead as (expected - actual) / expected, which in the case of the 4K sector (ashift=12) raidz2 pool comes to 8.06%.
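Spelled out with bc (just the numbers quoted above, nothing new):

$ echo '4000682172416 * 14' | bc
56009550413824
$ echo 'scale=6; (56009550413824 - 51495314915328) / 56009550413824' | bc
.080597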
To create a 512-byte sector (ashift=9) raidz2 pool, I basically just replace "da2p1.nop" with "da2p1" when creating the zpool, and I confirm ashift=9 the same way with zdb. The zpool raw size is the same (as much as I can tell with such limited precision from "zpool list"). However, the available size according to zfs list/df is 54560512935936 bytes (49.62 TiB), which amounts to an overhead of 2.59%. There are some minor differences in the ALLOC and USED size listings, so I repeat them below for the 512-byte sector raidz2 pool.
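(For reference, that creation command is just the earlier zpool create with the plain partition in place of the gnop provider:

$ sudo zpool create zav raidz2 da2p1 da3p1 da4p1 da5p1 da6p1 da7p1 da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1 )

The listings: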
$ zpool list zav
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zav 58T 228K 58.0T 0% 1.00x ONLINE -
$ zfs list zav
NAME USED AVAIL REFER MOUNTPOINT
zav 198K 49.6T 73.0K /zav
$ df zav
Filesystem 1K-blocks Used Avail Capacity Mounted on
zav 53281750914 73 53281750841 0% /zav
I expect some overhead from ZFS, and according to this blog post:
http://www.cuddletech.com/blog/pivot/entry.php?id=1013
(via http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/041773.html)
there may be a 1/64 or 1.56% overhead baked into ZFS. Interestingly enough, when I create a pool with no raid/mirroring, I get an overhead of 1.93% regardless of ashift=9 or ashift=12, which is quite close to the 1/64 number. I have also tested raidz, which behaves similarly to raidz2, but the overhead is slightly lower in each case: 1) ashift=9 raidz overhead is 2.33% and 2) ashift=12 raidz overhead is 7.04%.
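To put numbers next to that 1/64 figure (same bc arithmetic as above):

$ echo 'scale=6; 1/64' | bc
.015625

i.e. about 1.56%, so the plain pool's 1.93% sits close to that baseline, the ashift=9 raidz/raidz2 pools sit roughly 0.8-1 point above it, and the ashift=12 pools sit roughly 5.5-6.5 points above it.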
In order to preserve space, I've put the zdb listings for both the ashift=9 and ashift=12 raidz2 pools here:
http://pastebin.com/v2xjZkNw
There are also some differences in the zdb output; for example, "SPA allocated" is higher in the 4K sector raidz2 pool, which seems interesting, although I don't understand the significance of this.
