Date: Fri, 23 Mar 2012 09:30:50 -0700
From: Taylor <j.freebsd-zfs@enone.net>
To: freebsd-fs@freebsd.org
Subject: ZFS extra space overhead for ashift=12 vs ashift=9 raidz2 pool?
Message-ID: <45654FDD-A20A-47C8-B3B5-F9B0B71CC38B@enone.net>
Hello,

I'm bringing up a new ZFS filesystem and have noticed something strange with respect to the overhead from ZFS. When I create a raidz2 pool with 512-byte sectors (ashift=9), I have an overhead of 2.59%, but when I create the zpool using 4K sectors (ashift=12), I have an overhead of 8.06%. This amounts to a difference of 2.79 TiB in my particular application, which I'd like to avoid. :)

(Assuming I haven't done anything wrong. :) ) Is the extra overhead for 4K-sector (ashift=12) raidz2 pools expected? Is there any way to reduce it?

(In my very limited performance testing, 4K sectors do seem to perform slightly better and more consistently, so I'd like to use them if I can avoid the extra overhead.)

Details below. Thanks in advance for your time,

-Taylor

I'm running:

FreeBSD host 9.0-RELEASE FreeBSD 9.0-RELEASE #0 amd64

I'm using Hitachi 4TB Deskstar 0S03364 drives, which are 4K-sector devices.

In order to "future proof" the raidz2 pool against possible variations in replacement drive size, I've created a single partition on each drive, starting at sector 2048 and using 100MB less than the total available space on the disk.

$ sudo gpart list da2
Geom name: da2
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da2p1
   Mediasize: 4000682172416 (3.7T)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 1048576
   Mode: r1w1e1
   rawuuid: 71ebbd49-7241-11e1-b2dd-00259055e634
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 4000682172416
   offset: 1048576
   type: freebsd-zfs
   index: 1
   end: 7813834415
   start: 2048
Consumers:
1. Name: da2
   Mediasize: 4000787030016 (3.7T)
   Sectorsize: 512
   Mode: r1w1e2

Each partition gives me 4000682172416 bytes (or 3.64 TiB). I'm using 16 drives.

I create the zpool with 4K sectors as follows:

$ sudo gnop create -S 4096 /dev/da2p1
$ sudo zpool create zav raidz2 da2p1.nop da3p1 da4p1 da5p1 da6p1 da7p1 da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1

I confirm ashift=12:

$ sudo zdb zav | grep ashift
            ashift: 12
            ashift: 12

"zpool list" approximately matches the expected raw capacity of 16*4000682172416 = 64010914758656 bytes (58.22 TiB).

$ zpool list zav
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zav     58T  1.34M  58.0T     0%  1.00x  ONLINE  -

For raidz2, I'd expect to see 4000682172416*14 = 56009550413824 bytes (50.94 TiB) of usable space. However, I only get:

$ zfs list zav
NAME   USED  AVAIL  REFER  MOUNTPOINT
zav   1.10M  46.8T   354K  /zav

Or, using df for greater accuracy:

$ df zav
Filesystem     1K-blocks  Used        Avail  Capacity  Mounted on
zav          50288393472   354  50288393117        0%  /zav

A total of 51495314915328 bytes (46.83 TiB). (This is for a freshly created zpool, before any snapshots, etc. have been taken.)

I measure overhead as (expected - actual) / expected, which for the 4K-sector (ashift=12) raidz2 pool comes to 8.06%.

To create a 512-byte-sector (ashift=9) raidz2 pool, I basically just replace "da2p1.nop" with "da2p1" when creating the zpool, and I confirm ashift=9. The zpool raw size is the same (as far as I can tell with the limited precision of zpool list). However, the available size according to zfs list/df is 54560512935936 bytes (49.62 TiB), which amounts to an overhead of 2.59%.
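
(For clarity, this is roughly how I'm deriving the overhead percentages above; a throwaway sh sketch using only the byte counts already quoted from gpart and df. The variable names are just mine for illustration.)

#!/bin/sh
# Compare the usable space df reports against the "ideal" raidz2
# capacity of (drives - parity) * partition size.
PART_BYTES=4000682172416            # per-partition size from gpart list
DRIVES=16
PARITY=2                            # raidz2

EXPECTED=$(( PART_BYTES * (DRIVES - PARITY) ))

# df reports 1K-blocks, so multiply by 1024 to get bytes.
ACTUAL_4K=$((  50288393472 * 1024 ))    # ashift=12 pool
ACTUAL_512=$(( 53281750914 * 1024 ))    # ashift=9 pool

# Overhead = (expected - actual) / expected, as a percentage.
echo "expected:           $EXPECTED bytes"
echo -n "ashift=12 overhead: "
echo "scale=2; ($EXPECTED - $ACTUAL_4K) * 100 / $EXPECTED" | bc
echo -n "ashift=9 overhead:  "
echo "scale=2; ($EXPECTED - $ACTUAL_512) * 100 / $EXPECTED" | bc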
There are some minor differences in the ALLOC and USED listings, so I repeat them here for the 512-byte-sector raidz2 pool:

$ zpool list zav
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zav     58T   228K  58.0T     0%  1.00x  ONLINE  -

$ zfs list zav
NAME   USED  AVAIL  REFER  MOUNTPOINT
zav    198K  49.6T  73.0K  /zav

$ df zav
Filesystem     1K-blocks  Used        Avail  Capacity  Mounted on
zav          53281750914    73  53281750841        0%  /zav

I expect some overhead from ZFS, and according to this blog post:

http://www.cuddletech.com/blog/pivot/entry.php?id=1013
(via http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/041773.html)

there may be a 1/64 (about 1.56%) overhead baked into ZFS. Interestingly enough, when I create a pool with no raid/mirroring, I get an overhead of 1.93% regardless of ashift=9 or ashift=12, which is quite close to the 1/64 number. I have also tested raidz, which behaves similarly to raidz2, although the overhead is slightly lower in each case: 1) ashift=9 raidz overhead is 2.33%, and 2) ashift=12 raidz overhead is 7.04%.

To save space, I've put the zdb listings for both the ashift=9 and ashift=12 raidz2 pools here:

http://pastebin.com/v2xjZkNw

There are also some differences in the zdb output; for example, "SPA allocated" is higher in the 4K-sector raidz2 pool, which seems interesting, although I don't understand the significance of it.
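
(Just to make the comparison against that 1/64 figure explicit, here is the quick bc arithmetic on the percentages already listed above; nothing new, just my bookkeeping.)

$ echo "scale=2; 100/64" | bc
1.56

$ # extra overhead beyond the 1.93% measured for the plain (no raid) pool:
$ # raidz ashift=9, raidz2 ashift=9, raidz ashift=12, raidz2 ashift=12
$ echo "scale=2; 2.33-1.93; 2.59-1.93; 7.04-1.93; 8.06-1.93" | bc
.40
.66
5.11
6.13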