Date:      Fri, 23 Mar 2012 09:30:50 -0700
From:      Taylor <j.freebsd-zfs@enone.net>
To:        freebsd-fs@freebsd.org
Subject:   ZFS extra space overhead for ashift=12 vs ashift=9 raidz2 pool?
Message-ID:  <45654FDD-A20A-47C8-B3B5-F9B0B71CC38B@enone.net>

Hello,

I'm bringing up a new ZFS filesystem and have noticed something strange
with respect to the overhead from ZFS. When I create a raidz2 pool with
512-byte sectors (ashift=9), I have an overhead of 2.59%, but when I
create the zpool using 4K sectors (ashift=12), I have an overhead of
8.06%. This amounts to a difference of 2.79 TiB in my particular
application, which I'd like to avoid. :)

(Assuming I haven't done anything wrong. :) ) Is the extra overhead for
4K-sector (ashift=12) raidz2 pools expected? Is there any way to reduce
it?

(In my very limited performance testing, 4K sectors do seem to perform
slightly better and more consistently, so I'd like to use them if I can
avoid the extra overhead.)

Details below.

Thanks in advance for your time,

-Taylor



I'm running:
FreeBSD host 9.0-RELEASE FreeBSD 9.0-RELEASE #0  amd64

I'm using Hitachi 4TB Deskstar 0S03364 drives, which are 4K-sector
devices.

In order to "future proof" the raidz2 pool against possible variations
in replacement drive size, I've created a single partition on each
drive, starting at sector 2048 and using 100MB less than the total
available space on the disk.
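For reference, each drive was partitioned roughly along these lines (a
sketch, not the literal history; the -s argument is the partition size
in 512-byte sectors, i.e. 4000682172416 / 512 = 7813832368, recomputed
per drive):
$ sudo gpart create -s gpt da2
$ sudo gpart add -t freebsd-zfs -b 2048 -s 7813832368 da2
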
$ sudo gpart list da2
Geom name: da2
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 7814037134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da2p1
  Mediasize: 4000682172416 (3.7T)
  Sectorsize: 512
  Stripesize: 0
  Stripeoffset: 1048576
  Mode: r1w1e1
  rawuuid: 71ebbd49-7241-11e1-b2dd-00259055e634
  rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
  label: (null)
  length: 4000682172416
  offset: 1048576
  type: freebsd-zfs
  index: 1
  end: 7813834415
  start: 2048
Consumers:
1. Name: da2
  Mediasize: 4000787030016 (3.7T)
  Sectorsize: 512
  Mode: r1w1e2

Each partition gives me 4000682172416 bytes (3.64 TiB). I'm using 16
drives. I create the zpool with 4K sectors as follows:
$ sudo gnop create -S 4096 /dev/da2p1
$ sudo zpool create zav raidz2 da2p1.nop da3p1 da4p1 da5p1 da6p1 da7p1 \
    da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1
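(As a sanity check before creating the pool, the 4096-byte sector size
of the .nop provider can be confirmed with something like
$ diskinfo -v /dev/da2p1.nop | grep sectorsize
which should report 4096 for the .nop device, versus 512 for the raw
partition.)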

I confirm ashift=12:
$ sudo zdb zav | grep ashift
               ashift: 12
               ashift: 12

"zpool list" approximately matches the expected raw capacity of =
16*4000682172416 =3D 64010914758656 bytes (58.28 TiB).=20
$ zpool list zav
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zav     58T  1.34M  58.0T     0%  1.00x  ONLINE  -
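(For reference, the raw-capacity arithmetic in TiB:
$ echo "scale=4; 16 * 4000682172416 / 1024^4" | bc
which works out to about 58.22 TiB, consistent with the 58T that zpool
list rounds to.)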

For raidz2, I'd expect to see 4000682172416*14 = 56009550413824 bytes
(50.94 TiB). However, I only get:
$ zfs list zav
NAME   USED  AVAIL  REFER  MOUNTPOINT
zav   1.10M  46.8T  354K  /zav

Or using df for greater accuracy:
$ df zav
Filesystem 1K-blocks   Used       Avail Capacity  Mounted on
zav        50288393472  354 50288393117     0%    /zav

A total of 51495314915328 bytes (46.83 TiB). (This is for a freshly
created zpool, before any snapshots or other operations have been
performed.)

I measure overhead as (expected - actual) / expected, which in the case
of the 4K-sector (ashift=12) raidz2 pool comes to 8.06%.
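(For example, plugging in the byte counts above for the ashift=12
raidz2 case:
$ echo "scale=4; (56009550413824 - 51495314915328) * 100 / 56009550413824" | bc
gives roughly 8.06.)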

To create a 512-byte sector (ashift=9) raidz2 pool, I basically just
replace "da2p1.nop" with "da2p1" when creating the zpool. I confirm
ashift=9. The zpool raw size is the same (as far as I can tell with the
limited precision of zpool list). However, the available size according
to zfs list/df is 54560512935936 bytes (49.62 TiB), which amounts to an
overhead of 2.59%. There are some minor differences in the ALLOC and
USED listings, so I repeat them here for the 512-byte sector raidz2
pool:
$ zpool list zav
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zav     58T   228K  58.0T     0%  1.00x  ONLINE  -
$ zfs list zav
NAME   USED  AVAIL  REFER  MOUNTPOINT
zav    198K  49.6T  73.0K  /zav
$ df zav
Filesystem 1K-blocks   Used       Avail Capacity  Mounted on
zav        53281750914   73 53281750841     0%    /zav
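Spelled out, the ashift=9 pool was created with essentially the same
command as above, minus the gnop step:
$ sudo zpool create zav raidz2 da2p1 da3p1 da4p1 da5p1 da6p1 da7p1 \
    da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1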

I expect some overhead from ZFS, and according to this blog post:
http://www.cuddletech.com/blog/pivot/entry.php?id=1013
(via http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/041773.html)
there may be a 1/64 (1.56%) overhead baked into ZFS. Interestingly
enough, when I create a pool with no raid/mirroring, I get an overhead
of 1.93% regardless of ashift=9 or ashift=12, which is quite close to
the 1/64 number. I have also tested raidz, which behaves similarly to
raidz2, though the overhead is slightly lower in each case: ashift=9
raidz overhead is 2.33%, and ashift=12 raidz overhead is 7.04%.

In order to preserve space, I've put the zdb listings for both the
ashift=9 and ashift=12 raidz2 pools here:
http://pastebin.com/v2xjZkNw

There are also some differences in the zdb output; for example, "SPA
allocated" is higher in the 4K-sector raidz2 pool, which seems
interesting, although I don't understand the significance of this.


