Date: Sat, 24 Mar 2012 11:38:50 -0700
From: Taylor <j.freebsd-zfs@enone.net>
To: Dennis Glatting <dg@pki2.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS extra space overhead for ashift=12 vs ashift=9 raidz2 pool?
Message-ID: <BA3EA647-9585-462E-AE40-DBDCFEF69743@enone.net>
In-Reply-To: <alpine.BSF.2.00.1203230937580.89054@btw.pki2.com>
References: <45654FDD-A20A-47C8-B3B5-F9B0B71CC38B@enone.net> <alpine.BSF.2.00.1203230937580.89054@btw.pki2.com>
Dennis,

This is a bit off topic from my original question and I'm hoping not to distract from it too much, but to briefly answer your question: my experience with the 4TB Hitachi drives is limited; I've only had these drives for about a week. One of the drives exhibited ICRC errors, which in theory could be just a cabling issue, but I couldn't reproduce the problem with the same cable/slot and a different drive, so I ended up RMAing the ICRC drive just in case. However, I have had good luck with Hitachi 3TB drives over the past year and with one Hitachi 4TB drive over the last month, and I have not encountered any other problems with this batch of 4TB drives so far.

Cheers,

-Taylor

On Mar 23, 2012, at 9:40 AM, Dennis Glatting wrote:

>
> Somewhat related:
>
> I am also using 4TB Hitachi drives, but only four. Although I am fairly happy with these drives, I have had one disk fail in the two months I have been using them. This may have been an infant-mortality failure, but I am wondering if you have had any similar experiences with the drives.
>
>
>
> On Fri, 23 Mar 2012, Taylor wrote:
>
>> Hello,
>>
>> I'm bringing up a new ZFS filesystem and have noticed something strange with respect to the overhead from ZFS. When I create a raidz2 pool with 512-byte sectors (ashift=9), I have an overhead of 2.59%, but when I create the zpool using 4K sectors (ashift=12), I have an overhead of 8.06%. This amounts to a difference of 2.79TiB in my particular application, which I'd like to avoid. :)
>>
>> (Assuming I haven't done anything wrong. :) ) Is the extra overhead for 4K-sector (ashift=12) raidz2 pools expected? Is there any way to reduce it?
>>
>> (In my very limited performance testing, 4K sectors do seem to perform slightly better and more consistently, so I'd like to use them if I can avoid the extra overhead.)
>>
>> Details below.
>>
>> Thanks in advance for your time,
>>
>> -Taylor
>>
>>
>>
>> I'm running:
>> FreeBSD host 9.0-RELEASE FreeBSD 9.0-RELEASE #0 amd64
>>
>> I'm using Hitachi 4TB Deskstar 0S03364 drives, which are 4K-sector devices.
>>
>> In order to "future proof" the raidz2 pool against possible variations in replacement drive size, I've created a single partition on each drive, starting at sector 2048 and using 100MB less than the total available space on the disk.
>> $ sudo gpart list da2
>> Geom name: da2
>> modified: false
>> state: OK
>> fwheads: 255
>> fwsectors: 63
>> last: 7814037134
>> first: 34
>> entries: 128
>> scheme: GPT
>> Providers:
>> 1. Name: da2p1
>>    Mediasize: 4000682172416 (3.7T)
>>    Sectorsize: 512
>>    Stripesize: 0
>>    Stripeoffset: 1048576
>>    Mode: r1w1e1
>>    rawuuid: 71ebbd49-7241-11e1-b2dd-00259055e634
>>    rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
>>    label: (null)
>>    length: 4000682172416
>>    offset: 1048576
>>    type: freebsd-zfs
>>    index: 1
>>    end: 7813834415
>>    start: 2048
>> Consumers:
>> 1. Name: da2
>>    Mediasize: 4000787030016 (3.7T)
>>    Sectorsize: 512
>>    Mode: r1w1e2
>>
>> Each partition gives me 4000682172416 bytes (or 3.64 TiB). I'm using 16 drives.
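>>
>> (A gpart invocation along these lines would produce the layout shown above; the -b/-s values here are inferred from the listing and are illustrative, not necessarily the exact command used:)
>> $ sudo gpart create -s gpt da2
>> $ sudo gpart add -t freebsd-zfs -b 2048 -s 7813832368 da2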
>>
>> I create the zpool with 4K sectors as follows:
>> $ sudo gnop create -S 4096 /dev/da2p1
>> $ sudo zpool create zav raidz2 da2p1.nop da3p1 da4p1 da5p1 da6p1 da7p1 da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1
>>
>> I confirm ashift=12:
>> $ sudo zdb zav | grep ashift
>> ashift: 12
>> ashift: 12
>>
>> "zpool list" approximately matches the expected raw capacity of 16*4000682172416 = 64010914758656 bytes (58.22 TiB):
>> $ zpool list zav
>> NAME  SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>> zav    58T  1.34M  58.0T   0%  1.00x  ONLINE  -
>>
>> For raidz2, I'd expect to see 4000682172416*14 = 56009550413824 bytes (50.94 TiB) of usable space. However, I only get:
>> $ zfs list zav
>> NAME   USED  AVAIL  REFER  MOUNTPOINT
>> zav   1.10M  46.8T   354K  /zav
>>
>> Or, using df for greater accuracy:
>> $ df zav
>> Filesystem    1K-blocks  Used        Avail  Capacity  Mounted on
>> zav         50288393472   354  50288393117        0%  /zav
>>
>> That is a total of 51495314915328 bytes (46.83 TiB). (This is for a freshly created zpool, before any snapshots, etc. have been taken.)
>>
>> I measure overhead as (expected - actual) / expected, which for the 4K-sector (ashift=12) raidz2 pool comes to 8.06%.
>>
>> To create a 512-byte sector (ashift=9) raidz2 pool, I basically just replace "da2p1.nop" with "da2p1" when creating the zpool. I confirm ashift=9. The zpool raw size is the same (as far as I can tell given the limited precision of zpool list). However, the available size according to zfs list/df is 54560512935936 bytes (49.62 TiB), which amounts to an overhead of 2.59%. There are some minor differences in the ALLOC and USED listings, so I repeat them here for the 512-byte sector raidz2 pool:
>> $ zpool list zav
>> NAME  SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>> zav    58T   228K  58.0T   0%  1.00x  ONLINE  -
>> $ zfs list zav
>> NAME  USED  AVAIL  REFER  MOUNTPOINT
>> zav   198K  49.6T  73.0K  /zav
>> $ df zav
>> Filesystem    1K-blocks  Used        Avail  Capacity  Mounted on
>> zav         53281750914    73  53281750841        0%  /zav
>>
>> I expect some overhead from ZFS, and according to this blog post:
>> http://www.cuddletech.com/blog/pivot/entry.php?id=1013
>> (via http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/041773.html)
>> there may be a 1/64 (1.56%) overhead baked into ZFS. Interestingly enough, when I create a pool with no raid/mirroring, I get an overhead of 1.93% regardless of ashift=9 or ashift=12, which is quite close to the 1/64 number. I have also tested raidz, which behaves similarly to raidz2, although the overhead is slightly lower in each case: 1) ashift=9 raidz overhead is 2.33%, and 2) ashift=12 raidz overhead is 7.04%.
>>
>> To save space, I've put the zdb listings for both the ashift=9 and ashift=12 raidz2 pools here:
>> http://pastebin.com/v2xjZkNw
>>
>> There are also some differences in the zdb output; for example, "SPA allocated" is higher in the 4K-sector raidz2 pool, which seems interesting, although I don't understand the significance of this.
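>>
>> (As a sanity check, the overhead figures can be recomputed with bc from the expected and actual byte counts reported above; nothing here is new data, just the same numbers run through the formula:)
>> $ echo 'scale=6; (56009550413824 - 51495314915328) / 56009550413824 * 100' | bc
>> 8.059700
>> $ echo 'scale=6; (56009550413824 - 54560512935936) / 56009550413824 * 100' | bc
>> 2.587100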