Date: Sat, 24 Mar 2012 11:38:50 -0700
From: Taylor <j.freebsd-zfs@enone.net>
To: Dennis Glatting <dg@pki2.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS extra space overhead for ashift=12 vs ashift=9 raidz2 pool?
Message-ID: <BA3EA647-9585-462E-AE40-DBDCFEF69743@enone.net>
In-Reply-To: <alpine.BSF.2.00.1203230937580.89054@btw.pki2.com>
References: <45654FDD-A20A-47C8-B3B5-F9B0B71CC38B@enone.net> <alpine.BSF.2.00.1203230937580.89054@btw.pki2.com>
Dennis,

This is a bit off topic from my original question and I'm hoping not to distract from it too much, but to briefly answer your question: my experience with the 4TB Hitachi drives is limited; I've only had these drives for about a week. One of the drives exhibited ICRC errors, which in theory could be just a cabling issue, but I couldn't reproduce the problem with the same cable/slot and a different drive, so I ended up RMAing the ICRC drive just in case. However, I have had good luck with Hitachi 3TB drives over the past year and with one Hitachi 4TB drive over the last month, and I have not encountered any other problems with this batch of 4TB drives so far.

Cheers,

-Taylor

On Mar 23, 2012, at 9:40 AM, Dennis Glatting wrote:

>
> Somewhat related:
>
> I am also using 4TB Hitachi drives, but only four. Although I am fairly happy with these drives, I have had one disk fail in the two months I have been using them. This may have been an infant-mortality failure, but I am wondering if you have had any similar experiences with the drives.
>
>
>
> On Fri, 23 Mar 2012, Taylor wrote:
>
>> Hello,
>>
>> I'm bringing up a new ZFS filesystem and have noticed something strange with respect to the overhead from ZFS. When I create a raidz2 pool with 512-byte sectors (ashift=9), I have an overhead of 2.59%, but when I create the zpool using 4K sectors (ashift=12), I have an overhead of 8.06%. This amounts to a difference of 2.79TiB in my particular application, which I'd like to avoid. :)
>>
>> (Assuming I haven't done anything wrong. :) ) Is the extra overhead for 4K-sector (ashift=12) raidz2 pools expected? Is there any way to reduce it?
>>
>> (In my very limited performance testing, 4K sectors do seem to perform slightly better and more consistently, so I'd like to use them if I can avoid the extra overhead.)
>>
>> Details below.
>>
>> Thanks in advance for your time,
>>
>> -Taylor
>>
>>
>>
>> I'm running:
>> FreeBSD host 9.0-RELEASE FreeBSD 9.0-RELEASE #0 amd64
>>
>> I'm using Hitachi 4TB Deskstar 0S03364 drives, which are 4K-sector devices.
>>
>> In order to "future proof" the raidz2 pool against possible variations in replacement drive size, I've created a single partition on each drive, starting at sector 2048 and using 100MB less than the total available space on the disk.
>> $ sudo gpart list da2
>> Geom name: da2
>> modified: false
>> state: OK
>> fwheads: 255
>> fwsectors: 63
>> last: 7814037134
>> first: 34
>> entries: 128
>> scheme: GPT
>> Providers:
>> 1. Name: da2p1
>>    Mediasize: 4000682172416 (3.7T)
>>    Sectorsize: 512
>>    Stripesize: 0
>>    Stripeoffset: 1048576
>>    Mode: r1w1e1
>>    rawuuid: 71ebbd49-7241-11e1-b2dd-00259055e634
>>    rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
>>    label: (null)
>>    length: 4000682172416
>>    offset: 1048576
>>    type: freebsd-zfs
>>    index: 1
>>    end: 7813834415
>>    start: 2048
>> Consumers:
>> 1. Name: da2
>>    Mediasize: 4000787030016 (3.7T)
>>    Sectorsize: 512
>>    Mode: r1w1e2
>>
>> Each partition gives me 4000682172416 bytes (or 3.64 TiB). I'm using 16 drives.
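>>
>> (A gpart invocation along these lines would produce the layout shown above; the -b/-s values here are inferred from the listing and are illustrative, not necessarily the exact command used:)
>> $ sudo gpart create -s gpt da2
>> $ sudo gpart add -t freebsd-zfs -b 2048 -s 7813832368 da2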
>>
>> I create the zpool with 4K sectors as follows:
>> $ sudo gnop create -S 4096 /dev/da2p1
>> $ sudo zpool create zav raidz2 da2p1.nop da3p1 da4p1 da5p1 da6p1 da7p1 da8p1 da9p1 da10p1 da11p1 da12p1 da13p1 da14p1 da15p1 da16p1 da17p1
>>
>> I confirm ashift=12:
>> $ sudo zdb zav | grep ashift
>> ashift: 12
>> ashift: 12
>>
>> "zpool list" approximately matches the expected raw capacity of 16*4000682172416 = 64010914758656 bytes (58.22 TiB):
>> $ zpool list zav
>> NAME  SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>> zav    58T  1.34M  58.0T   0%  1.00x  ONLINE  -
>>
>> For raidz2, I'd expect to see 4000682172416*14 = 56009550413824 bytes (50.94 TiB) of usable space. However, I only get:
>> $ zfs list zav
>> NAME   USED  AVAIL  REFER  MOUNTPOINT
>> zav   1.10M  46.8T   354K  /zav
>>
>> Or, using df for greater accuracy:
>> $ df zav
>> Filesystem    1K-blocks  Used        Avail  Capacity  Mounted on
>> zav         50288393472   354  50288393117        0%  /zav
>>
>> That is a total of 51495314915328 bytes (46.83 TiB). (This is for a freshly created zpool, before any snapshots, etc. have been taken.)
>>
>> I measure overhead as (expected - actual) / expected, which for the 4K-sector (ashift=12) raidz2 pool comes to 8.06%.
>>
>> To create a 512-byte sector (ashift=9) raidz2 pool, I basically just replace "da2p1.nop" with "da2p1" when creating the zpool. I confirm ashift=9. The zpool raw size is the same (as far as I can tell given the limited precision of zpool list). However, the available size according to zfs list/df is 54560512935936 bytes (49.62 TiB), which amounts to an overhead of 2.59%. There are some minor differences in the ALLOC and USED listings, so I repeat them here for the 512-byte sector raidz2 pool:
>> $ zpool list zav
>> NAME  SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>> zav    58T   228K  58.0T   0%  1.00x  ONLINE  -
>> $ zfs list zav
>> NAME  USED  AVAIL  REFER  MOUNTPOINT
>> zav   198K  49.6T  73.0K  /zav
>> $ df zav
>> Filesystem    1K-blocks  Used        Avail  Capacity  Mounted on
>> zav         53281750914    73  53281750841        0%  /zav
>>
>> I expect some overhead from ZFS, and according to this blog post:
>> http://www.cuddletech.com/blog/pivot/entry.php?id=1013
>> (via http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/041773.html)
>> there may be a 1/64 (1.56%) overhead baked into ZFS. Interestingly enough, when I create a pool with no raid/mirroring, I get an overhead of 1.93% regardless of ashift=9 or ashift=12, which is quite close to the 1/64 number. I have also tested raidz, which behaves similarly to raidz2, although the overhead is slightly lower in each case: 1) ashift=9 raidz overhead is 2.33%, and 2) ashift=12 raidz overhead is 7.04%.
>>
>> To save space, I've put the zdb listings for both the ashift=9 and ashift=12 raidz2 pools here:
>> http://pastebin.com/v2xjZkNw
>>
>> There are also some differences in the zdb output; for example, "SPA allocated" is higher in the 4K-sector raidz2 pool, which seems interesting, although I don't understand the significance of this.
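>>
>> (As a sanity check, the overhead figures can be recomputed with bc from the expected and actual byte counts reported above; nothing here is new data, just the same numbers run through the formula:)
>> $ echo 'scale=6; (56009550413824 - 51495314915328) / 56009550413824 * 100' | bc
>> 8.059700
>> $ echo 'scale=6; (56009550413824 - 54560512935936) / 56009550413824 * 100' | bc
>> 2.587100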