Date:      Wed, 23 Dec 2015 15:56:53 -0600
From:      Alan Amesbury <amesbury@oitsec.umn.edu>
To:        hackers@freebsd.org
Subject:   Re: The minimum amount of memory needed to use ZFS.
Message-ID:  <411CC5EB-012B-43FC-B7E0-5D09D3CA3E55@oitsec.umn.edu>
In-Reply-To: <26557C02-C591-4232-BBD0-988B0EB89575@gid.co.uk>
References:  <CA+xzKjDQ_vUfgz4LvvcBE950=-ww7ukCbFmZz1vnzhGrNCucbQ@mail.gmail.com> <20151223121445.GA85016@ozzmosis.com> <26557C02-C591-4232-BBD0-988B0EB89575@gid.co.uk>

On Dec 23, 2015, at 11:53, Bob Bishop <rb@gid.co.uk> wrote:

[snip]
> Deduplication seems like a very bad idea unless you have both a lot
> of duplicated data and a serious shortage of disk. It needs a lot of
> RAM, increasing over time. Depending on the hardware and the use
> case, compression (which effectively only costs CPU) might be a
> better option.

Agreed: deduplication isn't something you want to enable until you're
sure you have a workload that's suitable for it. On FreeBSD, memory
usage is estimated at 2-5GB of RAM per terabyte of deduplicated
zpool[1]. Oracle has published[2] some information on deduplication in
ZFS, too, which parallels the information in the FreeBSD wiki, namely
the use of 'zdb' to analyze your data and determine whether
deduplication is even worthwhile. Note that this can take a while to
run and, at least for me, had issues on at least one of my hosts.

Output is pretty straightforward.  For example:


# zdb -S pool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    4.94M    578G    577G    579G    4.94M    578G    577G    579G
     2     416K   50.5G   50.5G   50.5G     922K    112G    112G    112G
     4    39.6K   4.89G   4.89G   4.89G     175K   21.6G   21.6G   21.6G
     8    3.06K    382M    381M    382M    31.6K   3.85G   3.84G   3.85G
    16      306   34.4M   33.3M   33.4M    5.81K    665M    639M    641M
    32       62   6.13M   4.99M   5.04M    2.77K    281M    230M    232M
    64       41   4.88M   4.88M   4.88M    3.56K    432M    432M    433M
   128       25   3.12M   3.12M   3.12M    4.37K    560M    560M    560M
   256       71   8.88M   8.88M   8.88M    20.4K   2.56G   2.56G   2.56G
   512        2    256K    256K    256K    1.27K    163M    163M    163M
    2K        2    256K    256K    256K    4.19K    536M    536M    536M
  128K        1    128K    128K    128K     148K   18.4G   18.4G   18.4G
 Total    5.39M    634G    633G    634G    6.23M    739G    739G    740G

dedup = 1.17, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.17
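
As a rough sanity check on that RAM estimate, assuming the commonly
cited figure of roughly 320 bytes of core per DDT entry: the histogram
above shows about 5.39M allocated blocks, so keeping the whole DDT in
memory would need on the order of 5.39M * 320 bytes, or roughly 1.7GB
for this ~634GB of data, which is broadly in line with the 2-5GB per
terabyte estimate in [1].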



For this host there's some evidence that deduplication might buy me a
small amount of additional space, but I'd rather allocate RAM to the
ARC for performance instead of using it for what looks like a small
reduction in space usage. For my workloads, I tend to get a much
bigger boost from using compression, as modern CPUs can typically
compress pretty close to the speed of rotational media. (SSDs would be
a different story.) Example 'zdb -S' output from a host using
compression:


# zdb -S pool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    8.25M   1008G   80.6G   80.6G    8.25M   1008G   80.6G   80.6G
     2      697   76.4M   21.3M   21.3M    1.46K    160M   44.8M   44.8M
     4    1.05K   10.2M   3.34M   3.34M    5.15K   48.6M   15.9M   15.9M
     8       65   1.09M    318K    318K      649   10.8M   3.06M   3.06M
    16       23    904K    300K    300K      558   20.1M   6.55M   6.55M
    32       18   1.78M    681K    681K      770   74.2M   27.7M   27.7M
    64       29   3.27M   1.23M   1.23M    2.61K    305M    115M    115M
   128       15   1.41M    536K    536K    2.38K    209M   77.3M   77.3M
 Total    8.25M   1008G   80.6G   80.6G    8.26M   1009G   80.9G   80.9G

dedup = 1.00, compress = 12.47, copies = 1.00, dedup * compress / copies = 12.51



The data, primarily textual log files of some kind, compresses pretty
well.
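
If you want to try the same thing, compression is set per dataset. A
minimal sketch, assuming a hypothetical dataset named 'pool/logs':

# zfs set compression=lz4 pool/logs
# zfs get compression,compressratio pool/logs

Note that only blocks written after the property is set get
compressed, so the compressratio value climbs only as new data lands;
existing data stays uncompressed until it's rewritten.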


-- 
Alan Amesbury
University Information Security
http://umn.edu/lookup/amesbury

[1] - https://wiki.freebsd.org/ZFSTuningGuide#Deduplication
[2] - http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html


