From owner-freebsd-fs@FreeBSD.ORG  Sat Nov 17 22:59:00 2012
Date: Sat, 17 Nov 2012 17:58:51 -0500
From: Adam McDougall <mcdouga9@egr.msu.edu>
To: kpneal@pobox.com
Subject: Re: SSD recommendations for ZFS cache/log
Message-ID: <20121117225851.GJ1462@egr.msu.edu>
References: <CAFHbX1K-NPuAy5tW0N8=sJD=CU0Q1Pm3ZDkVkE+djpCsD1U8_Q@mail.gmail.com>
 <57ac1f$gf3rkl@ipmail05.adl6.internode.on.net>
 <50A31D48.3000700@shatow.net>
 <CAF6rxgkh6C0LKXOZa264yZcA3AvQdw7zVAzWKpytfh0+KnLOJg@mail.gmail.com>
 <20121116044055.GA47859@neutralgood.org>
 <CACpH0MfQWokFZkh58qm+2_tLeSby9BWEuGjkH15Nu3+S1+p3SQ@mail.gmail.com>
 <50A64694.5030001@egr.msu.edu>
 <20121117181803.GA26421@neutralgood.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121117181803.GA26421@neutralgood.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org

On Sat, Nov 17, 2012 at 01:18:03PM -0500, kpneal@pobox.com wrote:

  On Fri, Nov 16, 2012 at 08:58:44AM -0500, Adam McDougall wrote:
  > On 11/16/12 00:41, Zaphod Beeblebrox wrote:
  > > On Thu, Nov 15, 2012 at 11:40 PM,  <kpneal@pobox.com> wrote:
  > >>> +       <answer>
  > >>> +         <para>The answer very much depends on the expected workload.
  > >>> +           Deduplication takes up a significant amount of RAM and CPU
  > >>> +           time and may slow down read and write disk access times.
  > >>> +           Unless one is storing data that is very heavily
  > >>> +           duplicated (such as virtual machine images, or user
  > >>> +           backups) it is likely that deduplication will do more harm
  > >>> +           than good.  Another consideration is the inability to
  > >>
  > >> I advise against advice that is this firm. The statement that it will "do
  > >> more harm than good" really should be omitted. And I'm not sure it is
  > >> fair to say it takes a bunch of CPU. Lots of memory, yes, but lots of
  > >> CPU isn't so clear.
  > >
  > > I experimented by enabling DEDUP on a RAID-Z1 pool containing 4x 2T
  > > green drives.  The system had 8G of RAM and was otherwise quiet.  I
  > > copied a dataset of about 1T of random stuff onto the array and then
  > > copied the same set of data onto the array a second time. The end
  > > result is a dedup ratio of almost 2.0 and only around 1T of disk
  > > used.
  > >
  > > As I recall (and it's been 6-ish months since I did this), the 2nd
  > > write became largely CPU bound with little disk activity.  As far as I
  > > could tell, the dedup table never thrashed on the disk ... and that
  > > most of the disk activity seemed to be creating the directory tree or
  > > reading the disk to do the verify step of dedup.
  
  Well, yes, it was CPU bound because it wasn't disk bound. All filesystem
  activity is going to be either disk bound, CPU bound, or waiting for more
  filesystem requests (eg, network bound or similar).
  
  Also note that the original text above said that dedup only made sense
  with heavily duplicated data. That's exactly the case you tested. So
  your test says nothing about the case where there isn't much duplicated
  data. The phrase I advised against was referring to the case you didn't
  test.
  
  > Now try deleting some data and the fun begins :)
  
  You've had a bad experience? I'd love to hear about it.

Deleting data takes significantly longer than usual because ZFS has to
un-dedupe it (update the dedup table entry for every block being freed),
which takes longer than most people expect and ties up the removal
process until it is done.  During that time, the CPU is pegged pretty
hard and the disks are active but not doing much useful work.  I haven't
had the opportunity to try this on a system with a lot of memory or a
fast L2ARC device to see whether that helps.
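To make the bookkeeping cost concrete, here is a toy Python model.  It is
hypothetical and is not ZFS code: the class name and methods are made up,
and it ignores the real on-disk dedup table format.  It only assumes that a
dedup table maps each block checksum to a reference count, and that
freeing a block means looking up and decrementing its entry.  Deleting one
of two identical copies then touches the table once per block while
releasing no space at all.  On a real pool, if the table does not fit in
RAM (or L2ARC), each of those lookups can turn into a random disk read.

```python
import hashlib


class ToyDedupTable:
    """Hypothetical toy dedup table: checksum -> reference count.

    Not ZFS code; it only models the bookkeeping that freeing
    deduplicated blocks requires.
    """

    def __init__(self) -> None:
        self.refcounts: dict[str, int] = {}
        self.lookups = 0  # table lookups; on a real pool each may hit disk if uncached

    def write_block(self, data: bytes) -> str:
        """Store a block, sharing it with any identical block already stored."""
        key = hashlib.sha256(data).hexdigest()
        self.lookups += 1
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def free_block(self, key: str) -> bool:
        """Drop one reference; return True only if the block's space is released."""
        self.lookups += 1
        self.refcounts[key] -= 1
        if self.refcounts[key] == 0:
            del self.refcounts[key]
            return True
        return False

    def physical_blocks(self) -> int:
        """Number of distinct blocks actually occupying space."""
        return len(self.refcounts)


# Write the same 1000 distinct blocks twice (two copies of one dataset).
ddt = ToyDedupTable()
blocks = [i.to_bytes(8, "little") * 16 for i in range(1000)]
copy_a = [ddt.write_block(b) for b in blocks]
copy_b = [ddt.write_block(b) for b in blocks]

# Delete the second copy: every block needs a table lookup and update,
# yet no space comes back because copy_a still references every block.
ddt.lookups = 0
freed = sum(ddt.free_block(k) for k in copy_b)
print(ddt.lookups, freed, ddt.physical_blocks())  # 1000 0 1000
```

Freeing copy_a afterwards would release all 1000 blocks, again at one
lookup each.  The point of the model is only that deletion cost scales with
the number of deduped blocks touched, not with the space actually reclaimed.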

This can spiral in at least two ways.  For one, the average system admin
will not expect deleting files to take so long and will assume something
is wrong.  If this happens with small amounts of data, they may decide to
disable dedupe once they realize it is the cause.  But since the existing
data is already deduped, they are stuck with that behavior until the data
is copied fresh or deleted.  Doing THAT can take an enormous amount of
time, progresses slowly, and has a chance of leading to a deadlock (I'm
not making this up).  If a deadlock occurs while they are trying to solve
the problem, tempers flare even further, especially since after the next
reboot the system will resume thrashing the disks where it left off,
possibly before the admin has a chance to log in and figure out what is
happening, which isn't obvious.  Worse yet, if a lot of data has been
deleted, another deadlock may occur.  Rinse, repeat, swear at ZFS, perhaps
vow that dedupe is "not ready", and mutter a quiet threat.

There have been several people on the FreeBSD mailing lists with these
symptoms.  Some of them added RAM to get past it.  Others found a way to
measure progress and kept letting it churn/deadlock/reboot until things
returned to normal.  I think there is a new ZFS feature in -current that
allows background deletion, which may ease this issue, and someone has
reported success with it.