From: Garrett Cooper
To: Daniel Kalchev
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Date: Mon, 19 Dec 2011 13:14:50 -0800
Subject: Re: Uneven load on drives in ZFS RAIDZ1

On Mon, Dec 19, 2011 at 1:07 PM, Daniel Kalchev wrote:
>
> On Dec 19, 2011, at 11:00 PM, Stefan Esser wrote:
>
>> On 19.12.2011 19:03, Daniel Kalchev wrote:
>>> I have observed similar behavior, even more extreme, on a pool with dedup enabled. Is dedup enabled on this pool?
>>
>> Thank you for the report!
>>
>> Well, I had dedup enabled for a few short tests. But since I have "only"
>> 8GB of RAM and dedup seems to require an order of magnitude more to work
>> well, I switched dedup off again after a few hours.
>
> You will need to get rid of the DDT, as those tables are read nevertheless, even with dedup (already) disabled. The tables refer to already-deduped data.
>
> In my case, I had about 2-3TB of deduped data, with 24GB RAM. There was no shortage of RAM and I could not confirm that the ARC was full... but somehow the pool was placing heavy reads on only one or two disks (all the others nearly idle) -- apparently many small reads.
>
> I resolved my issue by copying the data to a newly created filesystem in the same pool -- luckily there was enough space available -- and then removing the 'deduped' filesystems.
>
> That last operation was particularly slow, and at one point I had a spontaneous reboot -- the pool was 'impossible to mount', and as weird as it sounds, I had 'out of swap space' killing the 'zpool list' process.
> I let it sit for a few hours, until it had cleared itself.
>
> I/O in that pool is back to normal now.
>
> There is something terribly wrong with the dedup code.

The ZFS manual claims that dedup needs 2GB of memory per TB of data, but in reality it's closer to 5GB per TB on average.
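
To put rough numbers on that (a back-of-envelope sketch, not measurements from this thread): the figure usually quoted is on the order of 320 bytes of core memory per unique block in the DDT, so the per-TB cost depends almost entirely on the average block size.

    # Back-of-envelope DDT sizing sketch; the ~320 bytes/entry figure and the
    # block sizes below are assumptions, not numbers taken from this thread.
    def ddt_ram_gib(data_tib, avg_block_kib=64.0, bytes_per_entry=320):
        """Rough GiB of RAM needed to keep the whole dedup table in core."""
        blocks = data_tib * 1024**4 / (avg_block_kib * 1024)
        return blocks * bytes_per_entry / 1024**3

    print(ddt_ram_gib(1, avg_block_kib=128))  # ~2.5 GiB per TiB (full 128K records)
    print(ddt_ram_gib(1, avg_block_kib=64))   # ~5.0 GiB per TiB (smaller average blocks)

Viewed that way, the documented 2GB/TB (which assumes full 128K records) and the ~5GB/TB people actually see (real pools have smaller average blocks) fall out of the same arithmetic.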
So if you turn it on for large datasets or pools and don't limit the ARC, it ties your box in knots once it wires down all of the physical memory (even on a reimport, while it's replaying the ZIL -- whether on the array or on your dedicated ZIL device). This of course makes your machine dig into swap and slow to a crawl, and/or blows away your userland (and then you're pretty much SoL). Bottom line is that dedup is a poorly documented feature and causes lots of issues if enabled. Compression is a much better feature to enable.

> Well, if your test data is not valuable, you can just delete it. :)

+1, but I suggest limiting the ARC first.

Cheers,
-Garrett
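
For anyone who wants to cap the ARC before experimenting further: on FreeBSD that is the vfs.zfs.arc_max loader tunable. The value below is only an illustration -- size it to your own machine and workload.

    # /boot/loader.conf
    vfs.zfs.arc_max="4G"

    # verify after reboot:
    # sysctl vfs.zfs.arc_max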