From: Garrett Cooper
To: Daniel Kalchev
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Date: Mon, 19 Dec 2011 13:14:50 -0800
Subject: Re: Uneven load on drives in ZFS RAIDZ1

On Mon, Dec 19, 2011 at 1:07 PM, Daniel Kalchev wrote:
>
> On Dec 19, 2011, at 11:00 PM, Stefan Esser wrote:
>
>> On 19.12.2011 19:03, Daniel Kalchev wrote:
>>> I have observed similar behavior, even more extreme, on a pool with dedup enabled. Is dedup enabled on this pool?
>>
>> Thank you for the report!
>>
>> Well, I had dedup enabled for a few short tests. But since I have "only"
>> 8GB of RAM and dedup seems to require an order of magnitude more to work
>> well, I switched dedup off again after a few hours.
>
> You will need to get rid of the DDT, as those tables are read nevertheless, even with dedup (already) disabled. The tables refer to already-deduped data.
>
> In my case, I had about 2-3TB of deduped data, with 24GB RAM. There was no shortage of RAM and I could not confirm that the ARC was full... but somehow the pool was placing heavy reads on only one or two disks (all the others nearly idle) -- apparently many small reads.
>
> I resolved my issue by copying the data to a newly created filesystem in the same pool -- luckily there was enough space available -- and then removing the 'deduped' filesystems.
>
> That last operation was particularly slow, and at one point I had a spontaneous reboot -- the pool was 'impossible to mount', and as weird as it sounds, I had 'out of swap space' killing the 'zpool list' process.
> I let it sit for a few hours, until it had cleared itself.
>
> I/O in that pool is back to normal now.
>
> There is something terribly wrong with the dedup code.

The ZFS manual claims that dedup needs 2GB of memory per TB of data, but in reality it's closer to 5GB per TB on average.
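
To put rough numbers on that (a back-of-envelope sketch, not measurements from this thread): the figure usually quoted is on the order of 320 bytes of core memory per unique block in the DDT, so the per-TB cost depends almost entirely on the average block size.

    # Back-of-envelope DDT sizing sketch; the ~320 bytes/entry figure and the
    # block sizes below are assumptions, not numbers taken from this thread.
    def ddt_ram_gib(data_tib, avg_block_kib=64.0, bytes_per_entry=320):
        """Rough GiB of RAM needed to keep the whole dedup table in core."""
        blocks = data_tib * 1024**4 / (avg_block_kib * 1024)
        return blocks * bytes_per_entry / 1024**3

    print(ddt_ram_gib(1, avg_block_kib=128))  # ~2.5 GiB per TiB (full 128K records)
    print(ddt_ram_gib(1, avg_block_kib=64))   # ~5.0 GiB per TiB (smaller average blocks)

Viewed that way, the documented 2GB/TB (which assumes full 128K records) and the ~5GB/TB people actually see (real pools have smaller average blocks) fall out of the same arithmetic.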
So if you turn it on for large datasets or pools and don't limit the ARC, it ties your box in knots once it wires down all of the physical memory (even on a reimport, while it's replaying the ZIL -- whether on the array or on your dedicated ZIL device). This of course makes your machine dig into swap and slow to a crawl, and/or blows away your userland (and then you're pretty much SoL). Bottom line is that dedup is a poorly documented feature and causes lots of issues if enabled. Compression is a much better feature to enable.

> Well, if your test data is not valuable, you can just delete it. :)

+1, but I suggest limiting the ARC first.

Cheers,
-Garrett
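
For anyone who wants to cap the ARC before experimenting further: on FreeBSD that is the vfs.zfs.arc_max loader tunable. The value below is only an illustration -- size it to your own machine and workload.

    # /boot/loader.conf
    vfs.zfs.arc_max="4G"

    # verify after reboot:
    # sysctl vfs.zfs.arc_max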