From owner-freebsd-fs@FreeBSD.ORG  Mon Jan 14 19:13:42 2013
In-Reply-To: <20130114094010.GA75529@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de>
 <CAFqOu6jgA8RWV5d+rOBk8D=3Vu3yWSnDkAi1cFJ0esj4OpBy2Q@mail.gmail.com>
 <20130109162613.GA34276@mid.pc5.i.0x5.de>
 <CAFqOu6jrng=v8eVyhqV-PBqJM_dYy+U7X4+=ahBeoxvK4mxcSA@mail.gmail.com>
 <20130110193949.GA10023@mid.pc5.i.0x5.de>
 <20130111073417.GA95100@mid.pc5.i.0x5.de>
 <CAFqOu6gWpMsWN0pTBiv10WfwyGWMfO9GzMLWTtcVxHixr-_i3Q@mail.gmail.com>
 <20130114094010.GA75529@mid.pc5.i.0x5.de>
Date: Mon, 14 Jan 2013 11:13:40 -0800
X-Google-Sender-Auth: wj3keMDjo9kBGkdBzj1W7RwB6V0
Message-ID: <CAFqOu6hxfGt_M6Jo9qWeifDz9YnNc_Bd9H-GEe4RYtutaPvH5w@mail.gmail.com>
Subject: Re: slowdown of zfs (tx->tx)
From: Artem Belevich <art@freebsd.org>
To: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-fs <freebsd-fs@freebsd.org>

On Mon, Jan 14, 2013 at 1:40 AM, Nicolas Rachinsky
<fbsd-mas-0@ml.turing-complete.org> wrote:
>   5 Reallocated_Sector_Ct   0x0033   094   094   010    Pre-fail  Always       -       166
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1259614646
> 196 Reallocated_Event_Count 0x0032   096   096   000    Old_age   Always       -       166

> Reallocated_Sector_Ct did not increase during the last days.

It does not matter, IMHO. That drive has already developed quite a few
bad sectors that ECC could not deal with. There are apparently more
marginally bad sectors, but ECC copes with them for now. Once enough
bits rot, you'll get more bad sectors. I personally would replace the
drive.
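
If you want to keep an eye on those counters in the meantime, here is a
minimal sketch that parses attribute rows in the layout quoted above and
reports the reallocation counters that are non-zero. The sample text is
just your three lines; the function names and the choice of which
attributes to watch are my own assumptions, not anything smartctl
provides.

```python
# Sketch only: parse SMART attribute rows (layout as quoted above) and
# report reallocation counters with a non-zero raw value.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   094   094   010    Pre-fail  Always       -       166
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1259614646
196 Reallocated_Event_Count 0x0032   096   096   000    Old_age   Always       -       166
"""

# Assumption: these are the attributes worth alerting on.
WATCHED = {"Reallocated_Sector_Ct", "Reallocated_Event_Count"}


def parse_attributes(text):
    """Return {attribute_name: raw_value} for every well-formed row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if raw.isdigit():
            attrs[fields[1]] = int(raw)
    return attrs


def suspicious(attrs):
    """Keep only the watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in attrs.items()
            if name in WATCHED and raw > 0}


if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))
```

Running it periodically and comparing the result against the previous
run would show whether the counts start climbing again.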

>> Could you run gstat with a 1-second interval? Some of the 5-second
>> samples show that ada8 is the bottleneck -- it has its request queue
>> full (L(q)=10) when all other drives were done with their jobs. And
>> that's a 5-sec average. Its write service time also seems to be a lot
>> higher than for other drives.
>
> Attached.  I have replaced ada8 with ada9, which is a Western Digital
> Caviar Black.
>
> Now ada0 and ada4 seem to be the bottleneck.
>
> But I don't understand the intervals without any disk activity.

It is puzzling. Is rsync still sleeping in the tx->tx state? Try running
"procstat -kk <rsync-PID>" periodically. It prints the in-kernel stack
trace and may give a clue as to where and why rsync is stuck.
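
One way to do the periodic sampling is a small wrapper like the one
below. It is only a sketch: the helper names, the default of ten samples
and the one-second interval are my assumptions; the only real command it
invokes is "procstat -kk <PID>".

```python
# Sketch only: run "procstat -kk <PID>" several times and collect the output.
import subprocess
import sys
import time


def build_command(pid):
    """Command line for dumping the kernel stacks of one process."""
    return ["procstat", "-kk", str(pid)]


def sample_stacks(pid, count=10, interval=1.0,
                  run=subprocess.run, sleep=time.sleep):
    """Collect `count` stack dumps, sleeping `interval` seconds between them.

    `run` and `sleep` are parameters only so the loop can be exercised
    without procstat being installed.
    """
    outputs = []
    for i in range(count):
        result = run(build_command(pid), capture_output=True, text=True)
        outputs.append(result.stdout)
        if i < count - 1:
            sleep(interval)
    return outputs


if __name__ == "__main__" and len(sys.argv) > 1:
    for n, out in enumerate(sample_stacks(int(sys.argv[1]))):
        print(f"--- sample {n} ---")
        print(out, end="")
```

Pass it the rsync PID as the only argument. If the same frames show up
in every sample, that is where rsync is waiting.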

--Artem