From: Garrett Cooper <yanegomi@gmail.com>
Date: Mon, 19 Dec 2011 13:00:18 -0800
To: Stefan Esser
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: Uneven load on drives in ZFS RAIDZ1

On Dec 19, 2011, at 12:54 PM, Stefan Esser wrote:

> On 19.12.2011 18:05, Garrett Cooper wrote:
>> On Mon, Dec 19, 2011 at 6:22 AM, Stefan Esser wrote:
>>> Hi ZFS users,
>>>
>>> for quite some time I have observed an uneven distribution of load
>>> between drives in a 4 * 2TB RAIDZ1 pool.
>>> The following is an excerpt of a longer log of 10 second averages
>>> logged with gstat:
>>>
>>> dT: 10.001s  w: 10.000s  filter: ^a?da?.$
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>     0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
>>>     0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
>>>     0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
>>>     1     81     58   2007    4.6     22   1023    2.3   28.1| ada3
>>>
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>     1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
>>>     0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
>>>     1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
>>>     0     81     56   1985    4.8     24   1102    6.0   29.4| ada3
>>>
>>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>     1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
>>>     1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
>>>     1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
>>>     0     99     60   2064    6.0     39   2483    3.7   36.1| ada3
>>
>> This suggests (note that I said suggests) that there might be a slight
>> difference in the data path speeds or physical media, as someone else
>> suggested; look at zpool iostat -v, though, before making a firm
>> statement as to whether or not a drive is truly not performing to your
>> assumed spec. gstat and zpool iostat -v only suggest performance --
>> they aren't the end-all-be-all for determining drive performance.
>
> I doubt there is a difference in the data path speeds, since all drives
> are connected to the SATA II ports of an Intel H67 chip.
>
> The drives seem to perform equally well, just with a ratio of read
> requests of 30% / 30% / 20% / 20% for ada0 .. ada3. But neither queue
> length nor command latencies indicate a problem or differences between
> the drives. It seems that a different number of commands is scheduled
> for 2 of the 4 drives, compared to the other 2, and that scheduling
> should be part of the ZFS code. I'm quite convinced that neither the
> drives nor the other hardware plays a role, but I'll follow the
> suggestion to swap drives between controller ports and observe whether
> the increased read load moves with the drives (indicating that
> something on disk causes the anomaly) or stays with the SATA ports
> (indicating that lower numbered ports see higher load).
>
>> If the latency numbers were high enough, I would suggest dd'ing out to
>> the individual drives (i.e. remove the drive from the RAIDZ) to see if
>> there's a noticeable discrepancy, as this can indicate a bad cable,
>> backplane, or drive; from there I would start doing the physical swap
>> routine and see if the issue moves with the drive or stays static with
>> the controller channel and/or chassis slot.
>
> I do not expect a hardware problem, since command latencies are very
> similar across all drives, despite the higher read load on some of
> them. Those drives are busier by exactly the factor to be expected
> from the higher command rate alone.
>
> But it seems that others do not observe this asymmetric distribution of
> requests, which makes me wonder whether I happen to have metadata
> arranged in such a way that it is always read from ada0 or ada1, but
> not (or rarely) from ada2 or ada3. That could explain it, including the
> fact that raidz1 over other numbers of drives (e.g. 3 or 6) apparently
> shows a much more symmetric distribution of read requests.

Basic question: does one set of drives vibrate differently than the
other set?

-Garrett
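
P.S. To put concrete commands behind the suggestions above: zpool
iostat can report per-device averages on the same 10-second cadence as
your gstat log. Something like the following (the pool name "tank" is
a placeholder -- substitute your own):

    # per-vdev and per-disk I/O statistics, repeated every 10 seconds
    zpool iostat -v tank 10

If ZFS's own per-disk read counts show the same 30/30/20/20 split that
gstat does, the skew is being generated inside ZFS rather than by
anything below it.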
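
If you want numbers without pulling a drive out of the RAIDZ, a
read-only dd pass is a decent first cut (untested here, but these are
stock FreeBSD dd flags; it only reads, though you should quiesce the
pool so the numbers mean something):

    # read ~4 GB sequentially from one disk and discard it
    dd if=/dev/ada0 of=/dev/null bs=1m count=4096

repeated for ada1 through ada3. A clearly lower MB/s figure on one
drive points at the cable, port, or drive rather than the scheduler.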
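
And if you keep collecting those long runs, gstat's batch mode avoids
the curses display (this assumes a gstat new enough to have -b):

    # 10-second averages for ada0..ada3, appended to a log file
    gstat -b -I 10s -f '^a?da?.$' >> gstat.log

which matches the interval and filter shown in your excerpt.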