Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2070.6\))
Subject: Re: ada drives keep timing out!
From: Bob Bishop <rb@gid.co.uk>
In-Reply-To: <1BFD18FF-D3A7-4230-9425-CEDC7DC94181@tao.org.uk>
Date: Fri, 20 Feb 2015 11:53:06 +0000
Content-Transfer-Encoding: quoted-printable
Message-Id: <D0A51FC9-3E68-4EA9-A9D2-5B32513017EB@gid.co.uk>
References: <064CF905-DF19-40A7-8CE8-D9FFE17913EE@tao.org.uk>
 <938BC03E-2C47-4381-9D21-5849E9E6E126@gid.co.uk>
 <1BFD18FF-D3A7-4230-9425-CEDC7DC94181@tao.org.uk>
To: Dr Josef Karthauser <joe@tao.org.uk>
X-Mailer: Apple Mail (2.2070.6)
Cc: stable@freebsd.org

Hi,

> On 20 Feb 2015, at 11:21, Dr Josef Karthauser <joe@tao.org.uk> wrote:
>
>> On 20 Feb 2015, at 11:14, Bob Bishop <rb@gid.co.uk> wrote:
>>
>> Hi,
>>
>>> On 20 Feb 2015, at 10:34, Dr Josef Karthauser <joe@tao.org.uk> wrote:
>>>
>>> Hi there,
>>>
>>> I reported this last year, but I’d like to revisit it as it must
>>> have a software remedy. I know that I’m not the only one to have
>>> reported the problem.
>>>
>>> I have a ZFS pool with a number of Western Digital drives in it
>>> (WDC WD1000FYPS-01ZKB0 02.01B01).
>>
>> WD Green Power drives. I've had similar problems: sometimes they take
>> a looooong time to come ready; the controller times out waiting for
>> drive ready and the rest you know.
>>
>> Depending on the controller there may be nothing you can do. Maybe
>> it's possible to turn off the drives' green features. I replaced the
>> drives with something that works.
>>
>
> Hi Bob,
>
> In this case others report no such issues on the same hardware running
> Linux, so it seems to me to be an internal timeout issue within
> FreeBSD, or maybe Linux retries more than FreeBSD and so doesn’t
> experience the issue that we do. It’s clear that FreeBSD is
> proactively disconnecting the drive when the timeout occurs. So, maybe
> there’s something that can be done there.

It depends on the disk controller hardware/firmware (not the drive
electronics), not necessarily the driver. From the drive datasheet
http://www.wdc.com/en/library/sata/2879-701236.pdf?wdc_lang=en

"WD GreenPower drives monitor work load and automatically invoke idle
mode whenever possible to further reduce unnecessary power consumption.
Drive recovery time from idle mode is less than one second, [etc]"

What happens is that the drive decides to idle and the controller has no
way of telling. So when the controller issues its next command, the
drive takes the best part of a second (if you are unlucky) to come out
of idle, execute the command and return to ready. The controller times
this out at a very low level and assumes that the drive has gone away.
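
To put rough numbers on that, here is a minimal Python sketch of a single
command hitting a fixed controller timeout. It is a toy model, not how any
real controller or FreeBSD driver is implemented. The function name and all
three figures are assumptions for illustration: the 0.9 s wake time is only
a stand-in for "the best part of a second", and the 0.5 s timeout is
hypothetical, not taken from any controller's documentation.

```python
def command_outcome(drive_idle, wake_s, service_s, timeout_s):
    """Return 'ok' or 'timeout' for one command under a fixed timeout."""
    # An idle drive must spin back up before it can service the command.
    latency = service_s + (wake_s if drive_idle else 0.0)
    return "ok" if latency <= timeout_s else "timeout"


# Illustrative figures only (assumptions, not measured values).
WAKE_S = 0.9      # time to leave idle mode
SERVICE_S = 0.01  # time to execute the command once the drive is ready
TIMEOUT_S = 0.5   # hypothetical low-level controller timeout

print(command_outcome(False, WAKE_S, SERVICE_S, TIMEOUT_S))  # drive active
print(command_outcome(True, WAKE_S, SERVICE_S, TIMEOUT_S))   # drive idle
```

With these figures the active drive answers in time and the idle one does
not, even though nothing is wrong with the drive itself.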

Maybe Linux does the equivalent of a bus rescan to work around this. I
had the problem using a dumb RAID 1 controller; once a drive dropped out
the mirror was broken, so it was a big deal.
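
Why a retry (or rescan) might hide the problem can be sketched the same
way. This is a guess at the mechanism, not a description of what Linux or
FreeBSD actually does. The function name and the figures are hypothetical,
and the model assumes the drive keeps waking up while the controller waits
out each timeout.

```python
def issue_with_retries(wake_s, service_s, timeout_s, max_retries):
    """Return the attempt number that succeeds, or None if the drive is dropped."""
    remaining_wake = wake_s  # time the drive still needs to leave idle
    for attempt in range(1, max_retries + 2):
        if remaining_wake + service_s <= timeout_s:
            return attempt
        # Assumption: the drive continues waking during the timed-out wait.
        remaining_wake = max(0.0, remaining_wake - timeout_s)
    return None


# Illustrative figures only: 0.9 s wake, 0.01 s service, 0.5 s timeout.
print(issue_with_retries(0.9, 0.01, 0.5, max_retries=0))  # no retry
print(issue_with_retries(0.9, 0.01, 0.5, max_retries=1))  # one retry
```

Under those assumptions a single retry is enough: the first attempt times
out, but by the second the drive is nearly ready. With no retry at all the
drive is declared gone, which is what broke my mirror.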

> Joe

--
Bob Bishop
rb@gid.co.uk