From owner-freebsd-stable@FreeBSD.ORG  Wed May 29 15:16:21 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 312EDA6A
 for <freebsd-stable@FreeBSD.org>; Wed, 29 May 2013 15:16:21 +0000 (UTC)
 (envelope-from ian@FreeBSD.org)
Received: from mho-01-ewr.mailhop.org (mho-03-ewr.mailhop.org [204.13.248.66])
 by mx1.freebsd.org (Postfix) with ESMTP id 0C89ADAB
 for <freebsd-stable@FreeBSD.org>; Wed, 29 May 2013 15:16:20 +0000 (UTC)
Received: from c-24-8-230-52.hsd1.co.comcast.net ([24.8.230.52]
 helo=damnhippie.dyndns.org)
 by mho-01-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1Uhi6u-000Gk7-65; Wed, 29 May 2013 15:16:20 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id r4TFGHDA007633;
 Wed, 29 May 2013 09:16:17 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 24.8.230.52
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX18NkzI6uh0bYWEts4wr9GuL
Subject: Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout
From: Ian Lepore <ian@FreeBSD.org>
To: Oliver Fromme <olli@lurza.secnetix.de>
In-Reply-To: <201305291421.r4TELY8p042536@grabthar.secnetix.de>
References: <201305291421.r4TELY8p042536@grabthar.secnetix.de>
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 29 May 2013 09:16:17 -0600
Message-ID: <1369840577.1258.45.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: killing@multiplay.co.uk, freebsd-stable@FreeBSD.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 May 2013 15:16:21 -0000

On Wed, 2013-05-29 at 16:21 +0200, Oliver Fromme wrote:
> Steven Hartland wrote:
>  > Have you checked your sata cables and psu outputs?
>  > 
>  > Both of these could be the underlying cause of poor signalling.
> 
> I can't easily check that because it is a cheap rented
> server in a remote location.
> 
> But I don't believe it is bad cabling or PSU anyway, or
> otherwise the problem would occur intermittently all the
> time if the load on the disks is sufficiently high.
> But it only occurs at tags=3 and above.  At tags=2 it does
> not occur at all, no matter how hard I hammer on the disks.
> 
> At the moment I'm inclined to believe that it is either
> a bug in the HDD firmware or in the controller.  The disks
> aren't exactly new, they're 400 GB Samsung ones that are
> several years old.  I think it's not uncommon to have bugs
> in the NCQ implementation in such disks.
> 
> The only thing that puzzles me is the fact that the problem
> also disappears completely when I reduce the SATA rev from
> II to I, even at tags=32.
> 

It seems to me that you dismiss signaling problems too quickly.
Consider the possibilities... A bad cable leads to intermittent errors
at higher speeds.  When NCQ is disabled or limited the software handles
these errors pretty much transparently.  When NCQ is not limited and
there are many outstanding requests, suddenly the error handling in the
software breaks down somehow and a minor recoverable problem becomes an
in-your-face error.

I'm not saying any of the foregoing is true, just that you should
consider the possibility that you're dealing with multiple problems
which are only loosely coupled, but together can seem like a single more
serious problem.  You don't know enough yet to casually dismiss
anything.

-- Ian