Date:      Wed, 03 Apr 2013 12:26:06 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Matthias Andree <mandree@FreeBSD.org>
Cc:        freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Any objections/comments on axing out old ATA stack?
Message-ID:  <515BF5AE.4050804@FreeBSD.org>
In-Reply-To: <515B25D8.7050902@FreeBSD.org>
References:  <51536306.5030907@FreeBSD.org> <20130331130409.GO3178@equilibrium.bsdes.net> <C699FE76-B456-49C7-8D3A-DD54F98DAFC1@samsco.org> <515B25D8.7050902@FreeBSD.org>

On 02.04.2013 21:39, Matthias Andree wrote:
> On 31.03.2013 23:02, Scott Long wrote:
>
>> So what I hear you and Matthias saying, I believe, is that it should be easier to
>> force disks to fall back to non-NCQ mode, and/or have a more responsive
>> black-list for problematic controllers.  Would this help the situation?  It's hard to
>> justify holding back overall forward progress because of some bad controllers;
>> we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x,
>> enough to make up a sizable percentage of the internet's traffic, and we see no
>> problems.  How can we move forward but also take care of you guys with
>> problematic hardware?
>
> Well, I am running the driver fine off of my WD Caviar RE3 disk, and the
> problematic drive also works just fine with Windows and Linux, so it
> must be something between the problematic drive and the FreeBSD driver.
>
> I would like to see any of this, in decreasing order of precedence:
>
> - debugged driver
>
> - assistance/instructions on helping how to debug the driver/trace NCQ
> stuff/...  (as in Jeremy Chadwick's followup in this same thread - this
> helps, I will attempt to procure the required information; "back then",
> reducing the number of tags to 31 was ineffective, including an error
> message and getting a value of 32 when reading the setting back)

Unfortunately, I don't know how to debug that. The command timeouts
reported on the lists before are the kind of error that is most
difficult to diagnose, since the controller gives no information to
work with. We just see that commands we have sent are no longer
completing. Maybe it is some incompatibility between specific drive and
HBA firmware, triggered by some innocent specifics of our ATA stack,
GEOM or filesystem implementation. All I can propose is to try to
identify such cases and add quirks to work around them, like disabling
NCQ or limiting the number of tags. I am not sure what else we can do
about it without a controlled lab environment with the affected
hardware and a SATA analyzer.

> - "user-space" contingency features, such as letting camcontrol limit
> the number of open NCQ tags, or disable NCQ, either on a per-drive basis

I merged support for that to 8/9-STABLE about 9 months ago:
`camcontrol tags ada0 -v -N X` should change the number of
simultaneously used tags,
`camcontrol negotiate ada0 -T (en|dis)able` should enable/disable use of
NCQ.
I just did some tests on HEAD and these commands seem to work. If
you can reproduce the problem, it would be nice to collect information
on how these changes affect it.
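
As a rough sketch of such a test run: only the two camcontrol
invocations are taken from the text above. The wrapper functions, the
DEV and DRY_RUN variables and the tag count of 16 are illustrative
assumptions. With DRY_RUN=1 (the default here) the commands are only
printed, so the script can be reviewed before it touches a real disk.

```sh
#!/bin/sh
# Sketch only: wrapper names, DEV, DRY_RUN and the value 16 are assumptions.
DEV="${DEV:-ada0}"          # disk to experiment on
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands, anything else = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the current tag settings for the device.
ncq_show_tags() { run camcontrol tags "$DEV" -v; }

# Limit the number of simultaneously used tags to $1.
ncq_set_tags() { run camcontrol tags "$DEV" -v -N "$1"; }

# Turn NCQ on or off; $1 is "enable" or "disable".
ncq_switch() { run camcontrol negotiate "$DEV" -T "$1"; }

ncq_show_tags         # record the starting state
ncq_set_tags 16       # first try: fewer tags, NCQ still on
ncq_switch disable    # second try: NCQ off entirely
```

Run it once with DRY_RUN=0 per step, repeat the workload that triggers
the timeouts after each change, and note which setting (if any) makes
them go away.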

> I am capable of debugging C - mostly with gdb command-line, and
> graphical Windows IDEs - but am unfamiliar with FreeBSD kernel
> debugging. If necessary, I can pull up a second console, but the PC that
> is affected is legacy-free, so serial port only works through a
> serial/USB converter.


-- 
Alexander Motin


