Date: Wed, 03 Apr 2013 12:26:06 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Matthias Andree <mandree@FreeBSD.org>
Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Any objections/comments on axing out old ATA stack?
Message-ID: <515BF5AE.4050804@FreeBSD.org>
In-Reply-To: <515B25D8.7050902@FreeBSD.org>
References: <51536306.5030907@FreeBSD.org>
    <20130331130409.GO3178@equilibrium.bsdes.net>
    <C699FE76-B456-49C7-8D3A-DD54F98DAFC1@samsco.org>
    <515B25D8.7050902@FreeBSD.org>
On 02.04.2013 21:39, Matthias Andree wrote:
> On 31.03.2013 23:02, Scott Long wrote:
>
>> So what I hear you and Matthias saying, I believe, is that it should be easier to
>> force disks to fall back to non-NCQ mode, and/or have a more responsive
>> black-list for problematic controllers. Would this help the situation? It's hard to
>> justify holding back overall forward progress because of some bad controllers;
>> we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x,
>> enough to make up a sizable percentage of the internet's traffic, and we see no
>> problems. How can we move forward but also take care of you guys with
>> problematic hardware?
>
> Well, I am running the driver fine off of my WD Caviar RE3 disk, and the
> problematic drive also works just fine with Windows and Linux, so it
> must be something between the problematic drive and the FreeBSD driver.
>
> I would like to see any of this, in decreasing order of precedence:
>
> - debugged driver
>
> - assistance/instructions on helping how to debug the driver/trace NCQ
> stuff/... (as in Jeremy Chadwick's followup in this same thread - this
> helps, I will attempt to procure the required information; "back then",
> reducing the number of tags to 31 was ineffective, including an error
> message and getting a value of 32 when reading the setting back)

Unfortunately, I don't know how to debug that. The command timeouts
reported on the lists before are the kind of error that is most
difficult to diagnose, because the controller gives no information to
work with. We just see that commands we sent are no longer completing.
Maybe it is some incompatibility between specific drive and HBA
firmware, triggered by some innocent specifics of our ATA stack, GEOM or
filesystem implementations. All I can propose is to try to identify such
cases and add quirks to work around them, such as disabling NCQ or
limiting the number of tags. I am not sure what else we can do about it
without a controlled lab environment with the affected hardware and a
SATA analyzer.

> - "user-space" contingency features, such as letting camcontrol limit
> the number of open NCQ tags, or disable NCQ, either on a per-drive basis

I merged support for that into 8/9-STABLE about 9 months ago:
`camcontrol tags ada0 -v -N X` should change the number of
simultaneously used tags, and `camcontrol negotiate ada0 -T (en|dis)able`
should enable/disable use of NCQ. I just ran some tests on HEAD and these
commands seem to work. If you can reproduce the problem, it would be
nice to collect information on how these changes affect it.

> I am capable of debugging C - mostly with gdb command-line, and
> graphical Windows IDEs - but am unfamiliar with FreeBSD kernel
> debugging. If necessary, I can pull up a second console, but the PC that
> is affected is legacy-free, so serial port only works through a
> serial/USB converter.

-- 
Alexander Motin
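
The two camcontrol invocations quoted in the reply above could be wrapped in a small script to make it easier to step through settings while trying to reproduce a timeout. This is only a sketch: the device name `ada0`, the tag counts 16, 8 and 1, and the `DEV`/`RUN` variables and helper function names are illustrative assumptions, not something from the message. By default it only prints the commands; running them for real requires root on a FreeBSD system.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) the camcontrol commands
# quoted in the message. DEV, RUN and the tag counts are assumptions.
DEV="${DEV:-ada0}"
RUN="${RUN:-0}"

run() {
    # Dry run by default: echo the command line instead of executing it.
    if [ "$RUN" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

set_tags() {
    # camcontrol tags <dev> -v -N <count>, as given in the message
    run camcontrol tags "$DEV" -v -N "$1"
}

set_ncq() {
    # camcontrol negotiate <dev> -T enable|disable, as given in the message
    run camcontrol negotiate "$DEV" -T "$1"
}

# Example sequence: step the tag count down, then turn NCQ off entirely.
for n in 16 8 1; do
    set_tags "$n"
done
set_ncq disable
```

Noting after each step whether the timeouts still occur would give the kind of per-setting information the reply asks for.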
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?515BF5AE.4050804>