From: Alexander Motin
Message-ID: <515BF5AE.4050804@FreeBSD.org>
Date: Wed, 03 Apr 2013 12:26:06 +0300
To: Matthias Andree
Subject: Re: Any objections/comments on axing out old ATA stack?
References: <51536306.5030907@FreeBSD.org> <20130331130409.GO3178@equilibrium.bsdes.net> <515B25D8.7050902@FreeBSD.org>
In-Reply-To: <515B25D8.7050902@FreeBSD.org>
Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org

On 02.04.2013 21:39, Matthias Andree wrote:
> On 31.03.2013 23:02, Scott Long wrote:
>
>> So what I hear you and Matthias saying, I believe, is that it should be easier to
>> force disks to fall back to non-NCQ mode, and/or have a more responsive
>> black-list for problematic controllers. Would this help the situation? It's hard to
>> justify holding back overall forward progress because of some bad controllers;
>> we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x,
>> enough to make up a sizable percentage of the internet's traffic, and we see no
>> problems. How can we move forward but also take care of you guys with
>> problematic hardware?
>
> Well, I am running the driver fine off of my WD Caviar RE3 disk, and the
> problematic drive also works just fine with Windows and Linux, so it
> must be something between the problematic drive and the FreeBSD driver.
>
> I would like to see any of this, in decreasing order of precedence:
>
> - debugged driver
>
> - assistance/instructions on helping how to debug the driver/trace NCQ
> stuff/... (as in Jeremy Chadwick's followup in this same thread - this
> helps, I will attempt to procure the required information; "back then",
> reducing the number of tags to 31 was ineffective, including an error
> message and getting a value of 32 when reading the setting back)

Unfortunately, I don't know how to debug that.
Command timeouts reported on the lists before are the kind of errors
that are most difficult to diagnose, since the controller gives no
information to work with. We just see that sent commands are no longer
completing. It may be some incompatibility between specific drive and
HBA firmware, triggered by some innocent specifics of our ATA stack,
GEOM, or filesystem implementation. All I can propose is to try to
identify such cases and add quirks to work around them, like disabling
NCQ or limiting the number of tags. I am not sure what else we can do
about it without a controlled lab environment with the affected
hardware and a SATA analyzer.

> - "user-space" contingency features, such as letting camcontrol limit
> the number of open NCQ tags, or disable NCQ, either on a per-drive basis

I merged support for that into 8/9-STABLE about 9 months ago:
`camcontrol tags ada0 -v -N X` should change the number of
simultaneously used tags, and `camcontrol negotiate ada0 -T (en|dis)able`
should enable or disable use of NCQ. I just did some tests on HEAD and
these commands seem to work. If you can reproduce the problem, it would
be nice to collect information on how these changes affect it.

> I am capable of debugging C - mostly with gdb command-line, and
> graphical Windows IDEs - but am unfamiliar with FreeBSD kernel
> debugging. If necessary, I can pull up a second console, but the PC that
> is affected is legacy-free, so serial port only works through a
> serial/USB converter.

-- 
Alexander Motin
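
For anyone trying to reproduce the problem, the two camcontrol commands
mentioned above could be wrapped in a small script along these lines.
This is only a sketch: the helper names (tags_cmd, ncq_cmd, run), the
DEV and RUN variables, and the tag count of 16 are assumptions made for
illustration and are not part of camcontrol. By default it only prints
the commands; setting RUN=1 (as root, on FreeBSD) would execute them.

```shell
#!/bin/sh
# Hypothetical helper around the two camcontrol commands quoted above.
# The device name and the tag count are placeholders, not recommendations.

# Build the command that sets the number of simultaneously used tags.
# $1 = device (e.g. ada0), $2 = tag count
tags_cmd() {
    echo "camcontrol tags $1 -v -N $2"
}

# Build the command that enables or disables NCQ.
# $1 = device, $2 = enable|disable
ncq_cmd() {
    case "$2" in
        enable|disable) echo "camcontrol negotiate $1 -T $2" ;;
        *) echo "usage: ncq_cmd <device> enable|disable" >&2; return 1 ;;
    esac
}

# Print the command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

dev="${DEV:-ada0}"
run $(tags_cmd "$dev" 16)       # limit the number of tags in use
run $(ncq_cmd "$dev" disable)   # turn NCQ off for this drive
```

Running it once with a reduced tag count and once with NCQ disabled,
and noting whether the timeouts still appear in each case, would give
the kind of information asked for above.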