From: Alexander Motin
Date: Wed, 01 Feb 2012 21:34:50 +0200
To: Andrew Boyer
Cc: FreeBSD Stable Mailing List
Subject: Re: Kernel panics under 8.2 due to ATA timeouts
Message-ID: <4F2993DA.2010108@FreeBSD.org>
References: <76687387-92D3-4EA5-AD39-3F6820B27DCD@averesystems.com>
In-Reply-To: <76687387-92D3-4EA5-AD39-3F6820B27DCD@averesystems.com>

Hi.

On 01/30/12 22:46, Andrew Boyer wrote:
> I have a system that appears to have a flaky SATA controller (one of the Intel ESB2 variants) and it seems to be exposing a weakness in the ATA driver (not using ATA_CAM). If a command with ATA_R_DIRECT set times out, the channel gets reinitialized, but from the soft interrupt context. It panics when it tries to sleep in ata_queue_request().
>
> Timeouts work if ATA_R_DIRECT isn't set because in that case it uses a taskqueue to complete the request.
>
> Here is the backtrace:
>> #0 kdb_enter (why=0xffffffff80962cfa "panic", msg=0xa
>> ) at ../../../kern/subr_kdb.c:349
>> #1 0xffffffff805d6d0b in panic (fmt=Variable "fmt" is not available.
>> ) at ../../../kern/kern_shutdown.c:689
>> #2 0xffffffff8061bc53 in sleepq_add (wchan=0xffffff00052c3e58, lock=0xffffff00052c3e38, wmesg=0xffffffff808fa213 "ATA request done",
>>     flags=1, queue=0) at ../../../kern/subr_sleepqueue.c:320
>> #3 0xffffffff80590c95 in _cv_timedwait (cvp=0xffffff00052c3e58, lock=0xffffff00052c3e38, timo=40000) at ../../../kern/kern_condvar.c:313
>> #4 0xffffffff805d61af in _sema_timedwait (sema=0xffffff00052c3e38, timo=40000, file=0xffffffff808fa1f6 "../../../dev/ata/ata-queue.c",
>>     line=118) at ../../../kern/kern_sema.c:123
>> #5 0xffffffff8028559f in ata_queue_request (request=0xffffff00052c3dc0) at ../../../dev/ata/ata-queue.c:117
>> #6 0xffffffff80286628 in ata_controlcmd (dev=0xffffff0002e83d00, command=239 '?', feature=Variable "feature" is not available.
>> ) at ../../../dev/ata/ata-queue.c:153
>> #7 0xffffffff8027ffd3 in ata_setmode (dev=0xffffff0002e83d00) at ../../../dev/ata/ata-all.c:637
>> #8 0xffffffff802a0af9 in ad_init (dev=0xffffff0002e83d00) at ../../../dev/ata/ata-disk.c:405
>> #9 0xffffffff802a0c29 in ad_reinit (dev=0xffffff0002e83d00) at ../../../dev/ata/ata-disk.c:221
>> #10 0xffffffff80280cad in ata_reinit (dev=0xffffff0002902800) at ata_if.h:79
>> #11 0xffffffff802856c4 in ata_completed (context=Variable "context" is not available.
>> ) at ../../../dev/ata/ata-queue.c:313
>> #12 0xffffffff80285ffb in ata_finish (request=0xffffff00054ec8c0) at ../../../dev/ata/ata-queue.c:265
>> #13 0xffffffff805ed419 in softclock (arg=Variable "arg" is not available.
>> ) at ../../../kern/kern_timeout.c:430
>
> This is very repeatable. I'm not sure what's the best fix - always use a taskqueue on timeouts? Don't reinit if direct commands fail?

This is one of the messiest parts of the old ata(4). The problem is that reinit is implemented to work synchronously.
That means that if some command times out and starts a reinit, the reinit runs from the taskqueue and blocks it. As a result, we can't use the taskqueue for completion there, and we can't do another reinit when one of the reinit commands itself times out. That is handled using the ATA_STALL_QUEUE flag. I remember that I intentionally blocked new device detection during reinit to avoid problems with the taskqueue there.

As for ATA_R_DIRECT, sorry, I don't remember why it is used there or why it is needed at all. It was done before my time. The only place where I see it set, apart from ataraid, is ata_getparam(), which should be called only on the initial bus probe.

-- 
Alexander Motin