From: "C. P. Ghost"
To: Alexey Shuvaev
Cc: freebsd-current@freebsd.org
Date: Mon, 24 Oct 2011 20:27:49 +0200
Subject: Re: Panics after AHCI timeouts
In-Reply-To: <20111018131353.GA83797@lexx.ifp.tuwien.ac.at>
References: <20111008201456.GA3529@lexx.ifp.tuwien.ac.at>
 <20111017190027.GA9873@lexx.ifp.tuwien.ac.at>
 <20111018131353.GA83797@lexx.ifp.tuwien.ac.at>
List-Id: Discussions about the use of FreeBSD-current

On Tue, Oct 18, 2011 at 3:13 PM, Alexey Shuvaev wrote:
> On Tue, Oct 18, 2011 at 06:19:19AM +0800, Adrian Chadd wrote:
>> On 18 October 2011 03:00, Alexey Shuvaev wrote:
>> > On Sat, Oct 08, 2011 at 10:14:56PM +0200, Alexey Shuvaev wrote:
>> >> Hello list!
>> >>
>> > Errr... Replying to myself... Ping? Should I file a PR and put it
>> > in the back burner? :)
>>
>> I think filing a PR is a good move. Then just be proactive and poke
>> people about it.
>> It'd be good to get this fixed. :)
>>
> Done, kern/161768.
>
> Question to the list: does anybody see successful recovery from AHCI
> timeout on a recent CURRENT? Recent means June 2011 or newer, so 9.0
> branch counts also. That is, there are some kernel messages like this:
>
> ahcich0: Timeout on slot 29 port 0
> ahcich0: is 00000000 cs 00000000 ss ffffffff rs ffffffff tfd 40 serr 00000000 cmd 0000fc17
>
> but then AHCI recovers and the system does not panic?

I'm seeing these timeouts too, on an 8.2-STABLE amd64 system at r222832
from June 7. The system hangs partially -- or, more precisely, all
processes that attempt to access the disk on this channel hang, while
everything else continues as normal.

I suspect a faulty cable, but I don't have physical access to the
system to replace parts right now. A panic would be a regression, so
I'm holding off updates on that server until AHCI becomes more
tolerant and somewhat self-healing. :(

> Poking Alexey.

-cpghost.

--
Cordula's Web. http://www.cordula.ws/