From: Jeremy Chadwick
To: freebsd-stable@freebsd.org
Date: Tue, 2 Mar 2010 23:52:54 -0800
Subject: Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID: <20100303075254.GA47119@icarus.home.lan>
In-Reply-To: <4B8E1489.2070306@omnilan.de>
References: <1266934981.00222684.1266922202@10.7.7.3> <4B83EFD4.8050403@FreeBSD.org> <4B8E1489.2070306@omnilan.de>

On Wed, Mar 03, 2010 at 08:49:29AM +0100, Harald Schmalzbauer wrote:
> Alexander Motin wrote on 23.02.2010 16:10 (localtime):
> >Harald Schmalzbauer wrote:
> >>I'm frequently getting my machine locked with ahcichX timeouts:
> >>ahcich2: Timeout on slot 0
> >>ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr
> >>00000000
> >>ahcich2: Timeout on slot 8
> >>ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr
> >>00000000
> >>ahcich2: Timeout on slot 8
> >>ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr
> >>00000000
> >>...
> >
> >Looking that is (Interrupt status) is zero and `rs == cs | ss` (running
> >command bitmasks in driver and hardware), controller doesn't report
> >command completion. Looking on TFD status 0xc0 with BUSY bit set, I
> >would suppose that either disk stuck in command processing for some
> >reason, or controller missed command completion status.
> >
> >Have you noticed 30 second (default ATA timeout) pause before timeout
> >message printed? Just want to be sure that driver waited enough before
> >give up.
> >
> >>This happens when backup over GbE overloads ZFS/HDD capabilities.
> >>I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
> >>up almost immediately, but from it still happens.
> >>When I don't use ahci but ataahci (the old driver if I understand things
> >>correct) I also see the ZFS burst write congestion, but this doesn't
> >>lead to controller timeouts, thus blocking the machine.
> >>
> >>Sometimes the machine recovers from the disk lock, but most often I have
> >>to reboot.
> >
> >How it looks when it doesn't? Can you send me full log messages?
>
> Hello, this morning I had a stall, but the machine recovered after
> about one Minute. Here's what I got from the kernel:
> ahcich2: Timeout on slot 29
> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr
> 00000000
> em1: watchdog timeout -- resetting
> em1: watchdog timeout -- resetting

Please provide the following output:

pciconf -lv
vmstat -i

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
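
For anyone reading this thread in the archive: the check Alexander describes in the quoted text (is equal to zero, rs == cs | ss, BUSY set in the TFD status byte) can be sketched in Python as below. This is only an illustration, not part of the driver. The function name and the result keys are made up for this example, the field layout is taken from the log lines quoted in this thread, and the status line is assumed to be unwrapped onto a single line. The only outside fact used is that BSY is bit 7 (0x80) of the ATA status byte.

```python
import re

# Field layout as it appears in the ahcich timeout lines quoted in this thread.
STATUS_RE = re.compile(
    r"is (?P<is>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) ss (?P<ss>[0-9a-f]+) "
    r"rs (?P<rs>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) serr\s+(?P<serr>[0-9a-f]+)"
)

ATA_STATUS_BSY = 0x80  # BSY bit of the ATA status byte


def decode_timeout_status(line):
    """Decode one unwrapped ahcich status line (hypothetical helper)."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("no ahcich status fields found")
    f = {name: int(value, 16) for name, value in match.groupdict().items()}
    return {
        # Non-zero 'is' would mean the controller raised an interrupt status.
        "interrupt_pending": f["is"] != 0,
        # Slots the driver thinks are running vs. slots active in hardware.
        "driver_matches_hardware": f["rs"] == (f["cs"] | f["ss"]),
        # 0xc0 has bit 7 set, so the device reports BUSY.
        "device_busy": bool(f["tfd"] & ATA_STATUS_BSY),
        "sata_error": f["serr"] != 0,
    }


# The status line from the latest stall, joined onto one line.
sample = ("ahcich2: is 00000000 cs 00000003 ss e0000003 "
          "rs e0000003 tfd c0 serr 00000000")
print(decode_timeout_status(sample))
```

For the line from the latest stall this reports no pending interrupt, driver and hardware slot masks in agreement, the device BUSY, and no SATA error, which matches the pattern Alexander described for the earlier timeouts.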