From: Harald Schmalzbauer
Organization: OmniLAN
Date: Wed, 03 Mar 2010 09:28:25 +0100
To: Alexander Motin
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID: <4B8E1DA9.2090406@omnilan.de>
In-Reply-To: <4B8E1B3D.306@FreeBSD.org>

Alexander Motin wrote on 03.03.2010 09:18 (localtime):
> Harald Schmalzbauer wrote:
>> Alexander Motin wrote on 23.02.2010 16:10 (localtime):
>>> Harald Schmalzbauer wrote:
>>>> I'm frequently getting my machine locked with ahcichX timeouts:
>>>> ahcich2: Timeout on slot 0
>>>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr 00000000
>>>> ahcich2: Timeout on slot 8
>>>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000
>>>> ahcich2: Timeout on slot 8
>>>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr 00000000
>>>> ...
>>> Since `is` (interrupt status) is zero and `rs == cs | ss` (the running
>>> command bitmasks in the driver and the hardware), the controller doesn't
>>> report command completion. Given the TFD status 0xc0 with the BUSY bit
>>> set, I would suppose that either the disk got stuck processing a command
>>> for some reason, or the controller missed the command completion status.
>>>
>>> Have you noticed a 30 second (default ATA timeout) pause before the
>>> timeout message was printed? I just want to be sure that the driver
>>> waited long enough before giving up.
>>>
>>>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>>>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>>>> up almost immediately, but it still happens.
>>>> When I don't use ahci but ataahci (the old driver, if I understand things
>>>> correctly), I also see the ZFS burst write congestion, but this doesn't
>>>> lead to controller timeouts and thus doesn't block the machine.
>>>>
>>>> Sometimes the machine recovers from the disk lock, but most often I have
>>>> to reboot.
>>> How does it look when it doesn't? Can you send me the full log messages?
>> Hello, this morning I had a stall, but the machine recovered after about
>> one minute.
>> Here's what I got from the kernel:
>> ahcich2: Timeout on slot 29
>> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr 00000000
>> em1: watchdog timeout -- resetting
>> em1: watchdog timeout -- resetting
>> ahcich2: Timeout on slot 10
>> ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 18
>> ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 2
>> ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 2
>> ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr 00000000
>>
>> Does this tell you something useful?
>
> It doesn't. Looking at the logged register contents, the commands are
> indeed still running and no interrupts were requested. It's interesting
> to see the em1 watchdog timeout there. Aren't they related somehow?

dmesg | grep "irq 18":
uhci0: port 0x20c0-0x20df irq 18 at device 26.0 on pci0
uhci4: port 0x2040-0x205f irq 18 at device 29.2 on pci0
em1: port 0x1000-0x103f mem 0xe1920000-0xe193ffff,0xe1900000-0xe191ffff irq 18 at device 2.0 on pci3
ichsmb0: port 0x2000-0x201f mem 0xe1a22000-0xe1a220ff irq 18 at device 31.3 on pci0

They don't share the same IRQ, at least.

dmesg | grep "irq 21":
uhci1: port 0x20a0-0x20bf irq 21 at device 26.1 on pci0
ahci0: port 0x2408-0x240f,0x2414-0x2417,0x2400-0x2407,0x2410-0x2413,0x2020-0x203f mem 0xe1a21000-0xe1a217ff irq 21 at device 31.2 on pci0

The em1 has no cable attached. I get many of these em watchdog timeouts
and never thought they could be related to ahci. I'll see whether the em
watchdog timeouts happen in any relation to disk usage.

Thank you!
-Harry
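Alexander's reading of the register dumps in this thread rests on three checks: `is` is zero (no interrupt pending), `rs == cs | ss` (the driver's running-slot mask matches what the hardware still has in flight), and the BSY bit (0x80) is set in the TFD status byte. The Python sketch below applies those checks to a single log line. It assumes `cs` and `ss` are the port's PxCI (commands issued) and PxSACT (NCQ commands active) registers and that the low byte of `tfd` is the ATA status; `decode_ahci_status` is a hypothetical helper, not a FreeBSD tool.

```python
import re

# Layout of the ahcichN register dump line; assumed from the log lines in this thread.
LINE_RE = re.compile(
    r"(?P<chan>ahcich\d+): is (?P<istat>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) "
    r"ss (?P<ss>[0-9a-f]+) rs (?P<rs>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) "
    r"serr (?P<serr>[0-9a-f]+)"
)

ATA_S_BUSY = 0x80  # BSY bit in the ATA status byte


def slots(mask):
    """Return the command slot numbers whose bits are set in a 32-bit mask."""
    return [i for i in range(32) if (mask >> i) & 1]


def decode_ahci_status(line):
    """Hypothetical helper: decode one ahcich register dump line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an ahcich register dump: " + line)
    regs = {k: int(v, 16) for k, v in m.groupdict().items() if k != "chan"}
    status = regs["tfd"] & 0xFF  # assumed: low byte of TFD is the ATA status
    return {
        "channel": m.group("chan"),
        # No interrupt pending on the port.
        "no_interrupt_pending": regs["istat"] == 0,
        # Driver's running slots equal issued (cs) plus NCQ-active (ss) slots.
        "driver_matches_hw": regs["rs"] == (regs["cs"] | regs["ss"]),
        # Device still reports BSY.
        "busy": bool(status & ATA_S_BUSY),
        "running_slots": slots(regs["rs"]),
    }


if __name__ == "__main__":
    sample = ("ahcich2: is 00000000 cs 00000003 ss e0000003 "
              "rs e0000003 tfd c0 serr 00000000")
    print(decode_ahci_status(sample))
```

Applied to the dumps quoted in this message, every line has `is` zero and `rs == cs | ss`; all but the last show BSY set (`tfd c0`), while the last (`tfd 40`) has only DRDY set.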
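The `dmesg | grep "irq N"` checks in this message look at one interrupt line at a time. A minimal Python sketch, assuming FreeBSD attach lines of the form `name: ... irq N at device ...` as quoted above, groups every attached device by IRQ in one pass so shared lines stand out. `group_by_irq` and the script name `irq_groups.py` are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches attach lines such as
# "em1: port 0x1000-0x103f ... irq 18 at device 2.0 on pci3" (assumed format).
ATTACH_RE = re.compile(r"^(?P<dev>[a-z]+\d+): .*\birq (?P<irq>\d+)\b")


def group_by_irq(lines):
    """Hypothetical helper: map each IRQ number to the devices attached to it."""
    groups = defaultdict(list)
    for line in lines:
        m = ATTACH_RE.match(line.strip())
        if m:
            dev = m.group("dev")
            irq = int(m.group("irq"))
            if dev not in groups[irq]:  # ignore repeated attach lines
                groups[irq].append(dev)
    return dict(groups)


if __name__ == "__main__":
    # Example: python3 irq_groups.py /var/run/dmesg.boot
    for path in sys.argv[1:]:
        with open(path) as f:
            for irq, devs in sorted(group_by_irq(f).items()):
                shared = "  (shared)" if len(devs) > 1 else ""
                print(f"irq {irq}: {', '.join(devs)}{shared}")
```

FreeBSD normally keeps the boot messages in /var/run/dmesg.boot, so that file is a natural input. With the lines quoted above, em1 lands on irq 18 and ahci0 on irq 21, matching the conclusion that the two do not share an interrupt line.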