From: Harald Schmalzbauer
Organization: OmniLAN
Date: Tue, 23 Feb 2010 17:08:02 +0100
To: Alexander Motin
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID: <4B83FD62.2020407@omnilan.de>
In-Reply-To: <4B83EFD4.8050403@FreeBSD.org>

Alexander Motin wrote on 23.02.2010 16:10 (localtime):
> Harald Schmalzbauer wrote:
>> I'm frequently getting my machine locked up with ahcichX timeouts:
>> ahcich2: Timeout on slot 0
>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 8
>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 8
>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr 00000000
>> ...
>
> Seeing that IS (interrupt status) is zero and `rs == cs | ss` (running
> command bitmasks in driver and hardware), the controller doesn't report
> command completion. Looking at the TFD status 0xc0 with the BUSY bit set,
> I would suppose that either the disk got stuck processing a command for
> some reason, or the controller missed the command completion status.
>
> Have you noticed a 30 second (default ATA timeout) pause before the
> timeout message was printed? I just want to be sure that the driver
> waited long enough before giving up.

Yes, there is some pause between the occurrence of the hang and the first
timeout message, but I can't tell you exactly whether it's 30 seconds. I'd
guess rather more than 30 seconds.

>> This happens when a backup over GbE overloads ZFS/HDD capabilities.
>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>> up almost immediately, but it still happens.
>> When I don't use ahci but ataahci (the old driver, if I understand things
>> correctly), I also see the ZFS burst write congestion, but it doesn't
>> lead to controller timeouts and thus doesn't block the machine.
>>
>> Sometimes the machine recovers from the disk lock, but most often I have
>> to reboot.
>
> How does it look when it doesn't? Can you send me the full log messages?

Unfortunately not. That happened only once (that I noticed), 3 days ago,
and the messages log has been rotated 5 times since then...

But I have some messages from 02/15, with a kernel from January. Usually
the messages continue to pop up until I reset the machine.
This time there were only the three above, even after waiting half an hour
(I had to go on site). The old messages:

ahcich2: Timeout on slot 20
ahcich2: is 00000000 cs ff07ffff ss fff7ffff rs fff7ffff tfd c0 serr 00000000
ahcich4: Timeout on slot 24
ahcich4: is 00000000 cs f07fffff ss ff7fffff rs ff7fffff tfd c0 serr 00000000
ahcich2: Timeout on slot 17
ahcich2: is 00000000 cs fff9ffff ss ffffffff rs ffffffff tfd c0 serr 00000000
ahcich4: Timeout on slot 20
ahcich4: is 00000000 cs 00300000 ss 00000000 rs 00300000 tfd c0 serr 00000000
ahcich2: Timeout on slot 15
ahcich2: is 00000000 cs fff87fff ss ffffffff rs ffffffff tfd c0 serr 00000000
ahcich4: Timeout on slot 22
ahcich4: is 00000000 cs fc0fffff ss ffcfffff rs ffcfffff tfd c0 serr 00000000
ahcich2: Timeout on slot 13
ahcich2: is 00000000 cs ffff1fff ss ffffffff rs ffffffff tfd c0 serr 00000000
ahcich4: Timeout on slot 16
ahcich4: is 00000000 cs 00010000 ss 00000000 rs 00010000 tfd c0 serr 00000000
ahcich2: Timeout on slot 11
ahcich2: is 00000000 cs ffffc7ff ss ffffffff rs ffffffff tfd c0 serr 00000000
ahcich4: Timeout on slot 16
ahcich4: is 00000000 cs 00000000 ss 00010000 rs 00010000 tfd 40 serr 00000000

Maybe it's helpful to you. Since I hadn't seen the hang after upgrading,
despite extensive network transfer tests, I thought it had vanished and
didn't keep the logs...

>> The kernel is from Feb. 19, so the recent ahci improvements are active.
>> The controller is an ICH9R with 3 Samsung F3 SpinPoints.
>>
>> Any ideas how to work around the hangs other than using the old ahci
>> driver?
>
> The old ataahci driver didn't use NCQ. NCQ may trigger some bugs in drive
> firmware or expose some protocol inconsistencies. I would recommend you
> search for errata for your drive and possibly a firmware update.

Sounds reasonable. How can I disable NCQ with the new ahci driver?
I guess if it's an HDD firmware issue with NCQ, the hang shouldn't happen
when NCQ is disabled.
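The register reading in Alexander's reply (IS zero, `rs == cs | ss`, TFD status 0xc0 with BUSY set) can be checked mechanically against each status line in the logs above. What follows is a minimal Python sketch, not a FreeBSD tool: the names `parse_status` and `TimeoutStatus` are hypothetical, and it assumes that `is` is the port interrupt status, `cs` and `ss` are the hardware's outstanding-command bitmasks (assumed to be the AHCI PxCI and PxSACT registers), `rs` is the driver's own running-slot bitmask, and the low byte of `tfd` is the ATA status register with BUSY = 0x80.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# ATA status register BUSY bit (assumed: low byte of the AHCI PxTFD register).
# The 0xc0 seen in the logs is BSY (0x80) | DRDY (0x40).
ATA_STATUS_BUSY = 0x80

# Matches the status line printed after "Timeout on slot N", e.g.
# "ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr 00000000"
STATUS_RE = re.compile(
    r"(?P<channel>ahcich\d+): is (?P<is_>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) "
    r"ss (?P<ss>[0-9a-f]+) rs (?P<rs>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) "
    r"serr (?P<serr>[0-9a-f]+)"
)


@dataclass
class TimeoutStatus:  # hypothetical helper type
    channel: str
    is_: int   # port interrupt status
    cs: int    # hardware command bitmask (assumed PxCI)
    ss: int    # hardware NCQ bitmask (assumed PxSACT)
    rs: int    # running-slot bitmask kept by the driver
    tfd: int   # task file data; low byte is the ATA status register
    serr: int  # SATA error register

    @property
    def completion_unreported(self) -> bool:
        # No interrupt pending, and the driver's running slots are exactly
        # the slots the hardware still shows as outstanding.
        return self.is_ == 0 and self.rs == (self.cs | self.ss)

    @property
    def busy(self) -> bool:
        return bool(self.tfd & ATA_STATUS_BUSY)

    @property
    def running_slots(self) -> list[int]:
        # Command slot numbers (0-31) whose bit is set in rs.
        return [slot for slot in range(32) if (self.rs >> slot) & 1]


def parse_status(line: str) -> TimeoutStatus:
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError(f"not an ahcich status line: {line!r}")
    fields = {k: int(v, 16) for k, v in m.groupdict().items() if k != "channel"}
    return TimeoutStatus(channel=m.group("channel"), **fields)


if __name__ == "__main__":
    st = parse_status(
        "ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr 00000000"
    )
    print(st.channel, st.completion_unreported, st.busy, len(st.running_slots))
    # prints: ahcich2 True True 31
```

Applied to the old messages, every line has IS zero and satisfies `rs == cs | ss`; only the last ahcich4 entry (tfd 40) shows the BUSY bit clear.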
Btw, I found the command
camcontrol cmd ada0 -a "EF 85 00 00 00 00 00 00 00 00 00 00"
for disabling APM, and another one for disabling AAM, and applied them to
my drives. Is there a wiki where we can place such valuable commands?

Thanks,
-Harry