Date: Wed, 03 Mar 2010 08:49:29 +0100
From: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To: Alexander Motin <mav@FreeBSD.org>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID: <4B8E1489.2070306@omnilan.de>
In-Reply-To: <4B83EFD4.8050403@FreeBSD.org>
References: <1266934981.00222684.1266922202@10.7.7.3> <4B83EFD4.8050403@FreeBSD.org>

Alexander Motin wrote on 23.02.2010 16:10 (localtime):
> Harald Schmalzbauer wrote:
>> I'm frequently getting my machine locked with ahcichX timeouts:
>> ahcich2: Timeout on slot 0
>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 8
>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 8
>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr 00000000
>> ...
>
> Looking that is (Interrupt status) is zero and `rs == cs | ss` (running
> command bitmasks in driver and hardware), controller doesn't report
> command completion. Looking on TFD status 0xc0 with BUSY bit set, I
> would suppose that either disk stuck in command processing for some
> reason, or controller missed command completion status.
>
> Have you noticed 30 second (default ATA timeout) pause before timeout
> message printed? Just want to be sure that driver waited enough before
> give up.
>
>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>> up almost immediately, but it still happens.
>> When I don't use ahci but ataahci (the old driver if I understand things
>> correct) I also see the ZFS burst write congestion, but this doesn't
>> lead to controller timeouts, thus blocking the machine.
>>
>> Sometimes the machine recovers from the disk lock, but most often I have
>> to reboot.
>
> How it looks when it doesn't? Can you send me full log messages?

Hello,

this morning I had a stall, but the machine recovered after about one
minute.
Here's what I got from the kernel:

ahcich2: Timeout on slot 29
ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr 00000000
em1: watchdog timeout -- resetting
em1: watchdog timeout -- resetting
ahcich2: Timeout on slot 10
ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr 00000000
ahcich2: Timeout on slot 18
ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr 00000000
ahcich2: Timeout on slot 2
ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr 00000000
ahcich2: Timeout on slot 2
ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr 00000000

Does this tell you something useful?

Thanks,

-Harry
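
For anyone reading the register lines in the log above, the checks described in the quoted reply (is equal to zero, rs == cs | ss, and the BUSY bit in TFD) can be applied mechanically. The following is only a rough sketch: the function name decode_ahci_timeout and the returned keys are made up for illustration and are not part of the ahci driver, and it assumes each register dump sits on a single line in the format shown. The BUSY mask 0x80 is the standard ATA status BSY bit.

```python
import re

ATA_STATUS_BUSY = 0x80  # BSY bit in the ATA status byte (low byte of TFD)


def decode_ahci_timeout(line):
    """Apply the checks from the quoted reply to one 'is .. cs .. ss ..' line.

    Hypothetical helper for illustration; not part of any FreeBSD tool.
    """
    # Collect "name hexvalue" pairs such as "cs 00000003".
    pairs = dict(re.findall(r"\b(is|cs|ss|rs|tfd) ([0-9a-fA-F]+)", line))
    missing = {"is", "cs", "ss", "rs", "tfd"} - pairs.keys()
    if missing:
        raise ValueError("missing fields: %s" % sorted(missing))
    regs = {name: int(value, 16) for name, value in pairs.items()}

    return {
        # is == 0: no interrupt status pending on the channel.
        "no_interrupt_pending": regs["is"] == 0,
        # rs == cs | ss: the driver's running mask matches what the
        # hardware still reports as outstanding.
        "masks_consistent": regs["rs"] == (regs["cs"] | regs["ss"]),
        # BUSY bit set in the task file status.
        "busy": bool(regs["tfd"] & ATA_STATUS_BUSY),
        # Slot numbers the driver still considers running.
        "running_slots": [i for i in range(32) if (regs["rs"] >> i) & 1],
    }


if __name__ == "__main__":
    sample = ("ahcich2: is 00000000 cs 00000003 ss e0000003 "
              "rs e0000003 tfd c0 serr 00000000")
    print(decode_ahci_timeout(sample))
```

Run against the lines above, this reports BUSY set for every entry with tfd c0 and clear for the last one (tfd 40).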