Date:      Wed, 03 Mar 2010 09:28:25 +0100
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID:  <4B8E1DA9.2090406@omnilan.de>
In-Reply-To: <4B8E1B3D.306@FreeBSD.org>
References:  <1266934981.00222684.1266922202@10.7.7.3> <4B83EFD4.8050403@FreeBSD.org> <4B8E1489.2070306@omnilan.de> <4B8E1B3D.306@FreeBSD.org>


Alexander Motin schrieb am 03.03.2010 09:18 (localtime):
> Harald Schmalzbauer wrote:
>> Alexander Motin schrieb am 23.02.2010 16:10 (localtime):
>>> Harald Schmalzbauer wrote:
>>>> I'm frequently getting my machine locked with ahcichX timeouts:
>>>> ahcich2: Timeout on slot 0
>>>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr
>>>> 00000000
>>>> ahcich2: Timeout on slot 8
>>>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr
>>>> 00000000
>>>> ahcich2: Timeout on slot 8
>>>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr
>>>> 00000000
>>>> ...
>>> Given that is (interrupt status) is zero and `rs == cs | ss` (the
>>> running-command bitmasks in the driver and the hardware), the controller
>>> has not reported command completion. Since the TFD status is 0xc0 with
>>> the BUSY bit set, I would suppose that either the disk is stuck processing
>>> a command for some reason, or the controller missed the command
>>> completion status.
>>>
>>> Have you noticed a 30-second pause (the default ATA timeout) before the
>>> timeout message is printed? I just want to be sure that the driver waited
>>> long enough before giving up.
>>>
>>>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>>>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>>>> up almost immediately, but it still happens.
>>>> When I don't use ahci but ataahci (the old driver, if I understand things
>>>> correctly) I also see the ZFS burst write congestion, but this doesn't
>>>> lead to controller timeouts, so the machine is not blocked.
>>>>
>>>> Sometimes the machine recovers from the disk lock, but most often I have
>>>> to reboot.
>>> How does it look when it doesn't? Can you send me full log messages?
>> Hello, this morning I had a stall, but the machine recovered after about
>> one minute. Here's what I got from the kernel:
>> ahcich2: Timeout on slot 29
>> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr
>> 00000000
>> em1: watchdog timeout -- resetting
>> em1: watchdog timeout -- resetting
>> ahcich2: Timeout on slot 10
>> ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr
>> 00000000
>> ahcich2: Timeout on slot 18
>> ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr
>> 00000000
>> ahcich2: Timeout on slot 2
>> ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr
>> 00000000
>> ahcich2: Timeout on slot 2
>> ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr
>> 00000000
>>
>> Does this tell you something useful?
>
> It doesn't. Looking at the logged register content, commands are indeed
> still running and no interrupts have been requested. It is interesting to
> see the em1 watchdog timeout there. Aren't they related somehow?
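
The register reading in the quoted analysis can be checked mechanically. The following is a minimal sketch, not the driver's own logic: it assumes, as the quoted reply suggests, that `cs` and `ss` are the hardware's running-command bitmasks, that `rs` is the driver's own mask, and that bit 0x80 of the `tfd` status byte is the ATA BUSY flag. The function name `decode_timeout_line` is made up for illustration, and the register line must be rejoined onto one line first.

```python
import re

# ATA status register: bit 7 is BSY (busy). The log prints the status byte
# as "tfd", e.g. "tfd c0".
ATA_STATUS_BSY = 0x80

# One "ahcichN: is ... cs ... ss ... rs ... tfd ... serr ..." register line.
REGISTER_RE = re.compile(
    r"\bis\s+(?P<intr>[0-9a-f]+)\s+cs\s+(?P<cs>[0-9a-f]+)\s+"
    r"ss\s+(?P<ss>[0-9a-f]+)\s+rs\s+(?P<rs>[0-9a-f]+)\s+"
    r"tfd\s+(?P<tfd>[0-9a-f]+)\s+serr\s+(?P<serr>[0-9a-f]+)"
)


def decode_timeout_line(line):
    """Parse one register line and apply the checks from the quoted reply."""
    match = REGISTER_RE.search(line)
    if match is None:
        raise ValueError("not an ahcich register line: %r" % line)
    regs = {name: int(value, 16) for name, value in match.groupdict().items()}
    return {
        # Non-zero interrupt status would mean the controller signalled something.
        "interrupt_pending": regs["intr"] != 0,
        # Driver's view of running slots agrees with the hardware masks.
        "driver_matches_hw": regs["rs"] == (regs["cs"] | regs["ss"]),
        # Device still reports BUSY in the task file status.
        "busy": bool(regs["tfd"] & ATA_STATUS_BSY),
    }


if __name__ == "__main__":
    sample = ("ahcich2: is 00000000 cs 00000003 ss e0000003 "
              "rs e0000003 tfd c0 serr 00000000")
    print(decode_timeout_line(sample))
```

For every register line in the logs above, this reports no pending interrupt and matching masks. Only the last line (`tfd 40`) shows the BUSY bit cleared.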

dmesg | grep "irq 18":
uhci0: <Intel 82801I (ICH9) USB controller> port 0x20c0-0x20df irq 18 at
device 26.0 on pci0
uhci4: <Intel 82801I (ICH9) USB controller> port 0x2040-0x205f irq 18 at
device 29.2 on pci0
em1: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0x1000-0x103f
mem 0xe1920000-0xe193ffff,0xe1900000-0xe191ffff irq 18 at device 2.0 on pci3
ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x2000-0x201f mem
0xe1a22000-0xe1a220ff irq 18 at device 31.3 on pci0

They don't share the same IRQ, at least.
dmesg | grep "irq 21"
uhci1: <Intel 82801I (ICH9) USB controller> port 0x20a0-0x20bf irq 21 at
device 26.1 on pci0
ahci0: <Intel ICH9 AHCI SATA controller> port
0x2408-0x240f,0x2414-0x2417,0x2400-0x2407,0x2410-0x2413,0x2020-0x203f
mem 0xe1a21000-0xe1a217ff irq 21 at device 31.2 on pci0
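
The same check can be done for all devices at once instead of one grep per IRQ. This is only a sketch under assumptions: it expects each device's attach message on a single unwrapped line, as the kernel prints it, and the helper names `devices_by_irq` and `share_irq` are made up for illustration. The sample text is shortened from the output above.

```python
import re
from collections import defaultdict

# Matches attach lines such as "em1: <...> ... irq 18 at device 2.0 on pci3".
ATTACH_RE = re.compile(r"^(?P<dev>[a-z]+\d+): .*\birq (?P<irq>\d+)\b")

# Shortened, unwrapped sample taken from the dmesg output in this message.
SAMPLE = """\
uhci0: <Intel 82801I (ICH9) USB controller> port 0x20c0-0x20df irq 18 at device 26.0 on pci0
em1: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0x1000-0x103f irq 18 at device 2.0 on pci3
uhci1: <Intel 82801I (ICH9) USB controller> port 0x20a0-0x20bf irq 21 at device 26.1 on pci0
ahci0: <Intel ICH9 AHCI SATA controller> mem 0xe1a21000-0xe1a217ff irq 21 at device 31.2 on pci0
"""


def devices_by_irq(dmesg_text):
    """Group device names by the IRQ number in their attach line."""
    groups = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = ATTACH_RE.match(line)
        if match:
            groups[int(match.group("irq"))].append(match.group("dev"))
    return dict(groups)


def share_irq(groups, first, second):
    """True if both devices appear under the same IRQ number."""
    return any(first in devs and second in devs for devs in groups.values())


if __name__ == "__main__":
    groups = devices_by_irq(SAMPLE)
    for irq, devs in sorted(groups.items()):
        print(irq, ", ".join(devs))
    print("em1 and ahci0 share an IRQ:", share_irq(groups, "em1", "ahci0"))
```

On a live system the input would be the full `dmesg` text rather than `SAMPLE`.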

The em1 has no cable attached. I get many of these em watchdog timeouts.
Never thought they could be related to ahci. I'll see if the em watchdog
timeouts happen in any relation to disk usage.

Thank you!

-Harry




