FreeBSD Mail Archives

Date:      Wed, 03 Mar 2010 09:28:25 +0100
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID:  <4B8E1DA9.2090406@omnilan.de>
In-Reply-To: <4B8E1B3D.306@FreeBSD.org>
References:  <1266934981.00222684.1266922202@10.7.7.3> <4B83EFD4.8050403@FreeBSD.org> <4B8E1489.2070306@omnilan.de> <4B8E1B3D.306@FreeBSD.org>


[-- Attachment #1 --]
Alexander Motin wrote on 03.03.2010 09:18 (localtime):
> Harald Schmalzbauer wrote:
>> Alexander Motin wrote on 23.02.2010 16:10 (localtime):
>>> Harald Schmalzbauer wrote:
>>>> I'm frequently getting my machine locked with ahcichX timeouts:
>>>> ahcich2: Timeout on slot 0
>>>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 serr 00000000
>>>> ahcich2: Timeout on slot 8
>>>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 serr 00000000
>>>> ahcich2: Timeout on slot 8
>>>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 serr 00000000
>>>> ...
>>> Given that `is` (interrupt status) is zero and `rs == cs | ss` (the
>>> running command bitmasks in the driver and in hardware), the controller
>>> has not reported command completion. With TFD status 0xc0, which has the
>>> BUSY bit set, I would suppose that either the disk is stuck processing a
>>> command for some reason, or the controller missed the command completion
>>> status.
>>>
>>> Did you notice a 30-second pause (the default ATA timeout) before the
>>> timeout message was printed? I just want to be sure that the driver
>>> waited long enough before giving up.
>>>
>>>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>>>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from locking
>>>> up almost immediately, but it still happens.
>>>> When I use ataahci (the old driver, if I understand things correctly)
>>>> instead of ahci, I also see the ZFS burst write congestion, but it
>>>> doesn't lead to controller timeouts, so the machine doesn't block.
>>>>
>>>> Sometimes the machine recovers from the disk lock, but most often I have
>>>> to reboot.
>>> What does it look like when it doesn't? Can you send me the full log messages?
>> Hello, this morning I had a stall, but the machine recovered after about
>> one minute. Here's what I got from the kernel:
>> ahcich2: Timeout on slot 29
>> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr 00000000
>> em1: watchdog timeout -- resetting
>> em1: watchdog timeout -- resetting
>> ahcich2: Timeout on slot 10
>> ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 18
>> ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 2
>> ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr 00000000
>> ahcich2: Timeout on slot 2
>> ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr 00000000
>>
>> Does this tell you something useful?
> 
> It doesn't. Judging by the logged register contents, commands are indeed
> still running and no interrupts have been requested. It is interesting to
> see the em1 watchdog timeout there. Could they be related somehow?
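
As a cross-check of the reading above, the register dump lines can be decoded mechanically. This is only a minimal sketch: it assumes the field meanings given in the quoted explanation (`is` = interrupt status, `cs`/`ss` = hardware command bitmasks, `rs` = the driver's running-slot bitmask) and the standard ATA status bits in the low byte of `tfd`. The function name `decode` and the result keys are my own and not anything from the driver.

```python
import re

# Standard ATA status register bits (assumed to be the low byte of "tfd").
ATA_BSY = 0x80   # device busy
ATA_DRDY = 0x40  # device ready
ATA_ERR = 0x01   # error

# Matches one register dump with the wrapped "serr" value joined back on.
DUMP_RE = re.compile(
    r"is (?P<is>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) ss (?P<ss>[0-9a-f]+) "
    r"rs (?P<rs>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) serr\s+(?P<serr>[0-9a-f]+)"
)


def decode(line):
    """Decode one 'ahcichN: is ... serr ...' line into a summary dict."""
    m = DUMP_RE.search(line)
    if m is None:
        raise ValueError("not an ahcich register dump")
    reg = {name: int(value, 16) for name, value in m.groupdict().items()}
    return {
        # Non-zero "is" would mean the controller raised an interrupt.
        "interrupt_pending": reg["is"] != 0,
        # The check from the quoted analysis: rs == cs | ss.
        "hw_matches_driver": (reg["cs"] | reg["ss"]) == reg["rs"],
        "busy": bool(reg["tfd"] & ATA_BSY),
        "ready": bool(reg["tfd"] & ATA_DRDY),
        "error": bool(reg["tfd"] & ATA_ERR),
        # Slot numbers the driver still considers running.
        "slots": [i for i in range(32) if (reg["rs"] >> i) & 1],
    }


if __name__ == "__main__":
    print(decode("ahcich2: is 00000000 cs 00000003 ss e0000003 "
                 "rs e0000003 tfd c0 serr 00000000"))
```

Fed the first dump from this morning's stall, it reports no pending interrupt, consistent bitmasks, BUSY set, and slots 0, 1, 29, 30 and 31 outstanding, which matches the "Timeout on slot 29" message.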

dmesg | grep "irq 18":
uhci0: <Intel 82801I (ICH9) USB controller> port 0x20c0-0x20df irq 18 at device 26.0 on pci0
uhci4: <Intel 82801I (ICH9) USB controller> port 0x2040-0x205f irq 18 at device 29.2 on pci0
em1: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0x1000-0x103f mem 0xe1920000-0xe193ffff,0xe1900000-0xe191ffff irq 18 at device 2.0 on pci3
ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x2000-0x201f mem 0xe1a22000-0xe1a220ff irq 18 at device 31.3 on pci0

They don't share the same IRQ, at least.
dmesg | grep "irq 21":
uhci1: <Intel 82801I (ICH9) USB controller> port 0x20a0-0x20bf irq 21 at device 26.1 on pci0
ahci0: <Intel ICH9 AHCI SATA controller> port 0x2408-0x240f,0x2414-0x2417,0x2400-0x2407,0x2410-0x2413,0x2020-0x203f mem 0xe1a21000-0xe1a217ff irq 21 at device 31.2 on pci0
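
The two greps above could also be done in one pass that groups devices by IRQ. A rough sketch, assuming unwrapped dmesg lines of the form `name: <description> ... irq N at ...` as in the output above; the helper names `devices_by_irq` and `share_irq` are made up for illustration.

```python
import re
from collections import defaultdict

# Device name at line start, then the "irq N" token later on the same line.
IRQ_RE = re.compile(r"^(?P<dev>\w+): <.*\birq (?P<irq>\d+)\b")


def devices_by_irq(dmesg_text):
    """Map each IRQ number to the list of device names attached to it."""
    groups = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = IRQ_RE.match(line)
        if m:
            groups[int(m.group("irq"))].append(m.group("dev"))
    return dict(groups)


def share_irq(groups, a, b):
    """Return True if devices a and b appear under the same IRQ."""
    return any(a in devs and b in devs for devs in groups.values())


if __name__ == "__main__":
    sample = (
        "em1: <Intel(R) PRO/1000 Network Connection 6.9.14> port 0x1000-0x103f "
        "mem 0xe1920000-0xe193ffff,0xe1900000-0xe191ffff irq 18 at device 2.0 on pci3\n"
        "ahci0: <Intel ICH9 AHCI SATA controller> port 0x2408-0x240f "
        "mem 0xe1a21000-0xe1a217ff irq 21 at device 31.2 on pci0\n"
    )
    groups = devices_by_irq(sample)
    print(groups, share_irq(groups, "em1", "ahci0"))
```

Lines without an `irq N` token are simply skipped, so the whole dmesg output can be passed in unfiltered.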

em1 has no cable attached. I get many of these em watchdog timeouts and
never thought they could be related to ahci. I'll check whether the em
watchdog timeouts correlate with disk usage.

Thank you!

-Harry


[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.13 (FreeBSD)

iEYEARECAAYFAkuOHaoACgkQLDqVQ9VXb8i/6wCfabT3X1Hdp6g9QxEMlf772Rmc
7xgAnRs689Gg+JomGAR9niPw4D2In413
=scJE
-----END PGP SIGNATURE-----

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4B8E1DA9.2090406>