Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 13 Mar 2010 22:27:39 +0100
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        Alexander Motin <mav@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: ahcich timeouts, only with ahci, not with ataahci
Message-ID:  <4B9C034B.90900@omnilan.de>
In-Reply-To: <20100303110647.GA51588@icarus.home.lan>
References:  <1266934981.00222684.1266922202@10.7.7.3>	<4B83EFD4.8050403@FreeBSD.org> <4B8E1489.2070306@omnilan.de>	<4B8E1B3D.306@FreeBSD.org> <4B8E1DA9.2090406@omnilan.de> <20100303110647.GA51588@icarus.home.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig99F70BB8ABD28A47139A7119
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Am 03.03.2010 12:06, schrieb Jeremy Chadwick:
> On Wed, Mar 03, 2010 at 09:28:25AM +0100, Harald Schmalzbauer wrote:
>> Alexander Motin schrieb am 03.03.2010 09:18 (localtime):
>>> Harald Schmalzbauer wrote:
>>>> Alexander Motin schrieb am 23.02.2010 16:10 (localtime):
>>>>> Harald Schmalzbauer wrote:
>>>>>> I'm frequently getting my machine locked with ahcichX timeouts:
>>>>>> ahcich2: Timeout on slot 0
>>>>>> ahcich2: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd c0 se=
rr
>>>>>> 00000000
>>>>>> ahcich2: Timeout on slot 8
>>>>>> ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd c0 se=
rr
>>>>>> 00000000
>>>>>> ahcich2: Timeout on slot 8
>>>>>> ahcich2: is 00000000 cs fffff07f ss ffffff7f rs ffffff7f tfd c0 se=
rr
>>>>>> 00000000
>>>>>> ...
>>>>> Looking that is (Interrupt status) is zero and `rs =3D=3D cs | ss` =
(running
>>>>> command bitmasks in driver and hardware), controller doesn't report=

>>>>> command completion. Looking on TFD status 0xc0 with BUSY bit set, I=

>>>>> would suppose that either disk stuck in command processing for some=

>>>>> reason, or controller missed command completion status.
>>>>>
>>>>> Have you noticed 30 second (default ATA timeout) pause before timeo=
ut
>>>>> message printed? Just want to be sure that driver waited enough bef=
ore
>>>>> give up.
>>>>>
>>>>>> This happens when backup over GbE overloads ZFS/HDD capabilities.
>>>>>> I reduced vfs.zfs.txg.timeout to 1 to prevent the machine from loc=
king
>>>>>> up almost immediately, but from it still happens.
>>>>>> When I don't use ahci but ataahci (the old driver if I understand =
things
>>>>>> correct) I also see the ZFS burst write congestion, but this doesn=
't
>>>>>> lead to controller timeouts, thus blocking the machine.
>>>>>>
>>>>>> Sometimes the machine recovers from the disk lock, but most often =
I have
>>>>>> to reboot.
>>>>> How it looks when it doesn't? Can you send me full log messages?
>>>> Hello, this morning I had a stall, but the machine recovered after a=
bout
>>>> one Minute. Here's what I got from the kernel:
>>>> ahcich2: Timeout on slot 29
>>>> ahcich2: is 00000000 cs 00000003 ss e0000003 rs e0000003 tfd c0 serr=

>>>> 00000000
>>>> em1: watchdog timeout -- resetting
>>>> em1: watchdog timeout -- resetting
>>>> ahcich2: Timeout on slot 10
>>>> ahcich2: is 00000000 cs 00006000 ss 00007c00 rs 00007c00 tfd c0 serr=

>>>> 00000000
>>>> ahcich2: Timeout on slot 18
>>>> ahcich2: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd c0 serr=

>>>> 00000000
>>>> ahcich2: Timeout on slot 2
>>>> ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd c0 serr=

>>>> 00000000
>>>> ahcich2: Timeout on slot 2
>>>> ahcich2: is 00000000 cs 00000000 ss 0000000c rs 0000000c tfd 40 serr=

>>>> 00000000
>>>>
>>>> Does this tell you something useful?
>>>
>>> It doesn't. Looking on logged register content - commands are indeed
>>> still running and no interrupts requested. Interesting to see em1
>>> watchdog timeout there. Aren't they related somehow?
>>
>> 	dmesg | grep "irq 18":
>> uhci0: <Intel 82801I (ICH9) USB controller> port 0x20c0-0x20df irq
>> 18 at device 26.0 on pci0
>> uhci4: <Intel 82801I (ICH9) USB controller> port 0x2040-0x205f irq
>> 18 at device 29.2 on pci0
>> em1: <Intel(R) PRO/1000 Network Connection 6.9.14> port
>> 0x1000-0x103f mem 0xe1920000-0xe193ffff,0xe1900000-0xe191ffff irq 18
>> at device 2.0 on pci3
>> ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x2000-0x201f
>> mem 0xe1a22000-0xe1a220ff irq 18 at device 31.3 on pci0
>>
>> The don't share the same IRQ at least.
>> dmesg | grep "irq 21"
>> uhci1: <Intel 82801I (ICH9) USB controller> port 0x20a0-0x20bf irq
>> 21 at device 26.1 on pci0
>> ahci0: <Intel ICH9 AHCI SATA controller> port
>> 0x2408-0x240f,0x2414-0x2417,0x2400-0x2407,0x2410-0x2413,0x2020-0x203f
>> mem 0xe1a21000-0xe1a217ff irq 21 at device 31.2 on pci0
>>
>> The em1 has no cable attached. I get many of these em watchdog
>> timeouts. Never thought they could be related to ahci. I'll see if
>> the em watchdog timeouts happens in any relation to disk usage.
>=20
> Please provide output from the commands I provided.  dmesg|grep is not
> sufficient for helping track this down, specifically with regards to th=
e
> em1 watchdog timeouts.

Sorry for the delay, here's the details:
hostb0@pci0:0:0:0:      class=3D0x060000 card=3D0x34d08086 chip=3D0x29f08=
086
rev=3D0x00 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '3200 Chipset (Bearlake) Processor to I/O Controller'
    class      =3D bridge
    subclass   =3D HOST-PCI
em0@pci0:0:25:0:        class=3D0x020000 card=3D0x34d08086 chip=3D0x10bd8=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class      =3D network
    subclass   =3D ethernet
uhci0@pci0:0:26:0:      class=3D0x0c0300 card=3D0x34d08086 chip=3D0x29378=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) USB Universal Host Contro=
ller'
    class      =3D serial bus
    subclass   =3D USB
uhci1@pci0:0:26:1:      class=3D0x0c0300 card=3D0x34d08086 chip=3D0x29388=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) USB Universal Host Contro=
ller'
    class      =3D serial bus
    subclass   =3D USB
ehci0@pci0:0:26:7:      class=3D0x0c0320 card=3D0x34d08086 chip=3D0x293c8=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) USB2 Enhanced Host Contro=
ller'
    class      =3D serial bus
    subclass   =3D USB
pcib1@pci0:0:28:0:      class=3D0x060400 card=3D0x29408086 chip=3D0x29408=
086
rev=3D0x02 hdr=3D0x01
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) PCIe Root Port 1'
    class      =3D bridge
    subclass   =3D PCI-PCI
pcib2@pci0:0:28:4:      class=3D0x060400 card=3D0x29488086 chip=3D0x29488=
086
rev=3D0x02 hdr=3D0x01
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) PCIe Root Port 5'
    class      =3D bridge
    subclass   =3D PCI-PCI
uhci2@pci0:0:29:0:      class=3D0x0c0300 card=3D0x34d08086 chip=3D0x29348=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) USB Universal Host Contro=
ller'
    class      =3D serial bus
    subclass   =3D USB
uhci3@pci0:0:29:1:      class=3D0x0c0300 card=3D0x34d08086 chip=3D0x29358=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) USB Universal Host Contro=
ller'
    class      =3D serial bus
    subclass   =3D USB
uhci4@pci0:0:29:2:      class=3D0x0c0300 card=3D0x34d08086 chip=3D0x29368=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) USB Universal Host Contro=
ller'
    class      =3D serial bus
    subclass   =3D USB
ehci1@pci0:0:29:7:      class=3D0x0c0320 card=3D0x34d08086 chip=3D0x293a8=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) USB2 Enhanced Host Contro=
ller'
    class      =3D serial bus
    subclass   =3D USB
pcib3@pci0:0:30:0:      class=3D0x060401 card=3D0x34d08086 chip=3D0x244e8=
086
rev=3D0x92 hdr=3D0x01
    vendor     =3D 'Intel Corporation'
    device     =3D '82801 Family (ICH2/3/4/5/6/7/8/9,63xxESB) Hub
Interface to PCI Bridge'
    class      =3D bridge
    subclass   =3D PCI-PCI
isab0@pci0:0:31:0:      class=3D0x060100 card=3D0x34d08086 chip=3D0x29168=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IR (ICH9R) LPC Interface Controller'
    class      =3D bridge
    subclass   =3D PCI-ISA
ahci0@pci0:0:31:2:      class=3D0x010601 card=3D0x34d08086 chip=3D0x29228=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) 6 port SATA AHCI Controll=
er'
    class      =3D mass storage
    subclass   =3D SATA
ichsmb0@pci0:0:31:3:    class=3D0x0c0500 card=3D0x34d08086 chip=3D0x29308=
086
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D '82801IB/IR/IH (ICH9 Family) SMBus Controller'
    class      =3D serial bus
    subclass   =3D SMBus
vgapci0@pci0:2:0:0:     class=3D0x030000 card=3D0x01018086 chip=3D0x05221=
02b
rev=3D0x02 hdr=3D0x00
    vendor     =3D 'Matrox Electronic Systems Ltd.'
    device     =3D 'Matrox G200e (ServerEngines) - English (G200e)'
    class      =3D display
    subclass   =3D VGA
em1@pci0:3:2:0: class=3D0x020000 card=3D0x34d08086 chip=3D0x10768086 rev=3D=
0x05
hdr=3D0x00
    vendor     =3D 'Intel Corporation'
    device     =3D 'Gigabit Ethernet Controller (82541EI)'
    class      =3D network
    subclass   =3D ethernet
banana:~>20: vmstat -i
interrupt                          total       rate
irq4: uart0                        27392          0
irq18: em1 uhci0++                   178          0
irq19: uhci3                          22          0
irq23: uhci2 ehci1                     2          0
cpu0: timer                    231629257       1917
irq256: em0                      2559286         21
irq257: ahci0                    1602196         13
cpu1: timer                    231628629       1917
Total                          467446962       3869

Looks very similar to the problem here:
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-03/msg00143.h=
tml

Thanks,

-Harry


--------------enig99F70BB8ABD28A47139A7119
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkucA2wACgkQLDqVQ9VXb8gK/gCgor2kS8lJcVdSi2zqI+aMA68S
DisAn3To1/x7eJpkU20Dxqp7wCQjfYA/
=W7lO
-----END PGP SIGNATURE-----

--------------enig99F70BB8ABD28A47139A7119--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4B9C034B.90900>