Date:      Sun, 15 Apr 2012 13:00:24 GMT
From:      Marius Strobl <marius@alchemy.franken.de>
To:        freebsd-sparc64@FreeBSD.org
Subject:   Re: sparc64/141918: [ehci] ehci_interrupt: unrecoverable error, controller halted (sparc64)
Message-ID:  <201204151300.q3FD0ODr041098@freefall.freebsd.org>

The following reply was made to PR sparc64/141918; it has been noted by GNATS.

From: Marius Strobl <marius@alchemy.franken.de>
To: Manuel Tobias Schiller <mala@hinterbergen.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: sparc64/141918: [ehci] ehci_interrupt: unrecoverable error, controller halted (sparc64)
Date: Sun, 15 Apr 2012 14:51:05 +0200

 On Wed, Apr 11, 2012 at 12:59:54PM +0200, Manuel Tobias Schiller wrote:
 > On Fri, 6 Apr 2012 20:37:26 +0200
 > Marius Strobl <marius@alchemy.franken.de> wrote:
 > 
 > > On Fri, Apr 06, 2012 at 09:58:42AM +0200, Manuel Tobias Schiller wrote:
 > > > On Thu, 5 Apr 2012 18:21:24 +0200
 > > > Manuel Tobias Schiller <mala@hinterbergen.de> wrote:
 > > > 
 > > > > On Wed, 4 Apr 2012 14:59:46 +0200
 > > > > Marius Strobl <marius@alchemy.franken.de> wrote:
 > > > > 
 > > > > > Hrm, okay, would be interesting to know what the machine actually
 > > > > > does. Looking at the code I found another bug; the VIA-workaround
 > > > > > currently doesn't do anything:
 > > > > > http://people.freebsd.org/~marius/ehci_pci_fix_via_quirk.diff
 > > > > > This might apply for the insane I/O you've reported but I'm unsure
 > > > > > whether it makes a difference for the HSE interrupt.
 > > > > > 
 > > > > > Marius
 > > > > 
 > > > > From the looks of it (with your patch at
 > > > > http://people.freebsd.org/~marius/usb_busdma.diff), the machine
 > > > > starts booting, then tries to mount the filesystems residing on the
 > > > > USB disks, apparently does some I/O (while still processing
 > > > > interrupts), and after less than a minute locks up solid without
 > > > > any indication on the serial console as to what went wrong...
 > > > > 
 > > > > I've started another build with your "VIA quirk fix" but without the
 > > > > patch in the last paragraph (the machine locking up is a lot worse
 > > > > than just USB not working after some heavy I/O, so I left it out
 > > > > for now), but since I started the build without being properly
 > > > > awake this morning, I typed "make buildworld" where I wanted to
 > > > > type "make buildkernel", so it's going to take some time. Also,
 > > > > I'll be leaving CERN over easter, so I won't be running tests on
 > > > > that machine from tomorrow morning until Monday evening (I can
 > > > > compile kernels, though). Anyhow, I'll let you know what comes out.
 > > > > 
 > > > > Cheers, thanks a lot for your effort, and, of course, a Happy
 > > > > Easter!
 > > > > 
 > > > > Manuel
 > > > 
 > > > Hi,
 > > > 
 > > > the "VIA quirk fix" on its own gives the familiar message in dmesg
 > > > (unrecoverable error, controller halted), so I'm compiling a kernel
 > > > which
 > > 
 > > Oof, this likely means there's a more basic problem with this device.
 > > Have you already tried to re-seat the card in case there's an electrical
 > > problem?
 > > Please also provide the output of `pciconf -rb ehci0@pci0:2:5:2 0:255'
 > > from a booting kernel.
 > > FYI, after some digging I've found the following card
 > > ehci0@pci0:2:5:2: class=0x0c0320 card=0x31041106 chip=0x31041106
 > > rev=0x6h0 which is a newer revision of your device and works just fine
 > > in a T1-200 including with the usb(4) fixes. The publicly available
 > > datasheets for the VIA USB controllers are minimal and exclude errata,
 > > and Linux also doesn't seem to use any additional workarounds, so I'm
 > > starting to run out of ideas as to what could be wrong with your
 > > revision. The only remaining thing I can currently think of trying is
 > > to test whether it chokes on the generic initialization done by the
 > > sparc64 PCI code, using the attached patch.
 > > 
 > > > combines this fix with your latest busdma fix to try them both
 > > > together;
 > > 
 > > This combination is unlikely to make a difference.
 > > 
 > > Marius
 > > 
 > 
 > Hi Marius,
 > 
 > I've tried your new patch, both on its own and in conjunction with the 
 > latest busdma and Via quirk fixes, and I still get the same error
 > message...
 > 
 > Here's the output of pciconf you requested:
 > 
 > mala@router:~> sudo pciconf -rb ehci0@pci0:2:5:2 0:255
 > Password:
 > 06 11 04 31 06 00 10 22  65 20 03 0c 00 16 80 00 
 > 00 a0 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 00  00 00 00 00 06 11 04 31 
 > 00 00 00 00 80 00 00 00  00 00 00 00 14 03 00 00 
 > 00 00 0b 00 00 00 00 00  a0 20 00 29 00 00 ff ff 
 
 This is rather confusing; the 0x29 in the above line means that the
 VIA workaround has been applied. Didn't you say that with the fix
 that actually applies it, the kernel panics as soon as the device
 attaches?
 Apart from this, your configuration space differs from mine in 3
 undocumented bytes. I'm not sure whether it's worth testing whether
 these make a difference ...
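
For readers following along, here is a minimal sketch of how the quoted `pciconf -rb` rows can be decoded. It is an illustration, not part of the original exchange. The message only states that the 0x29 indicates the workaround is applied; that this byte sits at configuration offset 0x4b follows from its position in the dump, while the choice of bit 0x20 as the workaround bit is an assumption made here. The helper names (`parse_dump`, `le`, `summarize`, `diff_offsets`) are hypothetical.

```python
# First five rows (offsets 0x00-0x4f) of the dump quoted in this message.
DUMP = """\
06 11 04 31 06 00 10 22  65 20 03 0c 00 16 80 00
00 a0 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00  00 00 00 00 06 11 04 31
00 00 00 00 80 00 00 00  00 00 00 00 14 03 00 00
00 00 0b 00 00 00 00 00  a0 20 00 29 00 00 ff ff
"""

# Assumption (not stated in the thread): the VIA workaround is the
# bit 0x20 in the configuration byte at offset 0x4b.
VIA_QUIRK_OFFSET = 0x4B
VIA_QUIRK_BIT = 0x20


def parse_dump(text):
    """Turn rows of hex bytes into a bytes object, ignoring '>' quoting."""
    data = bytearray()
    for line in text.splitlines():
        tokens = line.replace(">", " ").split()
        data.extend(int(tok, 16) for tok in tokens)
    return bytes(data)


def le(data, offset, size):
    """Read a little-endian integer of `size` bytes at `offset`."""
    return int.from_bytes(data[offset:offset + size], "little")


def summarize(data):
    """Decode the standard header fields that pciconf -l also reports."""
    return {
        "chip": le(data, 0x00, 4),   # device ID in the high half, vendor ID low
        "rev": data[0x08],           # revision ID
        "class": le(data, 0x09, 3),  # class, subclass, programming interface
        "card": le(data, 0x2C, 4),   # subsystem ID high, subsystem vendor low
        "via_quirk_set": bool(data[VIA_QUIRK_OFFSET] & VIA_QUIRK_BIT),
    }


def diff_offsets(a, b):
    """Return the offsets at which two equally long dumps differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    cfg = parse_dump(DUMP)
    info = summarize(cfg)
    print("chip=0x%08x rev=0x%02x class=0x%06x card=0x%08x"
          % (info["chip"], info["rev"], info["class"], info["card"]))
    print("byte 0x%02x = 0x%02x, assumed quirk bit set: %s"
          % (VIA_QUIRK_OFFSET, cfg[VIA_QUIRK_OFFSET], info["via_quirk_set"]))
```

Given a second dump from another card, `diff_offsets(parse_dump(mine), parse_dump(yours))` would list the offsets of the differing bytes mentioned above; no second dump is included here because none was posted in the thread.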
 
 > 00 5a 04 80 00 00 00 00  04 0b 88 88 33 00 00 00 
 > 20 20 01 00 00 00 00 00  01 00 00 00 00 00 00 c0 
 > 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 01 00 0a 7e 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
 > 00 00 00 00 00 00 00 03  00 00 00 00 00 00 00 00
 > 
 > This was taken after the controller stopped, on a kernel with your
 > latest patch, but I'd guess that doesn't matter - the EHCI driver should
 > not be playing with the PCI settings after initialisation...
 > 
 > I've also opened the machine, and the PCI card is seated properly. I even
 > removed it and tried an even older VIA EHCI controller and one of the
 > first USB 2.0 controllers by NEC - no luck, the VIA one had trouble
 > recognizing devices, the NEC one did not recognize a single one I plugged
 > in.
 > 
 
 This is also rather strange. Have you ever used any other type of
 card in that slot, e.g. a NIC, so you can rule out that the slot
 itself is broken somehow?
 How does using the on-board USB controller work out?
 
 Marius
 


