From owner-freebsd-sparc64@FreeBSD.ORG Tue Apr 3 15:10:04 2012
Date: Tue, 3 Apr 2012 15:10:04 GMT
Message-Id: <201204031510.q33FA4ro040959@freefall.freebsd.org>
To: freebsd-sparc64@FreeBSD.org
From: Marius Strobl
Subject: Re: sparc64/141918: [ehci] ehci_interrupt: unrecoverable error, controller halted (sparc64)
Reply-To: Marius Strobl
List-Id: Porting FreeBSD to the Sparc

The following reply was made to PR sparc64/141918; it has been noted by GNATS.

From: Marius Strobl
To: Manuel Tobias Schiller
Cc: bug-followup@FreeBSD.org
Subject: Re: sparc64/141918: [ehci] ehci_interrupt: unrecoverable error, controller halted (sparc64)
Date: Tue, 3 Apr 2012 17:00:43 +0200

On Tue, Apr 03, 2012 at 10:37:14AM +0200, Manuel Tobias Schiller wrote:
> On Mon, 2 Apr 2012 10:43:14 +0200
> Manuel Tobias Schiller wrote:
>
> > On Mon, 2 Apr 2012 01:00:56 +0200
> > Manuel Tobias Schiller wrote:
> >
> > > On Sun, 1 Apr 2012 12:41:24 +0200
> > > Marius Strobl wrote:
> > >
> > > > Well, the individual patches shouldn't make things worse except for
> > > > the second one causing more memory to be used so I'd suggest to
> > > > combine them. If in the end things actually work we still can check
> > > > what changes are needed for that.
> > > > Looking at the Linux USB code, the FreeBSD one doesn't seem to honor
> > > > some DMA constraints and at least for the alignment it's actually
> > > > hard to follow what value eventually is used. One thing that stands
> > > > out is that for EHCI, the boundary is 4096. This is most easily
> > > > fixed by defining USB_PAGE_SIZE to 4096 in sys/dev/usb/usb_busdma.h.
> > > >
> > > > Marius
> > >
> > > Ok, the second patch on its own doesn't appear to work either, so I'm
> > > trying the combination of patches now. By the way: defining
> > > USB_PAGE_SIZE to 4096 in sys/dev/usb/usb_busdma.h is a bad idea - the
> > > kernel panics with a backtrace pointing into the mmu-related code.
> > > Probably has to do with sparc64 mmu only supporting 8k pages, so I'm
> > > not terribly surprised... Ok, I'm waiting for the next make
> > > buildkernel to finish, and I'll let you know what comes out.
> > >
> > > Manuel
> >
> > Ok, I also tested a kernel with both patches, and the issue persists. Do
> > you have something else to try?
> >
> > Manuel
> >
>
> Hi Marius,
>
> I did a bit of code reading (/usr/src/sys/dev/usb/controller/ehci.c near
> line 1494), and I realised that the "unrecoverable error" message should
> only be triggered if the EHCI status register has the EHCI_STS_HCH bit
> set - according to the status word dump in my log, it is not set (just
> after the "unrecoverable error" message). The register dump re-reads the
> status register from the hardware. Could it be that some controllers have
> a glitch or something on that particular bit, and we better re-read the
> status register before we conclude that the controller "really wanted to
> set that bit"?

You mean EHCI_STS_HSE? This is expected, ehci_interrupt() clears the
pending interrupt status bits before dumping the register content:

	EOWRITE4(sc, EHCI_USBSTS, status);	/* acknowledge */

> I can also see that the bit is set in the original bug report. I don't
> know if that machine is just faster (and the bit has not had the time to
> clear yet), or if we're talking about two different problems here...

Probably, the other controller just sets it again after the bit is cleared.

Marius
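
The acknowledge-then-dump ordering described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: `fake_ehci`, `fake_read_sts`, `fake_write_sts`, `fake_interrupt` and `ack_result` are hypothetical names invented for the illustration, and the bit values are assumed to match the USBSTS layout in the EHCI specification (HSE at bit 4, HCHalted at bit 12, bits 0-5 write-1-to-clear).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed USBSTS bit layout (per the EHCI specification). */
#define STS_HSE      0x00000010u /* Host System Error, write-1-to-clear */
#define STS_HCH      0x00001000u /* HCHalted, read-only */
#define STS_W1C_MASK 0x0000003fu /* interrupt status bits 0-5 */

/* Hypothetical stand-in for the controller's status register. */
struct fake_ehci {
	uint32_t usbsts;
};

/* What the handler latched versus what a later dump re-reads. */
struct ack_result {
	uint32_t latched;
	uint32_t dumped;
};

static uint32_t
fake_read_sts(const struct fake_ehci *hc)
{
	return (hc->usbsts);
}

/* Writing a 1 to a write-1-to-clear bit clears it; other bits are untouched. */
static void
fake_write_sts(struct fake_ehci *hc, uint32_t val)
{
	hc->usbsts &= ~(val & STS_W1C_MASK);
}

/* Mimics the ordering: read status, acknowledge, then re-read for the dump. */
static struct ack_result
fake_interrupt(struct fake_ehci *hc)
{
	struct ack_result r;

	r.latched = fake_read_sts(hc);   /* value the handler acts on */
	fake_write_sts(hc, r.latched);   /* acknowledge */
	r.dumped = fake_read_sts(hc);    /* value a register dump shows */
	return (r);
}
```

In this model, a controller that raises HSE once shows the bit in `latched` but not in `dumped`, which matches the log discussed here. A controller that keeps re-asserting the error would need the model extended to set the bit again after the write, which is the behaviour suggested for the machine in the original report.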