Date: Tue, 8 Jun 2004 10:42:37 +0800 (CST)
From: Tai-hwa Liang
To: Brian Buchanan
cc: freebsd-current@freebsd.org
Subject: Re: T40 panics at usb_get_next_event() when ACPI is disabled
Message-ID: <04060810273710.99104@www.mmlab.cse.yzu.edu.tw>
In-Reply-To: <20040607082413.U93758-100000@thought.holo.org>
References: <20040607082413.U93758-100000@thought.holo.org>

On Mon, 7 Jun 2004, Brian Buchanan wrote:
> Yes, I see this too on my T40p, but only when booting with the mouse
> plugged into the laptop through a USB hub connected to the docking
> station. If the mouse is plugged in directly to the laptop (I haven't
> tried plugging the USB hub directly into the laptop) or not plugged in,

The problem always occurs on my T40 when the USB mouse is directly
plugged into the laptop.

> the problem does not occur.
> My hypothesis is that because a certain
> event list entry is being overwritten, the USB event list only grows long
> enough to use this area of memory in this configuration.

Interesting hypothesis. What really bothers me is that the extra
"if (ueq != NULL)" checks didn't catch the NULL ueq case. According to
the backtrace, it crashed at "*ue = ueq->ue", where ueq was NULL at that
moment.

> I wrote a function to perform a sanity check on the event list and
> determined that the list is not corrupt after all the USB boot-time events
> have been queued. The list becomes corrupted some time between then and

I'm curious about the sanity check function you've written. Would you
mind posting it?

> when usbd attempts to read the event queue. One of the events, the same
> one every time, is overwritten with something like 0x01000010 (I don't
> have a log of the actual bit pattern). Since it's happening to the same
> object every time, the next step would be to set a watch point in the
> debugger. I'll probably give this a try once I have a chance to consult
> with someone who knows more about kernel debugging.

Did you try to extract the backtrace from the core file? It would help
with further analysis:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

Or you might prefer to use DDB for online kernel debugging:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-ddb.html

> I did experiment with rolling back some usb commits, but it does not
> appear that a change to the usb subsystem is what caused this breakage. I
> think something else in the system is misbehaving and overwriting memory.

Perhaps. Since the enqueuing/dequeuing of usb_event is supposed to be
protected by splusb(), there shouldn't be a race here unless there's
something wrong in the interrupt priority (shared with splnet/splbio?)
settings.
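
For reference, here is a minimal user-space sketch of the two checks
discussed in this thread: a consistency check that compares the list
against its cached event count, and a dequeue that refuses to
dereference a NULL head. It is not the code from the usb driver. The
names struct event, struct event_q, struct event_list, evl_put(),
evl_check() and evl_get() are hypothetical, and a hand-rolled singly
linked list stands in for whatever queue the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real usb driver structures. */
struct event {
	int type;
};

struct event_q {
	struct event ue;
	struct event_q *next;
};

struct event_list {
	struct event_q *head;
	struct event_q *tail;
	int nevents;		/* cached length, kept alongside the list */
};

/* Append a copy of *ev. Returns 0 on success, -1 if allocation fails. */
int
evl_put(struct event_list *l, const struct event *ev)
{
	struct event_q *ueq;

	assert(l != NULL && ev != NULL);
	ueq = malloc(sizeof(*ueq));
	if (ueq == NULL)
		return (-1);
	ueq->ue = *ev;
	ueq->next = NULL;
	if (l->tail != NULL)
		l->tail->next = ueq;
	else
		l->head = ueq;
	l->tail = ueq;
	l->nevents++;
	return (0);
}

/*
 * Walk the list and compare it with the cached counter.
 * Returns 1 if they agree, 0 otherwise. The walk gives up once it has
 * seen more than nevents entries, so a cycle cannot loop forever.
 */
int
evl_check(const struct event_list *l)
{
	const struct event_q *ueq, *last = NULL;
	int n = 0;

	for (ueq = l->head; ueq != NULL; ueq = ueq->next) {
		if (++n > l->nevents)
			return (0);	/* too long, or a cycle */
		last = ueq;
	}
	return (n == l->nevents && last == l->tail);
}

/*
 * Pop the head entry into *ev. Returns 1 if an event was delivered,
 * 0 if the list is empty or the counter and the list disagree.
 */
int
evl_get(struct event_list *l, struct event *ev)
{
	struct event_q *ueq;

	if (l->nevents <= 0)
		return (0);
	ueq = l->head;
	if (ueq == NULL) {
		/* Counter claims events but the list is empty: resync. */
		l->nevents = 0;
		l->tail = NULL;
		return (0);
	}
	*ev = ueq->ue;
	l->head = ueq->next;
	if (l->head == NULL)
		l->tail = NULL;
	free(ueq);
	l->nevents--;
	return (1);
}
```

A check like evl_check() only catches disagreement between the counter
and the links. If a next pointer has been overwritten with a garbage
value, following it will fault before the check can report anything, so
a debugger watchpoint on the affected entry is still the more direct
way to find the writer.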