From owner-freebsd-current@FreeBSD.ORG  Tue Jun  8 02:42:38 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B569C16A4CE
	for <freebsd-current@freebsd.org>;
	Tue,  8 Jun 2004 02:42:38 +0000 (GMT)
Received: from www.mmlab.cse.yzu.edu.tw (www.mmlab.cse.yzu.edu.tw
	[140.138.145.166])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 800FA43D1D
	for <freebsd-current@freebsd.org>;
	Tue,  8 Jun 2004 02:42:38 +0000 (GMT)
	(envelope-from avatar@mmlab.cse.yzu.edu.tw)
Received: by www.mmlab.cse.yzu.edu.tw (qmail, from userid 1000)
	id CF0E34EFCD6; Tue,  8 Jun 2004 10:42:37 +0800 (CST)
Received: from localhost (localhost [127.0.0.1])
	by www.mmlab.cse.yzu.edu.tw (qmail) with ESMTP id C66A14EFCD3;
	Tue,  8 Jun 2004 10:42:37 +0800 (CST)
Date: Tue, 8 Jun 2004 10:42:37 +0800 (CST)
From: Tai-hwa Liang <avatar@mmlab.cse.yzu.edu.tw>
To: Brian Buchanan <bwb@holo.org>
In-Reply-To: <20040607082413.U93758-100000@thought.holo.org>
Message-ID: <04060810273710.99104@www.mmlab.cse.yzu.edu.tw>
References: <20040607082413.U93758-100000@thought.holo.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-current@freebsd.org
Subject: Re: T40 panics at usb_get_next_event() when ACPI is disabled
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Tue, 08 Jun 2004 02:42:38 -0000

On Mon, 7 Jun 2004, Brian Buchanan wrote:
> Yes, I see this too on my T40p, but only when booting with the mouse
> plugged into the laptop through a USB hub connected to the docking
> station.  If the mouse is plugged in directly to the laptop (I haven't
> tried plugging the USB hub directly into the laptop) or not plugged in,

The problem always occurs on my T40 when the USB mouse is plugged directly
into the laptop.

> the problem does not occur.  My hypothesis is that because a certain
> event list entry is being overwritten, the USB event list only grows long
> enough to use this area of memory in this configuration.

Interesting hypothesis. What really bothers me is that the extra
"if (ueq != NULL)" checks didn't catch the NULL ueq case. According to the
backtrace, it crashed at "*ue = ueq->ue", where ueq was NULL at that moment.
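For reference, this is where such a guard has to sit to be of any use. It is a
user-space sketch, not the code in sys/dev/usb/usb.c: the singly linked list,
add_event(), get_next_event(), the struct layouts and the usb_nevents-style
counter are all my assumptions, loosely modeled on the names in the backtrace
(ueq, ue). The NULL test must come after the head is fetched and before the
"*ue = ueq->ue" copy. A guard like this only hides the symptom, though; it
says nothing about why the counter and the list disagree.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space model of the USB event queue; the real
 * structures in usb.c may differ. */
struct usb_event {
	int ue_type;
};

struct usb_event_q {
	struct usb_event ue;
	struct usb_event_q *next;
};

static struct usb_event_q *ev_head = NULL;
static struct usb_event_q *ev_tail = NULL;
static int usb_nevents = 0;		/* assumed event counter */

/* Append an event at the tail of the queue. */
void
add_event(int type)
{
	struct usb_event_q *ueq = malloc(sizeof(*ueq));

	if (ueq == NULL)
		return;
	ueq->ue.ue_type = type;
	ueq->next = NULL;
	if (ev_tail != NULL)
		ev_tail->next = ueq;
	else
		ev_head = ueq;
	ev_tail = ueq;
	usb_nevents++;
}

/* Returns 1 and fills *ue if an event was dequeued, 0 otherwise.
 * The counter and the list are checked against each other so that a
 * NULL head is never dereferenced even if usb_nevents is stale. */
int
get_next_event(struct usb_event *ue)
{
	struct usb_event_q *ueq;

	if (usb_nevents <= 0)
		return (0);
	ueq = ev_head;
	if (ueq == NULL) {
		/* Guard placed between fetching the head and the copy. */
		fprintf(stderr, "usb: usb_nevents out of sync (%d)\n",
		    usb_nevents);
		usb_nevents = 0;
		return (0);
	}
	*ue = ueq->ue;
	ev_head = ueq->next;
	if (ev_head == NULL)
		ev_tail = NULL;
	free(ueq);
	usb_nevents--;
	return (1);
}
```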

> I wrote a function to perform a sanity check on the event list and
> determined that the list is not corrupt after all the USB boot-time events
> have been queued.  The list becomes corrupted some time between then and

I'm curious about the sanity check function you've written. Would you mind
posting it?
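Meanwhile, this is roughly what I'd expect such a check to look like. It is a
hypothetical sketch, not your function: the event_list structure, the valid
type range EV_TYPE_MIN..EV_TYPE_MAX and the walk limit are all assumptions.
It walks the list once and reports the first inconsistency it finds: head and
tail disagreeing, a cycle, an entry whose type is out of range (such as one
overwritten with 0x01000010), a tail that isn't the last node, or a count that
doesn't match. In the kernel it would presumably have to run with the queue
protected (e.g. under splusb()), or it could race with the enqueue path.

```c
#include <stddef.h>

/* Hypothetical model; these names do not come from usb.c. */
struct usb_event {
	int ue_type;
};

struct usb_event_q {
	struct usb_event ue;
	struct usb_event_q *next;
};

struct event_list {
	struct usb_event_q *head;
	struct usb_event_q *tail;
	int nevents;
};

#define EV_TYPE_MIN	1	/* assumed range of valid event types */
#define EV_TYPE_MAX	8
#define EV_WALK_LIMIT	1000	/* stop walking a cyclic/runaway list */

/* Returns 0 if the list looks consistent, otherwise a negative code
 * identifying the first problem found. */
int
event_list_check(const struct event_list *l)
{
	const struct usb_event_q *p, *last = NULL;
	int n = 0;

	if (l->nevents < 0)
		return (-1);			/* bogus counter */
	if ((l->head == NULL) != (l->tail == NULL))
		return (-2);			/* head/tail disagree */
	for (p = l->head; p != NULL; p = p->next) {
		if (++n > EV_WALK_LIMIT)
			return (-3);		/* cycle or runaway list */
		if (p->ue.ue_type < EV_TYPE_MIN ||
		    p->ue.ue_type > EV_TYPE_MAX)
			return (-4);		/* entry looks overwritten */
		last = p;
	}
	if (last != l->tail)
		return (-5);			/* tail is not the last node */
	if (n != l->nevents)
		return (-6);			/* count mismatch */
	return (0);
}
```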

> when usbd attempts to read the event queue.  One of the events, the same
> one every time, is overwritten with something like 0x01000010 (I don't
> have a log of the actual bit pattern).  Since it's happening to the same
> object every time, the next step would be to set a watch point in the
> debugger.  I'll probably give this a try once I have a chance to consult
> with someone who knows more about kernel debugging.

Have you tried extracting a backtrace from the core file? It would help with
further analysis:

	http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

Alternatively, you could use DDB for online kernel debugging:

	http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-ddb.html
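Short of a hardware watchpoint, another way to narrow down when the entry gets
clobbered is to seal each entry with a checksum and verify it at a few
checkpoints. The sketch below is hypothetical: the extra csum field,
entry_seal(), entry_verify() and the choice of FNV-1a are mine, not anything
in the USB code. When verification fails it logs the entry's raw contents, so
you would also capture the actual bit pattern.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct usb_event {
	int ue_type;
};

struct usb_event_q {
	struct usb_event ue;
	struct usb_event_q *next;
	uint32_t csum;		/* hypothetical extra field, not in usb.c */
};

/* FNV-1a over a byte range, continuing from hash h. */
static uint32_t
fnv1a(const void *p, size_t len, uint32_t h)
{
	const unsigned char *b = p;

	while (len--) {
		h ^= *b++;
		h *= 16777619u;
	}
	return (h);
}

/* The checksum covers the event payload and the link pointer. */
static uint32_t
entry_csum(const struct usb_event_q *q)
{
	uint32_t h = 2166136261u;

	h = fnv1a(&q->ue, sizeof(q->ue), h);
	h = fnv1a(&q->next, sizeof(q->next), h);
	return (h);
}

/* Call after every legitimate modification of the entry. */
void
entry_seal(struct usb_event_q *q)
{
	q->csum = entry_csum(q);
}

/* Returns 1 (and logs the raw contents) if the entry changed since it
 * was last sealed, 0 otherwise. */
int
entry_verify(const struct usb_event_q *q, const char *where)
{
	if (entry_csum(q) == q->csum)
		return (0);
	fprintf(stderr, "%s: entry %p corrupted: ue_type=0x%08x next=%p\n",
	    where, (const void *)q, (unsigned)q->ue.ue_type,
	    (void *)q->next);
	return (1);
}
```

Any code that legitimately changes an entry (for instance, updating the old
tail's next pointer when appending) has to call entry_seal() again, or the
check will report a false positive. Calling entry_verify() at successive
points between boot-time queueing and the moment usbd reads the queue should
bracket the overwrite.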

> I did experiment with rolling back some usb commits, but it does not
> appear that a change to the usb subsystem is what caused this breakage.  I
> think something else in the system is misbehaving and overwriting memory.

Perhaps. Since the enqueuing and dequeuing of usb_event are supposed to be
protected by splusb(), there shouldn't be a race here unless something is
wrong in the interrupt priority settings (shared with splnet/splbio?).
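To illustrate what the spl protection is supposed to buy, here is a user-space
analogy; it is not how spl(9) is implemented. SIGUSR1 stands in for the USB
interrupt, and blocking it with sigprocmask() stands in for
"s = splusb(); ... splx(s);". All names are hypothetical. An "interrupt"
raised inside the protected region stays pending and only runs once the old
mask is restored, so the handler never sees a half-updated queue. If the real
interrupt isn't actually masked by the level the queue code raises, that
deferral doesn't happen, which is the kind of priority misconfiguration I
have in mind.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flags shared with the signal handler. */
static volatile sig_atomic_t queue_busy = 0;	/* inside critical section */
static volatile sig_atomic_t intr_ran = 0;	/* handler has executed */
static volatile sig_atomic_t intr_saw_busy = 0;	/* handler ran mid-update */

/* Stand-in for the USB interrupt handler that touches the queue. */
static void
fake_usb_intr(int sig)
{
	(void)sig;
	if (queue_busy)
		intr_saw_busy = 1;
	intr_ran = 1;
}

int
install_fake_intr(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fake_usb_intr;
	sigemptyset(&sa.sa_mask);
	return (sigaction(SIGUSR1, &sa, NULL));
}

/* Analogue of "s = splusb();": block the fake interrupt, save old mask. */
static void
spl_raise(sigset_t *old)
{
	sigset_t block;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, old);
}

/* Analogue of "splx(s);": restore the saved mask. A signal that arrived
 * meanwhile is delivered here, after the update is complete. */
static void
spl_restore(const sigset_t *old)
{
	sigprocmask(SIG_SETMASK, old, NULL);
}

/* Simulated queue update with an "interrupt" arriving in the middle.
 * Returns 1 if the handler ran before the critical section ended. */
int
protected_update(void)
{
	sigset_t old;
	int ran_inside;

	spl_raise(&old);
	queue_busy = 1;
	raise(SIGUSR1);		/* stays pending while blocked */
	ran_inside = intr_ran;
	queue_busy = 0;
	spl_restore(&old);
	return (ran_inside);
}
```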