From owner-freebsd-stable@FreeBSD.ORG Tue Oct 19 13:16:36 2010
From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: mdf@freebsd.org
Date: Tue, 19 Oct 2010 09:11:06 -0400
Subject: Re: kldunload usb panic
Message-Id: <201010190911.06961.jhb@freebsd.org>

On Monday, October 18, 2010 4:42:05 pm mdf@freebsd.org wrote:
> When we moved to FreeBSD 7 from 6, issuing a kldunload for usb devices started
> causing a kernel panic if a USB device was still plugged in
> (like a keyboard).  The kldunload is done as part of an rc.d script
> that unloads usb since it's not generally needed by our product unless
> we mounted the root volume from a USB stick.
>
> The order doesn't matter much, but doing:
>
> kldunload ucom
> kldunload umass
> kldunload usb
>
> panics with this stack:
>
> panic @ time 1287356740.252, thread 0xffffff0016bd64a0: Fatal trap 12:
> page fault while in kernel mode
>
> cpuid = 2
>
> Stack: --------------------------------------------------
> kernel:trap_fatal+0xac
> kernel:trap_pfault+0x24c
> kernel:trap+0x3d9
> kernel:pmap_kextract+0x70
> kernel:free+0xcd
> usb.ko:usb_disconnect_port+0xbd
> usb.ko:uhub_detach+0xd2
> kernel:device_detach+0xb3
> kernel:device_delete_child+0x98
> kernel:device_delete_child+0x66
> usb.ko:uhci_pci_detach+0xac
> kernel:device_detach+0xb3
> kernel:devclass_delete_driver+0xde
> kernel:driver_module_handler+0x11c
> kernel:module_unload+0x41
> kernel:linker_file_unload+0x19a
> kernel:kern_kldunload+0x10a
> kernel:isi_syscall+0x98
> kernel:ia32_syscall+0x1cd
> --------------------------------------------------
>
> cpuid = 2; apic id = 12
> fault virtual address = 0xffff80403037b7a8
> fault code = supervisor read data, page not present
> stack pointer = 0x10:0xffffff8bfe2d0450
> frame pointer = 0x10:0xffffff8bfe2d0470
> code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
>
>
> The problem is that device_delete_child will recursively call
> device_delete_child before calling device_detach().  When
> device_detach resolves to uhub_detach, it attempts to call
> usb_disconnect_port() which will iterate over the subdevs array.  Each
> of the pointers of the subdevs array is pointing into already-free'd
> storage; the free(9) came from the recursive call to
> device_delete_child().
> In this case the code is trying to dereference
> 0xdeadc0dedeadc0de since this is INVARIANTS with the malloc poisoning
> on free(9).
>
> So questions:
>
> (1) is there a simple fix, like defining a devclass_t for the port
> device, and having it do a detach method cleanup instead of
> uhub_detach()?  I wasn't sure what to put in for the match method,
> though.

I think uhub_detach() should use device_get_children() instead of trying to
maintain its own list of child devices.  If uhub really does need to maintain
its own list of children, then a better fix would be to add a
'bus_device_deleted()' callback from device_delete_child() to the parent bus
device to let it know a device had been removed, but that would be tricky to
use since in many cases a bus device is what invokes device_delete_child() in
the first place.

-- 
John Baldwin
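
The ordering described in the quoted report, and the effect of walking the
bus-owned child list rather than a private one, can be sketched in a small
user-space C model.  This is only a toy under stated assumptions: struct
device, add_child(), delete_child(), hub_detach() and unload_demo() are
hypothetical stand-ins, not FreeBSD's newbus or USB code, and the driver's
private subdevs array is reduced to a counter so that no dangling pointer is
ever dereferenced.

```c
/* Toy user-space model; NOT FreeBSD newbus code.  All names are hypothetical. */
#include <assert.h>
#include <stdlib.h>

#define MAX_CHILDREN 4

struct device {
    const char *name;
    struct device *children[MAX_CHILDREN]; /* list owned by the bus layer */
    int nchildren;
    void (*detach)(struct device *);       /* driver detach method, may be NULL */
    int cached_subdevs;                    /* size of the driver's private child cache */
};

static int live_seen;  /* children the hub detach found through the bus list */
static int stale_seen; /* private cache entries whose device is already freed */

static struct device *
add_child(struct device *parent, const char *name,
    void (*detach)(struct device *))
{
    struct device *dev = calloc(1, sizeof(*dev));

    if (dev == NULL)
        abort();
    dev->name = name;
    dev->detach = detach;
    if (parent != NULL) {
        assert(parent->nchildren < MAX_CHILDREN);
        parent->children[parent->nchildren++] = dev;
        parent->cached_subdevs++; /* the driver also remembers it privately */
    }
    return (dev);
}

/* Same ordering as in the report: delete descendants first, then detach. */
static void
delete_child(struct device *parent, struct device *child)
{
    while (child->nchildren > 0)
        delete_child(child, child->children[child->nchildren - 1]);
    if (child->detach != NULL)
        child->detach(child);
    /* Unlink from the bus-owned list; the private cache is never told. */
    for (int i = 0; i < parent->nchildren; i++) {
        if (parent->children[i] == child) {
            parent->children[i] = parent->children[--parent->nchildren];
            break;
        }
    }
    free(child);
}

static void
hub_detach(struct device *hub)
{
    /* Walk only the bus-owned list, which is always current. */
    for (int i = 0; i < hub->nchildren; i++)
        live_seen++;
    /* Anything the private cache still counts beyond that is stale. */
    stale_seen = hub->cached_subdevs - hub->nchildren;
}

/* Build controller -> hub -> keyboard, then delete the hub from the top. */
void
unload_demo(int *live, int *stale)
{
    struct device *root = add_child(NULL, "uhci0", NULL);
    struct device *hub = add_child(root, "uhub0", hub_detach);

    add_child(hub, "ukbd0", NULL);
    live_seen = stale_seen = 0;
    delete_child(root, hub);
    free(root);
    *live = live_seen;
    *stale = stale_seen;
}
```

Calling unload_demo(&live, &stale) leaves live at 0 and stale at 1: by the
time the hub's detach method runs, the bus-owned list is already empty, while
the private cache still counts the keyboard that the recursive delete freed.
A detach routine that consults only the bus-owned list therefore has nothing
stale to touch in this model.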