From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Ondřej Majerech, FreeBSD Mailing List
Date: Wed, 1 Dec 2010 08:17:16 -0500
Subject: Re: 8.1-RELEASE hangs on reboot
Message-Id: <201012010817.17120.jhb@freebsd.org>

On Tuesday, November 30, 2010 8:23:19 pm Ondřej Majerech wrote:
> Hello,
>
> my
> 8.1-R system has just started hanging on reboot. Specifically after
> I svn up'd my source and updated from 8.1-R-p1 to -p2.
>
> Some kind of hang occurs on every reboot attempt. Usually it hangs at
> the "Rebooting..." message, but sometimes the thing just locks up
> before it even syncs disks. shutdown -p now seems to shut down the
> system successfully each time.
>
> So I booted into single-user mode, executed "reboot" and during the
> "Syncing disks" I pressed Ctrl-Alt-Escape to break into the debugger.
> There I single-stepped with the "s" command until the thing simply
> stopped doing anything. (Even if I pressed NumLock, the LED on the
> keyboard wouldn't turn off.)
>
> The screen content at the moment of hang is (dutifully typed over as
> the thing is dead and I don't have a serial cable):
>
> [thread pid 12 tid 100017 ]
> Stopped at sckbdevent+0x5f: call _mtx_unlock_flags
> db>
> [thread pid 12 tid 100017 ]
> Stopped at _mtx_unlock_flags: pushq %rbp
> db>
> [thread pid 12 tid 100017 ]
> Stopped at _mtx_unlock_flags+0x1: movq %rsp,%rbp
> db>
> [thread pid 12 tid 100017 ]
> Stopped at _mtx_unlock_flags+0x4: subq $0x20,%rsp
> db>
> [thread pid 12 tid 100017 ]
> Stopped at _mtx_unlock_flags+0x8: movq %rbx,(%rsp)
> db>
> [thread pid 12 tid 100017 ]
> Stopped at _mtx_unlock_flags+0xc: movq %r12,0x8(%rsp)
> db>
> [thread pid 12 tid 100017 ]
> Stopped at _mtx_unlock_flags+0x11: movq %rdi,%rbx
> db>
> [thread pid 12 tid 100017 ]
> Stopped at _mtx_unlock_flags+0x14: movq %r13,0x10(%rsp)
> db>
> E
>
> Including that "E" at the end.

No good ideas here, though I think we just turned off PSL_T by accident, so it
ran for a while before hanging after this. 'E' must be the start of a message
on the console.

> As I said, it's 8.1-RELEASE-p2; it's on AMD64.
> I'm using a custom kernel
> which only differs from GENERIC by the addition of the debugging options:
>
> options INVARIANTS
> options INVARIANT_SUPPORT
> options WITNESS
> options DEBUG_LOCKS
> options DEBUG_VFS_LOCKS
> options DIAGNOSTIC
>
> I tried rebooting with ACPI disabled, but the thing panicked on boot with
>
> panic: Duplicate free of item 0xffffff00025e0000 from zone
> 0xffffff00bfdcc2a0(1024)
>
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 100000 ]
> Stopped at kdb_enter+0x3d: movq $0, 0x6b2d20(%rip)
> db> bt
> Tracing pid 0 tid 100000 td 0xffffffff80c63fc0
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x17b
> uma_dbg_free() at uma_dbg_free+0x171
> uma_zfree_arg() at uma_zfree_arg+0x68
> free() at free+0xcd
> device_set_driver() at device_set_driver+0x7c
> device_attach() at device_attach+0x19b
> bus_generic_attach() at bus_generic_attach+0x1a
> pci_attach() at pci_attach+0xf1

The free() should be the one that frees the softc, but that implies the device
had a previous driver and softc. Maybe add some debug info to
device_set_driver() to print out the previous driver's name (and maybe the
value of the pointer) before freeing the softc. You could use gdb on the
kernel.debug and the pointer value to figure out exactly which driver was the
previous one, and look to see if its probe routine does something funky with
the softc pointer.

-- 
John Baldwin
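
For illustration, the suggestion above (log the previous driver's name and the pointer values before the softc is freed) could look roughly like the following. This is a userland mock, not the kernel source: struct mock_driver, struct mock_device, mock_device_set_driver() and mock_softc_frees are all hypothetical names made up for the sketch, loosely modeled on the device_set_driver() frame in the backtrace. In the kernel the same idea would be a printf() placed just before the existing free of the softc.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userland stand-ins for the kernel's driver/device structures. */
struct mock_driver {
	const char *name;	/* driver name to report in the debug line */
	size_t softc_size;	/* size of the per-device softc to allocate */
};

struct mock_device {
	const struct mock_driver *driver;	/* currently attached driver */
	void *softc;				/* driver-private state */
};

/* Counts how many times a previous softc was reported and released. */
static int mock_softc_frees;

/*
 * Mock of the "replace the driver" step: if a softc is already present,
 * print the outgoing driver's name and both pointer values before the
 * softc is freed, then allocate a softc for the new driver.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int
mock_device_set_driver(struct mock_device *dev, const struct mock_driver *drv)
{
	if (dev->softc != NULL) {
		printf("set_driver: previous driver %s (%p), freeing softc %p\n",
		    dev->driver != NULL ? dev->driver->name : "(none)",
		    (const void *)dev->driver, dev->softc);
		free(dev->softc);
		dev->softc = NULL;
		mock_softc_frees++;
	}
	dev->driver = drv;
	if (drv != NULL && drv->softc_size > 0) {
		dev->softc = calloc(1, drv->softc_size);
		if (dev->softc == NULL) {
			dev->driver = NULL;
			return (-1);
		}
	}
	return (0);
}
```

Calling mock_device_set_driver() twice on the same device with two different drivers prints one line naming the first driver when the second call replaces it. The printed driver pointer is the kind of value that could then be looked up with gdb against kernel.debug to identify the driver.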