From owner-freebsd-current@FreeBSD.ORG Thu Dec 18 23:53:29 2003 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 52D7E16A4CE; Thu, 18 Dec 2003 23:53:29 -0800 (PST) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id B399643D31; Thu, 18 Dec 2003 23:53:26 -0800 (PST) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.9p2/8.12.9) with ESMTP id hBJ7rGeF076868; Thu, 18 Dec 2003 23:53:20 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200312190753.hBJ7rGeF076868@gw.catspoiler.org> Date: Thu, 18 Dec 2003 23:53:16 -0800 (PST) From: Don Lewis To: mezz7@cox.net, "Justin T. Gibbs" In-Reply-To: <200312190548.hBJ5mSeF076722@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: freebsd-current@FreeBSD.org Subject: Re: The reboot doesn't work anymore in today -CURRENT.. X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Dec 2003 07:53:29 -0000 On 18 Dec, Don Lewis wrote: > On 18 Dec, Jeremy Messenger wrote: >> On Fri, 19 Dec 2003 00:20:48 -0500, Joe Marcus Clarke >> wrote: >> >>> On Fri, 2003-12-19 at 00:16, Jeremy Messenger wrote: >>>> Yesterday and today, I CVSup'ed and the reboot doesn't work anymore, >>>> something isn't work correct. The last week -CURRENT works fine, I can >>>> reboot. I tried to use GENERIC kernel and I still get the same result, >>>> so >>>> I am not sure what I should do next. Attaching the dmesg.txt here.. >>>> >>>> shutdown's -r and -p don't work, I had to press the power button to turn >>>> off and on by manual. It does shutdown clean like 'shutdown -h', >>>> thought. > > I just saw the same thing here when I rebooted a two day old kernel. Here's my stack trace and the output of the ddb ps command: db> tr siointr1(c6933800,0,c089c7f4,6a0,e0414c88) at siointr1+0x11c siointr(c6933800,c0809e0a,c29a1cec,e0414c7c,4) at siointr+0x35 intr_execute_handlers(c29a1c8c,e0414ca0,e0414ce8,c07fdcc3,34) at intr_execute_handlers+0xc8 lapic_handle_intr(34) at lapic_handle_intr+0x3a Xapic_isr1() at Xapic_isr1+0x33 --- interrupt, eip = 0xc080f960, esp = 0xe0414ce4, ebp = 0xe0414ce8 --- mp_grab_cpu_hlt(e0414d10,c063c6bc,c0940940,2,c087fd4e) at mp_grab_cpu_hlt+0x30 cpu_idle(c0940940,2,c087fd4e,53,c063c680) at cpu_idle+0x8 idle_proc(0,e0414d48,c087fbff,311,0) at idle_proc+0x3c fork_exit(c063c680,0,e0414d48) at fork_exit+0xb4 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe0414d7c, ebp = 0 --- db> ps pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd 59 c693e388 e1ee7000 0 0 0 0000204 [SLP]- 0xc0970f8c] nfsiod 3 58 c693e54c e1ee8000 0 0 0 0000204 [SLP]- 0xc0970f88] nfsiod 2 57 c693e710 e1ee9000 0 0 0 0000204 [SLP]- 0xc0970f84] nfsiod 1 56 c693e8d4 e1eea000 0 0 0 0000204 [SLP]- 0xc0970f80] nfsiod 0 55 c693ea98 e1eeb000 0 0 0 0000204 [SLP]ktsusp 0xc693eb98] vnlru 54 c693ec5c e1eec000 0 0 0 0000204 [SLP]ktsusp 0xc693ed5c] syncer 53 c693ee20 e1eed000 0 0 0 0000204 [SLP]ktsusp 0xc693ef20] bufdaemon 52 c699a000 e5f03000 0 0 0 000020c [SLP]pgzero 0xc0977808] pagezero 51 c699a1c4 e5f04000 0 0 0 0000204 [SLP]psleep 0xc0977860] vmdaemon 50 c699a388 e5f05000 0 0 0 0000204 [SLP]psleep 0xc097784c] pagedaemon 49 c699a54c e5f06000 0 0 0 0000204 [IWAIT] swi0: tty:sio 48 c699a710 e5f07000 0 0 0 0000204 [SLP]idle 0xc694d800] aic_recovery0 9 c699a8d4 e5f50000 0 0 0 0000204 [SLP]idle 0xc694d800] aic_recovery0 47 c68a3a98 e1eb5000 0 0 0 0000204 [SLP]usbevt 0xc697d210] usb1 46 c68a3c5c e1eb6000 0 0 0 0000204 [SLP]usbtsk 0xc093a78c] usbtask 45 c68a3e20 e1eb7000 0 0 0 0000204 [SLP]usbevt 0xc6940210] usb0 8 c693a000 e1eb8000 0 0 0 0000204 [SLP]actask 0xc0a9f90c] acpi_task2 7 c693a1c4 e1eb9000 0 0 0 0000204 [SLP]actask 0xc0a9f90c] acpi_task1 6 c693a388 e1eba000 0 0 0 0000204 [SLP]actask 0xc0a9f90c] acpi_task0 44 c693a54c e1ebb000 0 0 0 0000204 [IWAIT] swi3: cambio 43 c693a710 e1ebc000 0 0 0 0000204 new [IWAIT] swi2: camnet 42 c693a8d4 e1ebd000 0 0 0 0000204 [IWAIT] swi7: acpitaskq 41 c693aa98 e1ebe000 0 0 0 0000204 new [IWAIT] swi5:+ 5 c693ac5c e1ee3000 0 0 0 0000204 [SLP]tqthr 0xc0943c28] taskqueue 40 c693ae20 e1ee4000 0 0 0 0000204 new [IWAIT] swi6:+ 39 c693e000 e1ee5000 0 0 0 0000204 new [IWAIT] swi7: task queue 38 c689254c e1e85000 0 0 0 0000204 [SLP]- 0xc09383a0] random 4 c6892710 e1e86000 0 0 0 0000204 [SLP]- 0xc093cde0] g_down 3 c68928d4 e1e87000 0 0 0 0000204 [SLP]- 0xc093cddc] g_up 2 c6892a98 e1e88000 0 0 0 0000204 [SLP]- 0xc093cdd4] g_event 37 c6892c5c e1e89000 0 0 0 0000204 new [IWAIT] swi4: vm 36 c6892e20 e1e8a000 0 0 0 000020c [IWAIT] swi8: tty:sio clock 35 c68a3000 e1e8b000 0 0 0 0000204 [IWAIT] swi1: net 34 c68a31c4 e1eb0000 0 0 0 0000204 new [IWAIT] irq0: clk 33 c68a3388 e1eb1000 0 0 0 0000204 new [IWAIT] irq23: 32 c68a354c e1eb2000 0 0 0 0000204 new [IWAIT] irq22: 31 c68a3710 e1eb3000 0 0 0 0000204 new [IWAIT] irq21: 30 c68a38d4 e1eb4000 0 0 0 0000204 new [IWAIT] irq20: 29 c29b01c4 e0454000 0 0 0 0000204 new [IWAIT] irq19: 28 c29b0388 e0455000 0 0 0 0000204 [IWAIT] irq18: fxp0 27 c29b054c e0456000 0 0 0 0000204 new [IWAIT] irq17: 26 c29b0710 e0457000 0 0 0 0000204 [IWAIT] irq16: ahc0 25 c29b08d4 e047c000 0 0 0 0000204 [IWAIT] irq15: ata1 24 c29b0a98 e047d000 0 0 0 0000204 new [IWAIT] irq14: ata0 23 c29b0c5c e047e000 0 0 0 0000204 new [IWAIT] irq13: 22 c29b0e20 e047f000 0 0 0 0000204 new [IWAIT] irq12: psm0 21 c6892000 e1e82000 0 0 0 0000204 new [IWAIT] irq11: 20 c68921c4 e1e83000 0 0 0 0000204 new [IWAIT] irq10: uhci0 uhci1 19 c6892388 e1e84000 0 0 0 0000204 new [IWAIT] irq9: acpi0 18 c29a7000 e0402000 0 0 0 0000204 new [IWAIT] irq8: rtc 17 c29a71c4 e044b000 0 0 0 0000204 new [IWAIT] irq7: ppc0 16 c29a7388 e044c000 0 0 0 0000204 [IWAIT] irq6: fdc0 15 c29a754c e044d000 0 0 0 0000204 new [IWAIT] irq5: 14 c29a7710 e044e000 0 0 0 0000204 new [IWAIT] irq4: sio0 13 c29a78d4 e044f000 0 0 0 0000204 new [IWAIT] irq3: sio1 12 c29a7a98 e0450000 0 0 0 0000204 [IWAIT] irq1: atkbd0 11 c29a7c5c e0451000 0 0 0 000020c [CPU 0] idle: cpu0 1 c29a7e20 e0452000 0 0 1 0004200 [SLP]thtrm 0xc694f240] init 10 c29b0000 e0453000 0 0 0 0000204 [CV]ktrace 0xc09403c4] ktrace 0 c093cf40 c0c1f000 0 0 0 0000200 [SLP]sched 0xc093cf40] swapper One thing I just noticed is that I'm not seeing the "Shutting down ACPI" message, which I think usually gets printed after the "Uptime" message. The ACPI message gets printed by acpi_shutdown_final(), so we're not getting all the way through the shutdown_final handlers, or maybe we're not even getting that far. I see that init is sleeping on thtrm, which happens in aic_terminate_recovery_thread(), which gets called from ahc_shutdown(). It looks aic_terminate_recovery_thread() returns immediately if aic->platform_data->recovery_thread is NULL, otherwise it sets the AIC_SHUTDOWN_RECOVERY, calls wakeup(), and then calls tsleep(). The wakeup appears to wake the recovery thread, which appears to loop forever in aic_recovery_thread() until the AIC_SHUTDOWN_RECOVERY flag is set, at which time it should wake aic_terminate_recovery_thread() and exit. Something that looks odd to me is that it appears that aic_terminate_recovery_thread() can be called twice in at least one case. In ahc_free(), ahc_terminate_recovery_thread() is unconditionaly called, and then depending on the init level, ahc_shutdown() may be called. When the recovery thread is killed, aic->platform_data->recovery_thread does not get cleared, so if the first call to ahc_terminate_recovery_thread() kills the recovery thread, the second call to ahc_terminate_recovery_thread() (from ahc_shutdown()) will hang because the recovery thread will not be around to wake it up the second time. It looks like either ahc_terminate_recovery_thread() or aic_recovery_thread() should clear aic->platform_data->recovery_thread when the thread exits.