Date: Fri, 02 Nov 2007 08:31:14 +1000
From: Glen <glen.leeder@nokia.com>
To: ACPI mailing list <freebsd-acpi@freebsd.org>
Subject: SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)
Message-ID: <472A53B2.6030901@nokia.com>
Hi,

I have been seeing intermittent hangs in the ACPI shutdown code on an Intel 2.4GHz 8-CPU system. I am running a FreeBSD 6.1 code base, but I cannot see a reason why this couldn't happen in other FreeBSD versions.

The hang is very irregular. I am recreating it with an expect script that repeatedly reboots the system. Sometimes I can do up to 200 reboots before observing the hang; other times it happens after 5-20 reboots.

It has been difficult to pin down the hang because the system does not respond to NMI events, but using breakpoints I believe the hang is in acpi_cpu.c:acpi_cpu_shutdown, in the call to smp_rendezvous. My theory is that one of the CPUs does not respond to ipi_all_but_self and that all the other CPUs are waiting for it in smp_rendezvous_action. The loop that spins while smp_rv_waiters[0] < mp_ncpus therefore never exits, and the system hangs. This may happen because of other activity (or a deadlock?) on that CPU.

I noticed a few threads relating to this and have already tried changing kern.sched.ipiwakeup.enabled and machdep.cpu_idle_hlt. Neither had any effect.

1) I tried removing the call to smp_rendezvous in acpi_cpu_shutdown, and this stops the hang from happening. Does anyone know the purpose of this call in the shutdown code, or whether I might suffer some consequence by removing it?

2) Has anyone got any suggestions for debugging this further, given that I can't break into the debugger? I thought I could instrument some counters in i386/i386/local_apic.c and kern_smp.c with the aim of identifying a root cause.

Glen
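The hang theory above can be modeled in user space to make the failure mode concrete. The following is a minimal sketch, not FreeBSD kernel code: POSIX threads stand in for CPUs, and `rv_waiters0` and `sim_ncpus` are hypothetical stand-ins for `smp_rv_waiters[0]` and `mp_ncpus`. Only the first wait loop quoted in the message is modeled; the real `smp_rendezvous_action` does more than this. The `arrived[]` flags are a hypothetical example of the per-CPU counters proposed in question 2: they record which participants reached the barrier. A spin timeout is added so the simulation reports the stall instead of hanging forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define MAX_CPUS 8
#define SPIN_TIMEOUT_NS 1000000000L /* 1 s: give up instead of hanging */

/* Hypothetical stand-ins for smp_rv_waiters[0] and mp_ncpus. */
static atomic_int rv_waiters0;
static int sim_ncpus;

/* Hypothetical per-CPU instrumentation: did this CPU reach the barrier? */
static atomic_bool arrived[MAX_CPUS];

/* Simulated CPU that never "takes the IPI" and so never arrives. */
static bool stuck[MAX_CPUS];

struct cpu_arg { int id; bool timed_out; };

static long elapsed_ns(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000000000L +
           (now.tv_nsec - start->tv_nsec);
}

static void *cpu_thread(void *p)
{
    struct cpu_arg *a = p;
    if (stuck[a->id])
        return NULL; /* like a CPU that missed ipi_all_but_self */
    atomic_store(&arrived[a->id], true);
    atomic_fetch_add(&rv_waiters0, 1);

    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);
    /* Same shape as the kernel loop: spin while waiters < ncpus. */
    while (atomic_load(&rv_waiters0) < sim_ncpus) {
        if (elapsed_ns(&start) > SPIN_TIMEOUT_NS) {
            a->timed_out = true; /* the kernel would spin forever here */
            return NULL;
        }
        sched_yield();
    }
    return NULL;
}

/* Runs one rendezvous with ncpus participants; stuck_cpu < 0 means none
 * is stuck. Returns how many CPUs arrived; *hung is set if any waiter
 * gave up. */
int run_rendezvous(int ncpus, int stuck_cpu, bool *hung)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    pthread_t tids[MAX_CPUS];
    struct cpu_arg args[MAX_CPUS];

    sim_ncpus = ncpus;
    atomic_store(&rv_waiters0, 0);
    for (int i = 0; i < MAX_CPUS; i++) {
        atomic_store(&arrived[i], false);
        stuck[i] = (i == stuck_cpu);
    }
    for (int i = 0; i < ncpus; i++) {
        args[i] = (struct cpu_arg){ .id = i, .timed_out = false };
        pthread_create(&tids[i], NULL, cpu_thread, &args[i]);
    }

    *hung = false;
    for (int i = 0; i < ncpus; i++) {
        pthread_join(tids[i], NULL);
        if (args[i].timed_out)
            *hung = true;
    }

    int count = 0;
    for (int i = 0; i < ncpus; i++) {
        if (atomic_load(&arrived[i]))
            count++;
        else
            printf("cpu%d never reached the rendezvous\n", i);
    }
    return count;
}
```

With all eight simulated CPUs participating, `run_rendezvous(8, -1, &hung)` completes. With one CPU marked stuck, the others spin until the timeout and the per-CPU flags identify the missing CPU. This is the kind of evidence the counters in question 2 could provide on real hardware, assuming they can be read out after the stall (for example, from memory that survives a reset).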