Date:      Mon, 02 May 2011 16:13:20 -0500
From:      Mike Karels <mike@karels.net>
To:        freebsd-amd64@freebsd.org
Cc:        mike_karels@mcafee.com
Subject:   variable hang when starting APs on Westmere processors
Message-ID:  <201105022113.p42LDLrl051285@mail.karels.net>

Looks like freebsd-smp is gone... not sure of the right target for this.

I just picked up a problem from another developer at work who had the good
fortune to have scheduled a vacation this week.  The short description is
that the start_ap() routine sometimes hangs for anywhere from 10 minutes to
3 hours while starting up CPUs.  This is with a much-modified system based on
FreeBSD 7.2.  A stock 8.2 CD hangs at the same spot almost all the time,
although the code in the two versions appears identical.

More details:  This is amd64, using an Intel S5520HCR 2-socket motherboard
with two Xeon X5660 2.8GHz Westmere hex-core CPUs.  The problem happens
somewhat less often with two Xeon E5620 quad-core 2.4GHz CPUs.  The hang
seems to happen with higher-numbered CPUs, so the hex-core with SMT has more
chances to hit the problem.

We added KTRs to the code, and found that the hang happens in the
lapic_ipi_wait() call after de-asserting RESET.

Of course, Linux doesn't exhibit the problem.

Has anyone else seen a problem like this?  Any ideas how to fix it, or
debug further?

Please copy me on responses; I'm not subscribed to this list currently.

		Mike


