Date:      Thu, 10 Jun 2004 10:11:39 -0400
From:      Don Bowman <don@sandvine.com>
To:        'Bruce Evans' <bde@zeta.org.au>, Don Bowman <don@sandvine.com>
Cc:        "'current@freebsd.org'" <current@FreeBSD.org>
Subject:   RE: kernel trap 19 with interrupts disabled
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C85337051D8F5C@mail.sandvine.com>

From: Bruce Evans [mailto:bde@zeta.org.au]
> ... NMI, output but no debugger, hang, patch to workaround ...

I have applied the patch, and will await the next hang.

Out of curiosity, why not use something like this, so that the
timeout is a fixed amount of time rather than an iteration count?
I used the TSC here.

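static int
my_stop_cpus(u_int map)
{
    /* Deadline one second from now, measured in TSC ticks. */
    unsigned long long end_ts = rdtsc() + 1ULL * tsc_freq;

    /* Send the Xcpustop IPI to all CPUs in map. */
    selected_apic_ipi(map, XCPUSTOP_OFFSET, APIC_DELMODE_FIXED);

    /* Spin until every CPU in map has stopped, or give up at the deadline. */
    while ((stopped_cpus & map) != map) {
        if (rdtsc() > end_ts)
            return 0;   /* timed out */
    }
    return 1;           /* all CPUs in map stopped */
}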

Has anyone else been observing system hangs on SMP Xeon
(P4-based Xeon) systems? I have been observing this
for more than a year with 4.7. We came up with a workaround:
a periodic NMI driven from the perfmon registers that
checks that hardclock is still incrementing.
What we found was that hardclock would stop.
I was hoping it was a race condition in the stable
kernel, but now that I see what is most likely the
same issue on current, I'm starting to wonder. I have
a dual P3 system which has never experienced this problem.
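
The check itself can be sketched roughly as follows. This is not the
code we run, just an illustration of the logic, and it assumes the
periodic handler can read a tick count that hardclock advances. The
struct and function names (tick_watchdog, tick_watchdog_init,
tick_watchdog_check) are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; none of these names are kernel interfaces. */
struct tick_watchdog {
    uint64_t last_seen;  /* tick count observed at the previous check */
    unsigned stalled;    /* consecutive checks with no progress */
    unsigned limit;      /* stalled checks needed to declare a hang */
};

static void
tick_watchdog_init(struct tick_watchdog *wd, uint64_t now_ticks,
    unsigned limit)
{
    assert(limit > 0);
    wd->last_seen = now_ticks;
    wd->stalled = 0;
    wd->limit = limit;
}

/*
 * Call from the periodic check with the current tick count.
 * Returns true once the count has failed to advance for `limit`
 * consecutive checks, and keeps returning true until it advances.
 */
static bool
tick_watchdog_check(struct tick_watchdog *wd, uint64_t now_ticks)
{
    if (now_ticks != wd->last_seen) {
        /* The clock made progress: remember it and reset the count. */
        wd->last_seen = now_ticks;
        wd->stalled = 0;
        return false;
    }
    if (wd->stalled < wd->limit)
        wd->stalled++;  /* saturate so the counter cannot wrap */
    return wd->stalled >= wd->limit;
}
```

Requiring several stalled checks in a row avoids a false alarm when
the check fires twice within one clock tick.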

--don


