Date:      Sat, 07 Dec 1996 01:36:44 +0800
From:      Peter Wemm <peter@spinner.dialix.com>
To:        Erich Boleyn <erich@uruk.org>
Cc:        Steve Passe <smp@csn.net>, smp@freebsd.org
Subject:   Re: P6 and FreeBSD/SMP (was -> Re: last major problem) 
Message-ID:  <199612061736.BAA18860@spinner.DIALix.COM>
In-Reply-To: Your message of "Fri, 06 Dec 1996 08:30:17 PST." <E0vW3AH-0007kq-00@uruk.org> 

Erich Boleyn wrote:
> 
> Steve Passe <smp@csn.net> writes:
> > > > so all 3 failing systems are P6.  is there anyone now successfully running
> > > > APIC_IO on a P6 system?
> > > 
> > > Wait a sec, I thought this was a problem caused by the SMP_INVLTLB code?
> > > Or is it a generic "P6 dislikes APIC_IO in general" problem?
> > 
> > that's what I'm wondering, I've been so busy fighting the other fires that
> > I haven't kept up on the details of this and may misunderstand...
> > 
> > so additionally, has ANYONE EVER run a 'solid' APIC_IO kernel on a P6?
> 
> Hmmm...  I think there are several issues afoot here.
> 
>  (1) Does the P6 run an APIC_IO kernel OK before activating the other CPUs
>  (2) Does the P6 run an APIC_IO kernel OK after activating the other CPUs
> 
> ...and seen in another message...
> 
>  (3) [the thing about relying on the short prefetch queue of the Pentium, and
>       the P6 might break it]
> 
> I ran a bunch of tests last night and this morning with the kernel tree
> from yesterday afternoon/evening.  My results were (using APIC_IO +
> SMP_INVLTLB) :
> 
>   --  After I activate the other 3 CPUs (via "sysctl -w kern.smp_active=4"),
>       standard (simple) operations seem OK, but when I start compiling a
>       kernel, anywhere from 1/3 to 1/2 the way through it dies with a
>       "kernel trap 12: supervisor read/write, page not present" break to
>       the kernel debugger, going into pmap_enter.  It is always that error
>       (I think I've seen a "write" once, with it saying a "read" trap the
>       other times).  This implies that the answer to #2 is "no" (at least
>       on my test box).  I tried this about a dozen times, with the same
>       results each time.

Question: what are the offending faulting addresses?  We use two PTD slots,
one for accessing the "current" process (addresses 0xefc00000-0xeffeffff),
and the other being for an "alternate" process or address space (range:
0xffc00000 - 0xfffeffff).  These 4MB chunks are a sparse end-to-end set
of page table pages representing the 4GB of process address space.
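To make the two windows concrete, here is a rough C sketch (not kernel
code) for sorting a reported fault address into one window or the other.
The helper names classify_fault, pd_slot and pte_addr_to_va are made up
for illustration.  The ranges are copied from the paragraph above, and the
arithmetic assumes plain 32-bit i386 paging with 4KB pages and 4-byte page
table entries.

```c
#include <stdint.h>

/* Window bounds exactly as quoted in this message (not re-derived). */
#define CUR_PT_START 0xefc00000u   /* "current" process page tables   */
#define CUR_PT_END   0xeffeffffu
#define ALT_PT_START 0xffc00000u   /* "alternate" address space (APTD) */
#define ALT_PT_END   0xfffeffffu

enum pt_window { PT_NONE, PT_CURRENT, PT_ALTERNATE };

/* Hypothetical helper: which page-table window, if any, holds addr? */
static inline enum pt_window classify_fault(uint32_t addr)
{
    if (addr >= CUR_PT_START && addr <= CUR_PT_END)
        return PT_CURRENT;
    if (addr >= ALT_PT_START && addr <= ALT_PT_END)
        return PT_ALTERNATE;
    return PT_NONE;
}

/* Page-directory slot covering addr: 1024 slots of 4MB each. */
static inline unsigned pd_slot(uint32_t addr)
{
    return addr >> 22;
}

/*
 * Given an address inside a window, recover the virtual address whose
 * page table entry lives there: one 4-byte entry per 4KB page.
 */
static inline uint32_t pte_addr_to_va(uint32_t window_start, uint32_t pte_addr)
{
    return ((pte_addr - window_start) / 4u) << 12;
}
```

With these, a fault address of 0xffc00004 classifies as PT_ALTERNATE and
maps back to the page at 0x00001000 in the alternate address space, which
is the kind of cross-check that would show whether pmap_enter really was
going through the APTD window.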

In the one message that I have handy, something is very wrong.  The
"current" process has faulted on its first stack page, and pmap_enter
is somehow using the alternate APTD stuff.  I do not yet understand
how this comes about.

>   --  If I don't activate the other CPUs, I can do a dozen builds in a row
>       with no problems.  This implies the answer to #1 is "yes".

How does it go without SMP_INVLTLB, BTW?  Do you use a SCSI or IDE system?
We've pretty much established that the IDE driver is very vulnerable to
missed invltlb calls on the other CPUs, though we don't know why yet.

> This leads me to believe that the problem is in the MP handling, not the
> base APIC_IO stuff.  Maybe this is a good time to tell some of us what
> is different when activating the other CPUs as far as APIC control?

I am suspicious about the APTD handling.  I am wondering if we need to make
the handling of the APTD pointer per-cpu or something.  Then there's the
CADDR1, CADDR2 etc. stuff.  There are plenty of potential targets to check
out, but we need some more clues to work on first.

> I've been digging through the source, and it seems that the "smp_invltlb"
> is separate from the normal "invltlb" function?  There are *many* places
> where "invltlb_1pg" (I think that was it) or other variants are called and
> no SMP invalidates are propagated.  This strikes me as a situation
> fraught with potential (and currently real) problems.  What is the
> design goal here?

smp_invltlb() is called from the invltlb() function.  When compiling
with SMP_INVLTLB, invltlb() is no longer inlined; it becomes a real
function call into mp_machdep.c.
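
As a rough userland model of that call path (this is NOT the code in
mp_machdep.c, just a sketch), the shape is roughly as follows.  The cpu
arguments, the flush_count array, local_invltlb() and invltlb_1pg_local()
are all invented for illustration; on real hardware the local flush is a
%cr3 reload and the other CPUs would have to be signalled rather than
called directly.  The order of the local flush and the broadcast is also
an assumption.

```c
#define NCPU 4                      /* matches the 4-CPU test box above */

static unsigned flush_count[NCPU];  /* stand-in for each CPU's TLB flushes */

/* Stand-in for the local flush (a %cr3 reload on real hardware). */
static inline void local_invltlb(int cpu)
{
    flush_count[cpu]++;
}

/* Model of smp_invltlb(): make every CPU other than `self` flush. */
static inline void smp_invltlb(int self)
{
    for (int cpu = 0; cpu < NCPU; cpu++) {
        if (cpu != self)
            local_invltlb(cpu);
    }
}

/* Model of the out-of-line invltlb(): flush locally, then propagate. */
static inline void invltlb(int self)
{
    local_invltlb(self);
    smp_invltlb(self);
}

/*
 * Model of a purely local variant that does not propagate, which is
 * the kind of call site being asked about in the quoted text.
 */
static inline void invltlb_1pg_local(int self)
{
    local_invltlb(self);
}
```

In this model a call to invltlb(0) bumps the counter on all four CPUs,
while invltlb_1pg_local(0) only touches CPU 0 and leaves the others with
whatever stale entries they had.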

Yes, this is extreme overkill; probably 90% of those calls to smp_invltlb()
are unnecessary.  They should not be harmful, but once it's working we can
optimise it quite a lot.

> On a side note (issue #3 I saw a comment on), first of all, never, *ever*
> rely on short prefetch queues.  A proper sequence which flushes the queues
> in the appropriate places should be used.  The Intel manuals have many
> examples of this.  I'll take a look at the code sequence and see what's
> up with it.  It could be that the comment that was made about it was
> bogus.  (if not, then that could very well be a fatal problem.  Certainly
> the P6 and what I've heard of the next project will become more and more
> unpredictable unless the proper methods to serialize control register
> changes are used).

urk, sorry if I gave the impression that we were deliberately doing this..
No, it was an *accident* that the early smp code did this, it worked fine
on the P5, but failed on the P6.  I was thinking out aloud to the effect
of "Hmm, I wonder if there's somewhere else that this kind of thing is
happening that we're not aware of?".

> --
>   Erich Stefan Boleyn                 \_ E-mail (preferred):  <erich@uruk.org>
> Mad Genius wanna-be, CyberMuffin        \__      (finger me for other stats)
> Web:  http://www.uruk.org/~erich/     Motto: "I'll live forever or die trying"

Cheers,
-Peter


