Date: Sat, 07 Dec 1996 01:36:44 +0800
From: Peter Wemm <peter@spinner.dialix.com>
To: Erich Boleyn <erich@uruk.org>
Cc: Steve Passe <smp@csn.net>, smp@freebsd.org
Subject: Re: P6 and FreeBSD/SMP (was -> Re: last major problem)
Message-ID: <199612061736.BAA18860@spinner.DIALix.COM>
In-Reply-To: Your message of "Fri, 06 Dec 1996 08:30:17 PST." <E0vW3AH-0007kq-00@uruk.org>

Erich Boleyn wrote:
>
> Steve Passe <smp@csn.net> writes:
> > > > so all 3 failing systems are P6.  is there anyone now successfully running
> > > > APIC_IO on a P6 system?
> > >
> > > Wait a sec, I thought this was a problem caused by the SMP_INVLTLB code?
> > > Or is it a generic "P6 dislikes APIC_IO in general" problem?
> >
> > thats what I'm wondering, I've been so busy fighting the other fires that
> > I haven't kept up on the details of this and may misunderstand...
> >
> > so additionally, has ANYONE EVER run a 'solid' APIC_IO kernel on a P6?
>
> Hmmm...  I think there are several issues afoot here.
>
>   (1) Does the P6 run an APIC_IO kernel OK before activating the other CPUs
>   (2) Does the P6 run an APIC_IO kernel OK after activating the other CPUs
>
> ...and seen in another message...
>
>   (3) [the thing about relying on the short prefetch queue of the Pentium, and
>       the P6 might break it]
>
> I ran a bunch of tests last night and this morning with the kernel tree
> from yesterday afternoon/evening.  My results were (using APIC_IO +
> SMP_INVLTLB) :
>
>   -- After I activate the other 3 CPUs (via "sysctl -w kern.smp_active=4"),
>      standard (simple) operations seem OK, but when I start compiling a
>      kernel, anywhere from 1/3 to 1/2 the way through it dies with a
>      "kernel trap 12: supervisor read/write, page not present" break to
>      the kernel debugger, going into pmap_enter.  It is always that error
>      (I think I've seen a "write" once, with it saying a "read" trap the
>      other times).  This implies that the answer to #2 is "no" (at least
>      on my test box).  I tried this about a dozen times, with the same
>      results each time.

Question: what are the offending faulting addresses?  We use two PTD
slots, one for accessing the "current" process (addresses
0xefc00000-0xeffeffff), and the other being for an "alternate" process
or address space (range: 0xffc00000 - 0xfffeffff).
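
The arithmetic behind those two windows can be sketched as follows. This is only an illustration, not code from the kernel tree: it assumes the usual i386 layout of 4KB pages, 4-byte page table entries and 4MB covered per page directory slot, and the names pd_slot() and pte_addr() are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* assumed: 4KB pages */
#define PDR_SHIFT  22u   /* assumed: each page directory slot covers 4MB */
#define PTE_SIZE   4u    /* assumed: 4-byte page table entries */

/* Hypothetical helper: page directory slot that covers a virtual address. */
static inline uint32_t pd_slot(uint32_t va)
{
    return va >> PDR_SHIFT;
}

/*
 * Hypothetical helper: where the PTE for `va` would sit inside a 4MB
 * window starting at `window_base`, assuming the page table pages are
 * laid out end to end in that window.
 */
static inline uint32_t pte_addr(uint32_t window_base, uint32_t va)
{
    /* A window occupies exactly one slot, so it must be 4MB aligned. */
    assert(window_base % (1u << PDR_SHIFT) == 0);
    return window_base + (va >> PAGE_SHIFT) * PTE_SIZE;
}
```

Under those assumptions the "current" window at 0xefc00000 is page directory slot 959 and the "alternate" window at 0xffc00000 is slot 1023, so a faulting address can be classified by comparing pd_slot() of the address against those two values.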
These 4MB chunks are a sparse end-to-end set of page table pages
representing the 4GB of process address space.  In the one message that
I have handy, something is very wrong.  The "current" process has
faulted on its first stack page, and pmap_enter is somehow using the
alternate APTD stuff for some reason.  I do not understand how this
comes about yet.

>   -- If I don't activate the other CPUs, I can do a dozen builds in a row
>      with no problems.  This implies the answer to #1 is "yes".

How does it go without SMP_INVLTLB BTW?  Do you use a SCSI or IDE
system?  I think we've pretty much discovered that the IDE driver is
very vulnerable to missing the invltlb calls on the alternate CPUs for
some reason.

> This leads me to believe that the problem is in the MP handling, not the
> base APIC_IO stuff.  Maybe this is a good time to tell some of us what
> is different when activating the other CPUs as far as APIC control?

I am suspicious about the APTD handling.  I am wondering if we need to
make the handling of the APTD pointer per-cpu or something.  Then
there's the CADDR1, CADDR2 etc stuff.  There's plenty of potential
targets to check out, but we need some more clues to work on first.

> I've been digging through the source, and it seems that the "smp_invltlb"
> is separate from the normal "invltlb" function?  There are *many* places
> where "invltlb_1pg" (I think that was it) or other variants are called and
> no SMP invalidates are propagated.  This strikes me as a situation
> fraught with potential (and currently real) problems.  What is the
> design goal here?

smp_invltlb() is called from the invltlb() function.  When compiling
with SMP_INVLTLB, invltlb() is no longer inlined; it's a called
function in mp_machdep.c.

Yes, this is extreme overkill, probably 90% of those calls to
smp_invltlb() are unnecessary.  They should not be harmful, but once
it's working we can optimise it quite a lot.
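
The call relationship described above can be modelled in user space roughly like this. It is a sketch under stated assumptions, not the code in mp_machdep.c: the real invltlb() and smp_invltlb() take no CPU argument, and NCPU, tlb_flushes[] and local_invltlb() are hypothetical stand-ins so the propagation can be observed. How the real kernel notifies the other CPUs is not described in this message.

```c
#include <assert.h>

#define NCPU 4                   /* hypothetical CPU count for the model */

static int tlb_flushes[NCPU];    /* flushes each modelled CPU has performed */

/* Stand-in for the hardware operation that flushes one CPU's TLB. */
static void local_invltlb(int cpu)
{
    tlb_flushes[cpu]++;
}

/* Model of smp_invltlb(): make every other CPU flush its TLB too. */
static void smp_invltlb(int self)
{
    for (int cpu = 0; cpu < NCPU; cpu++) {
        if (cpu != self)
            local_invltlb(cpu);  /* assumption: the kernel signals the CPU instead */
    }
}

/* Model of the out-of-line invltlb(): local flush, then propagate. */
void invltlb(int self)
{
    assert(self >= 0 && self < NCPU);
    local_invltlb(self);
    smp_invltlb(self);
}
```

In this model every invltlb() call costs one flush on each CPU, which is the overkill mentioned above; a variant that skips smp_invltlb() would leave the other CPUs' counters untouched, which is the situation Erich describes for invltlb_1pg and friends.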

> On a side note (issue #3 I saw a comment on), first of all, never, *ever*
> rely on short prefetch queues.  A proper sequence which flushes the queues
> in the appropriate places should be used.  The Intel manuals have many
> examples of this.  I'll take a look at the code sequence and see what's
> up with it.  It could be that the comment that was made about it was
> bogus.  (if not, then that could very well be a fatal problem.  Certainly
> the P6 and what I've heard of the next project will become more and more
> unpredictable unless the proper methods to serialize control register
> changes are used).

urk, sorry if I gave the impression that we were deliberately doing
this..  No, it was an *accident* that the early smp code did this, it
worked fine on the P5, but failed on the P6.  I was thinking out aloud
to the effect of "Hmm, I wonder if there's somewhere else that this
kind of thing is happening that we're not aware of?".

> --
>   Erich Stefan Boleyn                 \_ E-mail (preferred):  <erich@uruk.org>
> Mad Genius wanna-be, CyberMuffin        \__      (finger me for other stats)
> Web:  http://www.uruk.org/~erich/     Motto: "I'll live forever or die trying"

Cheers,
-Peter