Date: Thu, 30 Jan 1997 12:34:19 -0700
From: Steve Passe <smp@csn.net>
To: bag@sinbin.demos.su (Alex G. Bulushev)
Cc: mishania@demos.su, freebsd-smp@freebsd.org
Subject: Re: troubles with smp kernel
Message-ID: <199701301934.MAA18162@clem.systemsix.com>
In-Reply-To: Your message of "Thu, 30 Jan 1997 21:00:48 +0300." <199701301800.VAA18189@sinbin.demos.su>
Hi,

( note that I am answering 3 consecutive mailings in this response.
I think I have identified the problem, but don't get to that till the end,
so read on... )

---
>> > first, you should be using a kernel with options APIC_IO and options
>> > SMP_INVLTBL, although I doubt that is the cause of your problem.
>           ^^^ TLB sorry...
>It works! but reboots once a hour :(

does it reboot EXACTLY once every hour, or just approximately that often?
What I am asking is whether it might be associated with some job being run
by cron, etc.

---
>static electricity killed mishania and now there is no PARITY ERROR !!
>why ?

in America we have a saying: don't look a gift horse in the mouth.
I guess the translation is: it works now, so don't complain!

Seriously speaking, this all points to a hardware problem of some sort.
It might be something as simple as a loose SIMM.  I would power down the
machine, reseat the SIMMs, and pull and re-insert all cards, including the
CPU card and CPUs.  Then do something about static control: spray static
control on the carpet, or whatever.  Static can destroy a machine!!!

You might try another location; perhaps there's a bad electrical socket
there (bad ground leg, electrical noise, etc.).  Make sure the box is on a
surge/noise-suppressing outlet strip or a UPS.

---
>Fine Manual says we should leave JP5 to handle PIIX3 SMI, never turning APIC ON.
>We turned it on, of course, and it works only from then.

manuals for these things are often misleading or incorrect.  Unfortunately
you often have to "read between the lines", or even disbelieve everything
they say and just experiment.  It looks like you have found the right
combination.

---
>Seems like it was my letter, but I didn't include mptable output then, here we
>all have it. But, I see it lies, - I _have_ APIC_IO uncommented ...

I'm not sure I understand.  If you mean that you ran mptable with a kernel
that has APIC_IO enabled, but got mptable output that was missing the INT
section, this is explainable.
You need to understand that the information provided by mptable comes
straight from what the BIOS provides; it has nothing to do with which kernel
is running.  You can run mptable from a non-SMP kernel and get the same
results.  What affects it is the position of the motherboard jumpers and the
BIOS settings.  Think of mptable as a tool for getting all these things set
up properly.

---
>MPTable, version 2.0.4
> ...
>Processors:  APIC ID  Version  State        Family  Model  Step  Flags
>             1        0x11     BSP, usable  6       1      6     0xfbff
>             0        0x11     AP, usable   6       1      7     0xfbff
                                                            ^
you had said earlier that you added an "identical" processor from another
machine, but this shows that they are different steppings (one is stepping 6,
the other stepping 7).  This may or may not be a problem.  The safest thing
would be to find 2 CPUs of the same stepping, but don't worry too much if you
can't....

the rest of the table looks good at first glance...

---
>options SMP_INVLTBL # Steven.
                ^^^
this is my fault; the proper spelling is:

options SMP_INVLTLB

I would suggest you grab the latest mptable from the web page (2.0.6, I think);
it will have these newer options listed in its output.

---
>> is this area really missing or did you truncate the output? there should be
>> a long list of INTerrupt associations here!!!
>
>this is a real output with JP5 default setings (PIIX3 SMI)
>
>now mptable output for JP5 in APIC SMI position:
> ...

obviously the manual is WRONG!

---
note that the following lines are grabbed from several of the previous
mailings, re-sorted to explain the issue:

>Bus:  Bus ID  Type
>       0      PCI
>       1      PCI
>       2      PCI
>       3      ISA

this shows the PCI bus on the motherboard (Bus 0) and the PCI buses created
by the PCI bridge chips on each of the 3940s (Bus 1 & Bus 2).  This is done
correctly, by the way, and many SMP motherboards blow it entirely.
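To make the Bus ID issue concrete, here is a minimal C sketch of the lookup. It is built from the five INT entries in the mptable output quoted below; all of them are INT A and all are routed to APIC ID 2.

The struct, its field names and lookup_pci_apic_irq() are hypothetical simplifications, not the actual code in mp_machdep.c. The packing of the source-bus IRQ field follows the SRCBUSDEVICE/SRCBUSLINE macros in the patch below: device in bits 2-6, and INT line A-D as 0-3 in bits 0-1. The apicIntIsBusType() PCI check is left out because every entry here sits on a PCI bus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified MP table I/O interrupt entry.  The field names
 * echo the patch (srcBusID, srcBusIRQ) but this is NOT the kernel's struct. */
struct io_int_entry {
    int int_type;     /* 0 = vectored INT entry */
    int src_bus_id;   /* MP table Bus ID the interrupt comes from */
    int src_bus_irq;  /* PCI: (device << 2) | line, INT A..D = 0..3 */
    int apic_pin;     /* destination I/O APIC pin (the INT# column) */
};

/* Pack / unpack the PCI source-bus IRQ field. */
#define PCI_SRC_IRQ(dev, line) (((dev) << 2) | (line))
#define SRC_DEVICE(e)          (((e)->src_bus_irq >> 2) & 0x1f)
#define SRC_LINE(e)            ((e)->src_bus_irq & 0x03)

#define INT_A 0

/* The INT entries from the mptable output in this thread (APIC ID 2). */
static const struct io_int_entry io_ints[] = {
    { 0, 1, PCI_SRC_IRQ(4, INT_A),  19 },
    { 0, 1, PCI_SRC_IRQ(5, INT_A),  16 },
    { 0, 0, PCI_SRC_IRQ(10, INT_A), 18 },
    { 0, 2, PCI_SRC_IRQ(4, INT_A),  16 },
    { 0, 2, PCI_SRC_IRQ(5, INT_A),  17 },
};

/* Return the APIC pin for (bus, device, line), or -1 if nothing matches.
 * match_bus == 0 ignores the Bus ID, so the first entry with a matching
 * device and line wins; match_bus != 0 also requires the bus to match,
 * which is the extra condition the patch adds. */
int lookup_pci_apic_irq(int bus, int device, int line, int match_bus)
{
    size_t n = sizeof io_ints / sizeof io_ints[0];

    assert(line >= 0 && line <= 3);
    for (size_t i = 0; i < n; ++i) {
        const struct io_int_entry *e = &io_ints[i];
        if (e->int_type == 0
            && (!match_bus || e->src_bus_id == bus)
            && SRC_DEVICE(e) == device
            && SRC_LINE(e) == line)
            return e->apic_pin;
    }
    return -1;
}
```

With match_bus set to 0 (bus ID ignored), pci2:4 and pci2:5 resolve to pins 19 and 16. These are the same values the ahc2/ahc3 probe messages show. With match_bus set to 1 they resolve to 16 and 17.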
>I/O Ints:  Type  Polarity   Trigger  Bus ID  IRQ   APIC ID  INT#
>           INT   active-lo  level    1       4:A   2        19
>           INT   active-lo  level    1       5:A   2        16
>           INT   active-lo  level    0       10:A  2        18
>           INT   active-lo  level    2       4:A   2        16
>           INT   active-lo  level    2       5:A   2        17

>ahc0 <Adaptec 3940 Ultra SCSI host adapter> rev 0 int a irq 19 on pci1:4
>ahc1 <Adaptec 3940 Ultra SCSI host adapter> rev 0 int a irq 16 on pci1:5
>ahc2 <Adaptec 3940 Ultra SCSI host adapter> rev 0 int a irq 19 on pci2:4
>ahc3 <Adaptec 3940 Ultra SCSI host adapter> rev 0 int a irq 16 on pci2:5
                                                             ^^
                                                             ||
here is your major problem: ahc2 and ahc3 are getting the wrong INTs
assigned to them.  ahc2 should get IRQ 16, and ahc3 should get IRQ 17.

A little history to explain why the current code is failing: the original
MP spec 1.1 didn't take PCI bridge cards into account and thus couldn't
handle them.  Intel then added appendix D.2/3 to the spec, which attempted
to clear this up, but many MBs didn't get it right.  Beyond that, it was
unclear to me from the spec exactly how the code should deal with it, until
I had a chance to work it through with several people who actually had this
type of hardware.  As a result, the current code ignores the Bus ID when
assigning these INTs.

The simple solution here would be to run without the 2nd 3940; the first
one is being properly assigned.  However, since your MB (ASUS) does the mp
table correctly, I suggest the better alternative: you could attempt to fix
the code in sys/i386/i386/mp_machdep.c.  The following patch will hopefully
work, but I don't have an SMP machine right now so I could not test it...
let me know if it works.

-------------------------------------- cut ---------------------------------
*** mp_machdep.c.old	Thu Dec 12 01:43:52 1996
--- mp_machdep.c	Thu Jan 30 12:07:38 1997
***************
*** 917,926 ****
  /*
   * determine which APIC pin a PCI INT is attached to.
   */
  #define SRCBUSDEVICE(I) ((ioApicINTs[(I)].srcBusIRQ >> 2) & 0x1f)
  #define SRCBUSLINE(I) (ioApicINTs[(I)].srcBusIRQ & 0x03)
  int
! get_pci_apic_irq( int pciBus __attribute__ ((unused)), int pciDevice, int pciInt )
  {
  	/**
--- 917,927 ----
  /*
   * determine which APIC pin a PCI INT is attached to.
   */
+ #define SRCBUSID(I) (ioApicINTs[(I)].srcBusID)
  #define SRCBUSDEVICE(I) ((ioApicINTs[(I)].srcBusIRQ >> 2) & 0x1f)
  #define SRCBUSLINE(I) (ioApicINTs[(I)].srcBusIRQ & 0x03)
  int
! get_pci_apic_irq( int pciBus, int pciDevice, int pciInt )
  {
  	/**
***************
*** 932,937 ****
--- 933,939 ----
  	for ( intr = 0; intr < nintrs; ++intr )		/* search each record */
  		if ( (INTTYPE( intr ) == 0)
+ 		    && (SRCBUSID( intr ) == pciBus)
  		    && (SRCBUSDEVICE( intr ) == pciDevice)
  		    && (SRCBUSLINE( intr ) == pciInt) )	/* a candidate IRQ */
  			if ( apicIntIsBusType( intr, PCI ) )	/* check bus match */
***************
*** 941,946 ****
--- 943,949 ----
  	}
  #undef SRCBUSLINE
  #undef SRCBUSDEVICE
+ #undef SRCBUSID
  #undef INTPIN
  #undef INTTYPE
-------------------------------------- cut ---------------------------------

I expect the above to make things much better, assuming you were using
devices on the 2nd 3940.  Note that the above patch will actually cause many
motherboards to STOP working, because they don't do the mp table stuff
correctly!  This is why I haven't submitted such a change to the code.  The
real fix is going to involve analyzing the mp table, then making a CORRECTED
in-core copy when the kernel boots.  It ain't gonna be pretty, and it ain't
gonna be easy to get right, so I have been avoiding it!!!

--
Steve Passe	| powered by
smp@csn.net	|   FreeBSD

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.2

mQCNAzHe7tEAAAEEAM274wAEEdP+grIrV6UtBt54FB5ufifFRA5ujzflrvlF8aoE
04it5BsUPFi3jJLfvOQeydbegexspPXL6kUejYt2OeptHuroIVW5+y2M2naTwqtX
WVGeBP6s2q/fPPAS+g+sNZCpVBTbuinKa/C4Q6HJ++M9AyzIq5EuvO0a8Rr9AAUR
tBlTdGV2ZSBQYXNzZSA8c21wQGNzbi5uZXQ+
=ds99
-----END PGP PUBLIC KEY BLOCK-----