Date:      Thu, 30 Jan 1997 11:08:29 -0700
From:      Steve Passe <smp@csn.net>
To:        mishania@demos.su
Cc:        bag@bag.ru, freebsd-smp@freebsd.org
Subject:   Re: troubles with smp kernel 
Message-ID:  <199701301808.LAA17738@clem.systemsix.com>
In-Reply-To: Your message of "Thu, 30 Jan 1997 20:27:46 +0300." <199701301727.UAA19227@megillah.demos.su>

Hi,

>> first, you should be using a kernel with options APIC_IO and options
>> SMP_INVLTLB, although I doubt that is the cause of your problem.
>
>Thanks for the hint, I recompiled the beast's kernel with APIC_IO/SMP_INVLTLB,
>but I still wonder about the following: SMP_INVLTLB, APIC_IO, where can
>a description be found?  SMP_INVLTLB doesn't even seem to be announced at
>www.freebsd.org/~fsmp/SMP..

It isn't; the problem is one of resources.  As you know, we're all volunteers
doing this in our "free time", of which there usually isn't much!  So we can
either write code or document code, and you know which we will choose....

Basically, SMP_INVLTLB is code that ensures that other CPUs invalidate their
TLBs (translation lookaside buffers) when virtual memory changes require it.
It's a recent addition and thus not documented.  A search of the FreeBSD SMP
mail archive would turn up a lot of discussion of it.

---

>Here, we currently have ASUS dual ppro mother with two ppro200's, 
>FM of the motherboard says, that APIC_IO should NOT be turned on, 
>until 'future upgrade', as Alex already mentioned. Of course, we tried turning
>it on, and it works only with it's ON ;-) But the machine still reboots
>not giving any clue to syslogd.  Thus I guess it is more issue for hardware@

I am missing something here: you say it reboots, but I see output in this
letter showing it running.  Specifically, under what circumstances does it
reboot?

---
>list, you guys seem to be more experienced with MP motherboards, right? ;-)
>What we have is attached at the end of the letter of mine; to be short it's
>the above mentioned mother, 2x3940TUW (Twin Ultra Wide) Adaptecs in slots 
>4 and 5, sharing irq's. The FM of the motherboard claims, that it might not
>be any problem at all to have it shared, _when an OS supports sharing
>correct_. Seems it doesn't :-(. 

It does, and it does support the 3940 IF the motherboard knows how to handle
bridged PCI cards (the 3940 has a PCI bridge chip on it).  This motherboard
is known to properly support the 3940 if set up correctly.  Check your
BIOS for a setting that describes the MP spec level.  It will give you
a choice between version 1.1 and 1.4.  Set it to 1.4.  Running at
version 1.1 will cause the 3940s to fail miserably.  Again, a search of the
SMP mail archive for 3940 should provide you with a lot of info
on what we did to ensure that they work (and work with shared INTs).

---
>I would be also interested in hints on RAM parity check this monster does: I
>get "RAM PARITY SEGMENT CHECK FAILED in segment 0x0000, F1 to disable NMI,
>F2 to reboot". Since I already tested many different SIMM's, 1 Gb of them ;-)
>I can assert it not to be RAM physical problem, - but what than? This
>problem arose _only_ after I plugged second identical processor, stolen
>from HP Vectra's VA Series 4. I changed processors also, tested four, so
>they are not culprits.

This is indeed puzzling, and if it wasn't associated with plugging in the
second CPU I would say it has NOTHING to do with this list, but it does so...
I would guess that either there is a hardware problem with the motherboard
or that something is misconfigured in the BIOS.  Either way I think it is
a question for ASUS support.

---
>> you need to be much more specific in describing your problem for us to help.
>> what exactly is bytebench doing at this point?
>
>Returning to bytebench and Dhrystones in particular, machine reboots (see
>whining above also) in process of several concurrent shell scripts and on
>points Alex already described also. Maybe that's the matter of incorrect IRQ's
>sharing handling?

Good possibility that something with INTs is causing problems.  As I stated
in my last response, there is something very wrong with the contents
of the MP-table you sent.  There is that missing section, and the line:

options		NINTR=16		# number of INTs
                      ^^

When there are 3940 cards in the system this number should be different,
because the bridge causes additional INT sources.  Also, there was the missing
INT section of the table I mentioned previously.  In summary: make sure the
BIOS is set for MP spec version 1.4, build a kernel with APIC_IO & SMP_INVLTLB,
boot it, run "mptable -dmesg -verbose", and send us the results.
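
As a rough sketch of the kernel config lines that summary refers to (an
illustration only, not a tested config): APIC_IO, SMP_INVLTLB and NINTR are
the options named in this thread.  The "options SMP" line is an assumption
about what an SMP kernel config of this era also carries.  The NINTR value is
simply the one from the config quoted above, left unchanged because the right
number has to come from the complete MP table.

```
# Sketch only; assumes a FreeBSD 3.0-SNAP era SMP kernel config file.
options		SMP			# assumption: base SMP support, not named in this thread
options		APIC_IO			# named in this thread
options		SMP_INVLTLB		# named in this thread: other CPUs invalidate TLBs on VM changes
options		NINTR=16		# value from the quoted config; expected to change with 3940 cards
```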

---
>Another little problem (?) here also, - interesting behaviour of SMP kernel on
>my halt: it yells something like "Oi, I am working on CPU #1, switching to #0!
>HALT!". Why is that?

Nothing to worry about, it's just saying that CPU #1 was the CPU that was
handed the job of shutting down the system (50-50 chance of this happening!)
and that it is stopping and letting CPU #0 do it.  This is necessary
to ensure an orderly shutdown, flushing of virtual memory, cache, etc.

---
>SMP tree is as of today, fetched it from scratch, build is done on basis of
>3.0-19970124-SNAP, mother-fatherboard is Asus P/I-P65UP5/C-P6ND, BIOS set to MP
>ver1.4.

OK, I hadn't seen confirmation of ver1.4 before.  Again, please send me the
complete mptable -dmesg output as requested above.

--
Steve Passe	| powered by
smp@csn.net	|            FreeBSD

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.2

mQCNAzHe7tEAAAEEAM274wAEEdP+grIrV6UtBt54FB5ufifFRA5ujzflrvlF8aoE
04it5BsUPFi3jJLfvOQeydbegexspPXL6kUejYt2OeptHuroIVW5+y2M2naTwqtX
WVGeBP6s2q/fPPAS+g+sNZCpVBTbuinKa/C4Q6HJ++M9AyzIq5EuvO0a8Rr9AAUR
tBlTdGV2ZSBQYXNzZSA8c21wQGNzbi5uZXQ+
=ds99
-----END PGP PUBLIC KEY BLOCK-----



