Date: Sun, 4 May 1997 01:37:01 +0400
From: "Mikhail A. Sokolov" <mishania@demos.su>
To: hackers@freebsd.org
Cc: isp@freebsd.org
Subject: strange 2.2.1 behaviour.
Message-ID: <19970504013700.25396@skraldespand.demos.su>
Hello,
there is one problem I would dare to disturb you with.
Let's take 4 machines, as described below: 2 HP and 2 self-made (rack-mounted
industrial PCs). They have all been rebooting themselves without warning since
they were moved to 2.2.1. To explain: they are all heavily loaded servers with
100Mbit x2 connections, and I assume it is better to describe each of them in
particular:
MH
model HP 6/200 VA
P6-200
chipset Intel "Natoma"
128MB EDO RAM
Adaptec 3940UW (TAG and SCB enabled)
seagate ST32151W
Intel EtherExpress Pro 10/100B (two, 100Mb full duplex)
nfs client
network activity is 200-400in/200-400out 1k packets/sec
The *&^&^ crashes every 5-30 min with the following reason:
Trap 12: fault while in kernel mode ... virtual page address 0x0, page not present
It is only on rare occasions that this shy box lets out a yell like that;
usually it just crashes.
GK
model HP 5/166 VL series 4
P5-166
chipset Intel 82437FX
128MB RAM
Adaptec 2940UW (aic7880, TAG and SCB enabled)
seagate ST32550W
Intel EtherExpress Pro 10/100B (one, 100Mb full duplex)
nfs client
network activity is also around 200-400in/200-400out packets/sec
crashes every 5-30 minutes.
This one never lets society know why it is willing to crash.
SB
asus P/I-P6NP5
P6-200
chipset Intel "Natoma"
128MB RAM
Adaptec 3940UW (TAG and SCB enabled)
seagate ST32151W and ST19171W
Intel EtherExpress Pro 10/100B (two, 100Mb full duplex)
nfs server
network activity 500-1000in/500-1000out packets/sec
crashes once every 24-48 hr
Here it is silent too, but this box is definitely more loaded and yet more
stable, for some unknown reason. Of course I know HP sucks (pardon, but it
does), but ASUS-based machines definitely seem to be more stable than any
HP-made PC. Anyhow, there is another one, also self-made: ASUS dual
PPro 200/Natoma/256MB RAM with 3x3940 Adaptecs, 10 disks (2x9GB and 8x4GB
Seagates) plus 2 Intel fxp cards. It, too, reboots about once a week, but
without _any_ notice. This one is the most loaded, handling a huge ftp server,
a proxy server, etc.
The most interesting part is that hardware is _not_ the culprit in these
situations: we changed the memory in the boxes, the disks, the ethernet cards
(tried SMC de0's), even the power supplies. They are all on double UPS, and all
supplies have enough power to feed those pieces of iron, but the reboots still
happen.
When we investigated what was wrong, we tried to correlate the reboots with a)
high disk activity, b) network activity, c) network situation changes. We
got:
a) has nothing to do with the situation, since both ppro200's use disk more
than the others, and the last one, unnamed, serves 10 disks easily, yet crashes
less than the others.
b) should be the culprit here: the MH and GK boxes were made to run looped
find's -exec ls -alRt (etc) over 100Mbit full duplex NFS v3 (tested both TCP
and UDP variants) on a disk mounted from SB, and here MH and GK crash in
10/20/30 minutes, while the server stays up, even though it is serving 40/60
clients simultaneously (that gives 200-300 processes, a la sh/slirp).
It is odd, but when you unplug the boxes from the network, they do OK for weeks
(tested).
c) we tried to correlate SB's crashes with ARP info changes made via proxy ARP
by a nearby Cisco (4500/IOS 10.3): tough luck. We tried to correlate the
growing number of virtual interfaces on SB (now ~130) with its reboots: no luck
here either.
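For anyone who wants to reproduce the load from b), a rough sketch of that
kind of loop follows. The function name nfs_walk, the pass counter and the
mount point in the usage line are placeholders for illustration, not the exact
script that was run on MH and GK.

```shell
#!/bin/sh
# Hypothetical helper: walk DIR repeatedly to generate client-side load.
# DIR is expected to be an NFS mount; PASSES bounds the loop so it terminates.
nfs_walk() {
    dir=$1
    passes=$2
    i=0
    while [ "$i" -lt "$passes" ]; do
        # One pass: for every entry found, do a recursive long listing
        # sorted by time, discarding the output.
        find "$dir" -exec ls -alRt {} \; > /dev/null 2>&1
        i=$((i + 1))
    done
    echo "completed $i passes over $dir"
}
```

It could be started as `nfs_walk /mnt/sb 1000`, where /mnt/sb stands for
wherever the server's disk is mounted on the client.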
Now we totally fail to understand what is going on, what it can be and why;
these boxes don't run anything other than well-known software like squid,
ircd, slirpd and similar things.
Sorry for the complicated explanation,
Sincerely yours,
Mikhail A. Sokolov.
P.S. Please, all ideas are welcome; if they don't fit the list, mail them
here. Don't let the bosses' decision come true that ftp.ru.freebsd.org will
live on some Sun box :-(
