Date:      Sun, 4 May 1997 20:14:42 +0400
From:      "Mikhail A. Sokolov" <mishania@demos.su>
To:        "Mikhail A. Sokolov" <mishania@demos.su>
Cc:        hackers@FreeBSD.ORG, isp@FreeBSD.ORG
Subject:   Re: strange 2.2.1 behaviour.
Message-ID:  <19970504201442.11618@skraldespand.demos.su>
References:  <19970504013700.25396@skraldespand.demos.su>

On Sun, May 04, 1997 at 01:37:01AM +0400, Mikhail A. Sokolov wrote:
> Hello, 

Hi, 

I've received a variety of replies pointing to a) NFS, b) aic*.c, and c) nonstandard
tag usage. Here is where we stand now:
a) NFS was turned on on these boxes only a few days ago, for testing purposes
only; the boxes themselves run under huge network load. Their behaviour was
exactly the same when they had no NFS and were serving as shell
machines.
b) All of them are now running 2.2-970422-RELENG, i.e. the aic driver is the
latest available.
c) We tried them in every possible configuration: SCB enabled, SCB disabled,
etc. mh, sb and gk have also been run with kernel.GENERIC loaded, with the
same result. The only boxes on which we know for sure that MEMIO should never
be used are the ASUS P/I-P6NP5-based one and the HP5/166/VL4; that is the
vendors' information.


-mishania


> 
> There's one problem I dare to disturb you with, people.
> Take the 4 machines described below: 2 HPs and 2 others (self-made
> industrial rack PCs). They have all been rebooting themselves without warning
> since being upgraded to 2.2.1. They are all heavily loaded servers with 2x100 Mbit
> connections, so I think it is best to describe each of them individually:
> 
> MH
>     model HP 6/200 VA
>     P6-200
>     chipset Intel "Natoma"
>     128MB EDO RAM
>     adaptec 3949UW  (TAG and SCB enabled)
>     seagate ST32151W
>     Intel EtherExpress Pro 10/100B  (two, 100Mb full duplex)
>     nfs client
>     network activity is 200-400in/200-400out 1k packets/sec
>  
> The *&^&^ thing crashes every 5-30 minutes with the following reason:
> Trap 12: fault while in kernel mode ... virtual page address 0x0, page not
> present. Only on rare occasions does this shy box let out a yell like that;
> usually it just crashes.
> 
> GK
>     model HP 5/166 VL series 4
>     P5-166
>     chipset Intel 82437FX
>     128MB RAM
>     adaptec 2949UW  (aic7880, TAG and SCB enabled)
>     seagate ST32550W
>     Intel EtherExpress Pro 10/100B  (one, 100Mb full duplex)
>     nfs client
>     network activity is also roughly 200-400in/200-400out packets/sec
>     crashes every 5-30 minutes.
> 
> This one never lets anyone know why it crashes.
> 
> SB
>     asus P/I-P6NP5
>     P6-200
>     chipset Intel "Natoma"
>     128MB RAM
>     adaptec 3949UW  (TAG and SCB enabled)
>     seagate ST32151W and ST19171W
>     Intel EtherExpress Pro 10/100B  (two, 100Mb full duplex)
>     nfs server
>     network activity 500-1000in/500-1000out packets/sec
>     crashes once every 24-48 hr
> 
> This one is silent too, but it is definitely more loaded and, for some
> unknown reason, more stable. Of course I know HP sucks (pardon, but it does),
> but machines with ASUS motherboards definitely seem to be more stable than any
> HP-made PC. Anyhow, there's another one, also self-made: ASUS PPro200x2/Natoma/
> 256MB RAM with 3x Adaptec 3940, 10 disks (2x9GB and 8x4GB Seagates), plus 2
> Intel fxp cards. Even it reboots about once a week, without _any_ notice.
> This one is the most loaded, running a huge FTP server, a proxy server, etc.
> 
> The most interesting part is that the hardware is _not_ the culprit here: we
> swapped the memory, disks, Ethernet cards (tried SMC de0's), even the power
> supplies. They are all on double UPS, and all supplies have enough power to
> feed these pieces of iron, but the reboots still happen.
> 
> While investigating, we tried to correlate the reboots with a) high disk
> activity, b) network activity, and c) changes in the network situation.
> Results:
> a) has nothing to do with it, since both PPro200s use their disks more than
> the others, and the last, unnamed one serves 10 disks easily and still crashes
> less than the others.
> 
> b) must be the culprit here: MH and GK were made to run looped finds with
> -exec ls -alRt (etc.) over 100 Mbit full-duplex NFS v3 (both TCP and UDP
> variants tested) on a disk mounted from SB, and MH and GK crash within
> 10/20/30 minutes, while the server stays up, also serving 40/60 clients
> simultaneously (that gives 200-300 processes, a la sh/slirp).
> Oddly enough, when you unplug the boxes from the network, they run fine for
> weeks (tested).
> 
> c) We tried to correlate SB's crashes with ARP changes made by proxy ARP on a
> nearby Cisco (4500/IOS 10.3): tough luck. We also tried to correlate the
> growing number of virtual interfaces on SB (now ~130) with its reboots: no
> luck there either.
> 
> Now we have no idea at all what is going on, what it could be, or why; these
> boxes run nothing but well-known software like squid, ircd, slirpd and
> the like.
> 
> 
> Sorry for the complicated explanation,
> 
> Sincerely yours, 
> 
> Mikhail A. Sokolov.
> 





