From owner-freebsd-isp  Sat May  3 14:36:23 1997
Return-Path: <owner-isp>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id OAA19094
          for isp-outgoing; Sat, 3 May 1997 14:36:23 -0700 (PDT)
Received: from kremvax.demos.su (kremvax.demos.su [194.87.0.20])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id OAA19051;
          Sat, 3 May 1997 14:36:16 -0700 (PDT)
Received: by kremvax.demos.su (8.6.13/D) from 0@skraldespand.demos.su [194.87.0.19]
          with ESMTP id BAA23759; Sun, 4 May 1997 01:35:58 +0400
Received: by skraldespand.demos.su id BAA12138;
  (8.8.5/D) Sun, 4 May 1997 01:37:01 +0400 (MSD)
Message-ID: <19970504013700.25396@skraldespand.demos.su>
Date: Sun, 4 May 1997 01:37:01 +0400
From: "Mikhail A. Sokolov" <mishania@demos.su>
To: hackers@freebsd.org
Cc: isp@freebsd.org
Subject: strange 2.2.1 behaviour.
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.65_p2,4-7,10-11,15,18,21-22
Organization: Demos Company, Ltd., Moscow, Russian Federation.
X-Point-of-View: Gravity is myth, - the earth sucks.
X-Om-Livet-Suger: Ja. Ja, ja.
Sender: owner-isp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hello, 

there's one problem I dare to bother you with, people.
Take the 4 machines described below: 2 HPs and 2 self-made rack-mount
industrial PCs. They have all been rebooting themselves without warning since
they were upgraded to 2.2.1. They are all heavily loaded servers with
100Mbit x2 connections, so I'd better describe each of them separately:

MH
    model HP 6/200 VA
    P6-200
    chipset Intel "Natoma"
    128MB EDO RAM
    adaptec 3949UW  (TAG and SCB enabled)
    seagate ST32151W
    Intel EtherExpress Pro 10/100B  (two, 100Mb full duplex)
    nfs client
    network activity is 200-400in/200-400out 1k packets/sec
 
The *&^&^ thing crashes every 5-30 minutes with the following reason:
Trap 12: fault while in kernel mode ... virtual page address 0x0, page not
present. That's only on the rare occasions this shy box lets out a yell like
that; usually it just goes down silently.

GK
    model HP 5/166 VL series 4
    P5-166
    chipset Intel 82437FX
    128MB RAM
    adaptec 2949UW  (aic7880, TAG and  SCB enabled)
    seagate ST32550W
    Intel EtherExpress Pro 10/100B  (one, 100Mb full duplex)
    nfs client
    network activity is also around 200-400in/200-400out packets/sec
    crashes every 5-30 minutes.

This one never lets the world know why it wants to crash.

SB
    asus P/I-P6NP5
    P6-200
    chipset Intel "Natoma"
    128MB RAM
    adaptec 3949UW  (TAG and  SCB enabled)
    seagate ST32151W and ST19171W
    Intel EtherExpress Pro 10/100B  (two, 100Mb full duplex)
    nfs server
    network activity 500-1000in/500-1000out packets/sec
    crashes once every 24-48 hours

Here it's also silent, but it is definitely more loaded and, for some unknown
reason, more stable. Of course I know HP sucks (pardon me, but it does), but
the ASUS-based machines definitely seem to be more stable than any HP-made PC.
Anyhow, there's another one, also self-made: ASUS, 2x PPro200, Natoma, 256MB
RAM, 3x Adaptec 3940, 10 disks (2x9GB and 8x4GB Seagates) plus 2 Intel fxp
cards. It also reboots, about once a week, without _any_ notice. This one is
the most loaded, handling a huge FTP server, a proxy server, etc.

The most interesting part is that the hardware is _not_ the culprit here: we
replaced the memory, the disks, the Ethernet cards (tried SMC cards with the
de0 driver), even the power supplies. They are all on double UPS, and all the
power supplies have enough capacity to feed that iron, but the reboots still
happen.

While investigating, we tried to correlate the reboots with a) high disk
activity, b) network activity, c) changes in the network. Here is what we
got:
a) Disk activity has nothing to do with it: both PPro200s use their disks more
than the others, and the last, unnamed one serves 10 disks easily, yet still
crashes less than the others.

b) Network activity must be the culprit here. We made MH and GK run looped
find ... -exec ls -alRt (etc.) over 100Mbit full-duplex NFSv3 (we tested both
the TCP and UDP variants) against a disk on SB, and MH and GK crash within
10/20/30 minutes, while the server stays up, even though it is also serving
40-60 clients simultaneously (that means 200-300 processes, sh/slirp and the
like). Oddly enough, when you unplug the boxes from the network, they run fine
for weeks (tested).
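
A comparable metadata-heavy load can be sketched in Python (this is not the
original find/ls command line; the function names and the /mnt/sb mount point
are hypothetical). It repeatedly walks the NFS-mounted tree and lstat()s every
entry, roughly what ls -alR does per directory:

```python
import os


def walk_and_stat(root):
    """One pass over the tree: lstat every file and subdirectory under root,
    roughly like `ls -alR`. Returns the number of entries stat'ed."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # Entries can vanish between readdir and lstat on a busy server.
                pass
    return count


def stress(root, passes=None):
    """Repeat walk_and_stat; passes=None loops forever, like the looped find."""
    done = 0
    total = 0
    while passes is None or done < passes:
        total += walk_and_stat(root)
        done += 1
    return total
```

Run on MH or GK as stress("/mnt/sb") (hypothetical mount point) and watch how
long the client survives.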

c) We tried to correlate SB's crashes with ARP changes made by proxy ARP on a
nearby Cisco (4500, IOS 10.3): tough luck. We also tried to correlate the
growing number of virtual interfaces on SB (now ~130) with its reboots: no
luck there either.
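
The check in c) boils down to asking whether crashes cluster shortly after
events. A minimal sketch in Python, assuming crash times and event times (for
example ARP changes from the Cisco's log) have already been extracted as
seconds since the epoch; the function name and the window are illustrative,
not part of any existing tool:

```python
from bisect import bisect_right


def crashes_near_events(crashes, events, window):
    """Return the crash times that fall within `window` seconds after some
    event. Crash order is preserved; events need not be sorted."""
    events = sorted(events)
    hits = []
    for t in crashes:
        # Index of the last event at or before this crash, or -1 if none.
        i = bisect_right(events, t) - 1
        if i >= 0 and t - events[i] <= window:
            hits.append(t)
    return hits
```

If the share of crashes returned is no higher than random timing would give
for the same window, there is no correlation.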

Now we have absolutely no idea what is going on, what it could be, or why;
these boxes run nothing but well-known software like squid, ircd, slirpd and
the like.


Sorry for the complicated explanation,

Sincerely yours, 

Mikhail A. Sokolov.

P.S. Please, all ideas are welcome; if they don't fit the list, mail them to
me directly. Don't let the bosses decide that ftp.ru.freebsd.org will live on
some Sun box :-(