From owner-freebsd-hackers  Sun Feb 18 12:35:36 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id MAA02484
          for hackers-outgoing; Sun, 18 Feb 1996 12:35:36 -0800 (PST)
Received: from hauki.clinet.fi (root@hauki.clinet.fi [194.100.0.1])
          by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id MAA02479
          for <freebsd-hackers@freebsd.org>; Sun, 18 Feb 1996 12:35:27 -0800 (PST)
Received: from newzetor.clinet.fi (root@newzetor.clinet.fi [194.100.0.11]) by hauki.clinet.fi (8.7.3/8.6.4) with ESMTP id WAA27161; Sun, 18 Feb 1996 22:35:14 +0200 (EET)
Received: (hsu@localhost) by newzetor.clinet.fi (8.7.3/8.6.4) id WAA00993; Sun, 18 Feb 1996 22:35:03 +0200 (EET)
Date: Sun, 18 Feb 1996 22:35:03 +0200 (EET)
Message-Id: <199602182035.WAA00993@newzetor.clinet.fi>
From: Heikki Suonsivu <hsu@clinet.fi>
To: Arjan.deVet@adv.IAEhv.nl (Arjan de Vet)
Cc: freebsd-hackers@freebsd.org
In-reply-to: Arjan.deVet@adv.IAEhv.nl's message of 18 Feb 1996 00:22:03 +0200
Subject: Re: Web server locks up... but not quite. (?)
Organization: Clinet Ltd, Espoo, Finland
References: <199602172208.XAA07502@adv.IAEhv.nl>
Sender: owner-hackers@freebsd.org
Precedence: bulk


In article <199602172208.XAA07502@adv.IAEhv.nl> Arjan.deVet@adv.IAEhv.nl (Arjan de Vet) writes:
   In article <Pine.BSF.3.91.960216213633.12191H-100000@zip.io.org> you write:
   >    This sort of thing has happened before with other 2.1.0-R machines
   >here, but tonight was the first time I was able to get to the console
   >of one before someone else rebooted it.

   >    You could telnet to various ports on it (indicating that inetd was
   >still bound to those ports), but none of the services normally
   >attached to those ports would run, including internal ones like
   >chargen or daytime (indicating that inetd was blocked in some way).
   >It wasn't fielding RPC requests either.  The login prompt was still
   >displayed on all the virtual consoles (I was still able to switch
   >between them), but there was no response from the keyboard, as if the
   >getty's had died off.  The only sign of life was that it was returning
   >pings from another machine.
   [...]

This has been happening with -current all the time.  It also happened when
we tried to run -stable at the time 2.1R was released.

   >    Since there is no indication as to the source of the hang, is
   >there anything I can run periodically from cron to help track down the
   >problem?  I can start tracking load averages, swap space usage, the
   >output of vmstat, netstat, iostat and nfsstat if that will help.  Any
   >suggestions?

   We have seen exactly these symptoms too. At one moment our main ISP
   machine (2.0.5) hung almost every night between 2:06h and 2:07h. We had
   all kinds of programs running from cron like the ones you suggest but we
   could not find anything strange. But because it always happened around
   2:07h when /etc/daily was running we moved /etc/daily from 02:00h to
   10:00h (when there's always somebody near the machine to reboot it) and
   the nightly hangs disappeared. They happen once in a while now, around 10:07
   :-((.  But the real problem has not been found yet...

We get these at random intervals, though the heavier the load, the more
often it happens.  The news/user server hangs about once a day, the
user-only server about twice a week, and the less heavily loaded servers
once every couple of weeks.  Dedicated servers don't seem to have much
trouble.  Routers don't seem to crash at all.
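
For anyone who wants to try the cron-driven logging asked about above, it
could be sketched roughly as follows.  This is only a minimal sketch in
Python, not something we actually run here: the function names, the command
list and the log format are assumptions made for illustration, and the
flags (netstat -m, pstat -s) are the FreeBSD ones.  Each snapshot is flushed
and fsync'ed so that the last entry before a hang has a chance of reaching
the disk.

```python
#!/usr/bin/env python3
"""Append a timestamped snapshot of system statistics to a log file."""
import os
import subprocess
import sys
import time

# Assumed command list (illustrative): adjust for the machine being watched.
COMMANDS = [
    ["uptime"],          # load averages
    ["vmstat"],          # memory and CPU summary
    ["netstat", "-m"],   # mbuf usage
    ["iostat"],          # disk activity
    ["nfsstat"],         # NFS counters
    ["pstat", "-s"],     # swap space usage
]


def run_command(argv, timeout=20):
    """Return the command's combined output, or a short note if it fails."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            errors="replace",
            timeout=timeout,
        )
        return result.stdout
    except OSError as exc:
        # Missing binary, permission problem, and so on.
        return "(could not run: %s)\n" % exc
    except subprocess.TimeoutExpired:
        return "(timed out after %d seconds)\n" % timeout


def write_snapshot(log_path, commands=COMMANDS):
    """Append one snapshot to log_path and force it out to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write("==== %s ====\n" % stamp)
        for argv in commands:
            log.write("--- %s ---\n" % " ".join(argv))
            log.write(run_command(argv))
        log.flush()
        os.fsync(log.fileno())


# Intended to be run from cron with the log file path as the only argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    write_snapshot(sys.argv[1])
```

Run every few minutes from cron, this leaves a trail whose last entry shows
the state of the machine shortly before it stopped responding.  As noted in
the quoted message, this kind of logging may well show nothing unusual.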

I don't think this is hardware: it happens on all machines, and we have
quite a random collection of hardware, mostly ASUS motherboards with the odd
MSI or Intel board and lots of assorted 386/486 motherboards.  SCSI systems
seem to be more prone to deadlocking, but this does not necessarily mean
anything, as the loaded machines generally all have SCSI.

I guess I would need to hire a full-time FreeBSD hacker to be able to run
FreeBSD at this scale :-(

-- 
Heikki Suonsivu, Täysikuu 10 C 83/02210 Espoo/FINLAND, hsu@clinet.fi
mobile +358-40-5519679 work +358-0-4375360 fax -4555276 home -8031121