From owner-freebsd-hackers Sun Feb 18 12:35:36 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
    id MAA02484 for hackers-outgoing; Sun, 18 Feb 1996 12:35:36 -0800 (PST)
Received: from hauki.clinet.fi (root@hauki.clinet.fi [194.100.0.1])
    by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id MAA02479
    for ; Sun, 18 Feb 1996 12:35:27 -0800 (PST)
Received: from newzetor.clinet.fi (root@newzetor.clinet.fi [194.100.0.11])
    by hauki.clinet.fi (8.7.3/8.6.4) with ESMTP id WAA27161;
    Sun, 18 Feb 1996 22:35:14 +0200 (EET)
Received: (hsu@localhost) by newzetor.clinet.fi (8.7.3/8.6.4)
    id WAA00993; Sun, 18 Feb 1996 22:35:03 +0200 (EET)
Date: Sun, 18 Feb 1996 22:35:03 +0200 (EET)
Message-Id: <199602182035.WAA00993@newzetor.clinet.fi>
From: Heikki Suonsivu
To: Arjan.deVet@adv.IAEhv.nl (Arjan de Vet)
Cc: freebsd-hackers@freebsd.org
In-reply-to: Arjan.deVet@adv.IAEhv.nl's message of 18 Feb 1996 00:22:03 +0200
Subject: Re: Web server locks up... but not quite. (?)
Organization: Clinet Ltd, Espoo, Finland
References: <199602172208.XAA07502@adv.IAEhv.nl>
Sender: owner-hackers@freebsd.org
Precedence: bulk

In article <199602172208.XAA07502@adv.IAEhv.nl> Arjan.deVet@adv.IAEhv.nl (Arjan de Vet) writes:

   In article you write:

   >	This sort of thing has happened before with other 2.1.0-R machines
   >here, but tonight was the first time I was able to get to the console
   >of one before someone else rebooted it.

   >	You could telnet to various ports on it (indicating that inetd was
   >still bound to those ports), but none of the services normally
   >attached to those ports would run, including internal ones like
   >chargen or daytime (indicating that inetd was blocked in some way).
   >It wasn't fielding RPC requests either.  The login prompt was still
   >displayed on all the virtual consoles (I was still able to switch
   >between them), but there was no response from the keyboard, as if the
   >getty's had died off.
   >The only sign of life was that it was returning
   >pings from another machine.

   [...]

This has been happening with -current all the time.  It also happened
when we tried to run -stable at the time 2.1R was released.

   >	Since there is no indication as to the source of the hang, is
   >there anything I can run periodically from cron to help track down the
   >problem?  I can start tracking load averages, swap space usage, the
   >output of vmstat, netstat, iostat and nfsstat if that will help.  Any
   >suggestions?

   We have seen exactly these symptoms too. At one point our main ISP
   machine (2.0.5) hung almost every night between 2:06h and 2:07h. We
   had all kinds of programs running from cron like the ones you suggest,
   but we could not find anything strange. But because it always happened
   around 2:07h, when /etc/daily was running, we moved /etc/daily from
   02:00h to 10:00h (when there's always somebody near the machine to
   reboot it) and the nightly hangs disappeared. They happen once in a
   while now, around 10:07 :-((. But the real problem has not been found
   yet...

We get these at random intervals, though the higher the load, the more
often it happens.  The news/user server hangs about once a day, the
user-only server about twice a week, and less heavily loaded servers
once every couple of weeks.  Dedicated servers don't seem to have much
trouble.  Routers don't seem to crash at all.

I don't think this is hardware: it happens on all machines, and we have
quite a random collection of hardware.  Mostly ASUS, with a scattering of
MSI or Intel motherboards and lots of various 386/486 motherboards.  SCSI
systems seem to be more prone to deadlocking, but this does not
necessarily mean anything, since the loaded machines generally have SCSI.

I guess I would need to hire a full-time FreeBSD hacker to be able to
run FreeBSD at this scale :-(

-- 
Heikki Suonsivu, Täysikuu 10 C 83/02210 Espoo/FINLAND,
hsu@clinet.fi  mobile +358-40-5519679 work +358-0-4375360
fax -4555276  home -8031121