To: freebsd-hackers@freebsd.org
Subject: Re: Watchdog timers
Date: Thu, 01 Feb 1996 11:21:04 -0800
From: Curt Mayer

hey, guys. here's a solution that smells much more like unix.

have a daemon running on each node that is prone to hanging. every once in a while this process wakes up and does a system checkup (stats things, pings places, looks at kernel statistics). when it sees that things are ok, it sends a datagram to a particular machine, the monitor. the monitor keeps a table in memory of the most recent datagram from each node. when a node hasn't been heard from for a while, the monitor tells a BSR X10 controller to cycle power on the hung node.

DUH.

our ISP, tlg.net, used to do routing and SLIP with sx-16s running NOS. whenever one of them hung, tlg would power-cycle it with X10s.

curt
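A minimal sketch of the heartbeat scheme described above. The mail names no port, timeout, message format, or X10 interface, so `HEARTBEAT_PORT`, `TIMEOUT_SECONDS`, the "node name as payload" format, and the `power_cycle` callback are all hypothetical. The callback stands in for whatever actually drives the X10 controller. The node calls `send_heartbeat` only after its local checks pass. The monitor records arrival times and power-cycles any node that has been silent longer than the timeout.

```python
import socket
import time
from typing import Callable, Dict, List

HEARTBEAT_PORT = 9999     # hypothetical port; the original names none
TIMEOUT_SECONDS = 60.0    # hypothetical "a while" before a node counts as hung


def send_heartbeat(node_name: str, monitor_host: str, port: int = HEARTBEAT_PORT) -> None:
    """Node side: send one UDP datagram; call only after local health checks pass."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(node_name.encode("ascii"), (monitor_host, port))


class HeartbeatMonitor:
    """Monitor side: in-memory table of when each node was last heard from."""

    def __init__(self, timeout: float, power_cycle: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.power_cycle = power_cycle  # hypothetical hook that drives the X10 controller
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record(self, node_name: str) -> None:
        """Note that a heartbeat from node_name just arrived."""
        self.last_seen[node_name] = self.clock()

    def check(self) -> List[str]:
        """Power-cycle every node silent for longer than the timeout; return their names."""
        now = self.clock()
        hung = [n for n, t in self.last_seen.items() if now - t > self.timeout]
        for node in hung:
            self.power_cycle(node)
            # restart the timer so the node gets a full timeout period to reboot
            self.last_seen[node] = now
        return hung

    def serve(self, port: int = HEARTBEAT_PORT, poll: float = 5.0) -> None:
        """Receive heartbeats forever, checking for hung nodes at least every `poll` seconds."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind(("", port))
            sock.settimeout(poll)
            while True:
                try:
                    data, _addr = sock.recvfrom(256)
                    self.record(data.decode("ascii", errors="replace"))
                except socket.timeout:
                    pass
                self.check()
```

Resetting the timer after a power cycle is a design choice, not something the mail specifies. Without it, the monitor would keep cycling a node that is still booting.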