From owner-freebsd-isp  Wed Jan  7 00:00:48 1998
Return-Path: <owner-freebsd-isp>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id AAA24317
          for isp-outgoing; Wed, 7 Jan 1998 00:00:48 -0800 (PST)
          (envelope-from owner-freebsd-isp)
Received: from implode.root.com (implode.root.com [198.145.90.17])
          by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id AAA24311;
          Wed, 7 Jan 1998 00:00:42 -0800 (PST)
          (envelope-from root@implode.root.com)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.5/8.8.5) with ESMTP id AAA08854;
	Wed, 7 Jan 1998 00:00:05 -0800 (PST)
Message-Id: <199801070800.AAA08854@implode.root.com>
To: Dave Smith <dpsmith@xoom.com>
cc: freebsd-questions@freebsd.org, freebsd-isp@freebsd.org
Subject: Re: Remote power cycle 
In-reply-to: Your message of "Tue, 06 Jan 1998 23:05:59 PST."
             <Pine.BSF.3.96.980106224529.4373G-100000@mail1.xoom.com> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Wed, 07 Jan 1998 00:00:05 -0800
Sender: owner-freebsd-isp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>together for me so I can set it up. I hope it works. I scoured the list
>and it was either Digiboard (or whatever it is called, starts with Digi)
>or Cyclades.

   I use a Cyclades ISA multiport here to connect to my various development
machines, but the Digiboard should work nicely as well.

>>   a laptop. The remote console machine also has an internal modem that I hacked
>>   to function as a reset switch whenever it goes off hook to dial - I use this
>>   to reset wcarchive if necessary. I have a watchdog process running on the
>>   remote console which pings wcarchive every minute or so, and if the pings
>>   start failing, wcarchive is automatically reset (I have a kermit dialout
>>   script for this :-)).
>> ...
>
>This I do not do, and I would like to know more about the watchdog process
>and how you do the auto reset. And how did you hack an internal modem to
>act as a reset switch?

   It was quite easy for me to do, but the level of difficulty will depend
on your ability to use a soldering iron and knowledge of electronics. What
I did was reconfigure the hook relay so that the switch simply shorts the
tip/ring wires rather than connect them to the isolation transformer. It
took about 10 minutes, most of which was trying to figure out where the
traces ran on the multilayer circuit board. Configured this way, I just plug
a regular RJ11 modular cable in as the 'phone line', with the other end of
the cable hooked up to the reset contacts on wcarchive's motherboard
(actually, it's wired in parallel with the reset switch so that it can
be manually reset as well).

>Pings would be difficult for me because of our configuration it looks like
>machines are up all the time. We use a product called BigIP from
>f5.com. It does fancy load-balancing like the Cisco LocalDirector.
>
>However some sort of ping to port 23 would work for me.

   Hmmm. Can you configure the thing to pass ICMP echo request/reply through
to the real host? If not, then you'd need to write something to do a TCP or
UDP test on a special port. In any case, my script on the remote console is:

(Warning: I'm not a shell programmer! Looking at the following may cause
permanent brain damage! :-))

.......the watchdog
#!/bin/sh

# Reset wcarchive (.81) only if it misses every ping below while the
# router (.82) still answers.
while true ; do
        # Single pings come first so brief packet loss does not trigger a reset.
        if (! ping -s 8 -c 1 165.113.121.81 > /dev/null 2>&1) &&
        (! ping -s 8 -c 1 165.113.121.81 > /dev/null 2>&1) &&
        (! ping -s 8 -c 1 165.113.121.81 > /dev/null 2>&1) &&
        (! ping -s 8 -c 15 165.113.121.81 > /dev/null 2>&1) &&
        (ping -s 8 -c 5 165.113.121.82 > /dev/null 2>&1) &&
        (! ping -s 8 -c 5 165.113.121.81 > /dev/null 2>&1) ; then ./resetwc-wait ; fi
        sleep 60
done
.......resetwc-wait
#!/bin/sh
echo `date` Resetting wcarchive >> log
kermit reset.kerm < /dev/null > /dev/null
sleep 900
.......and the kermit script, reset.kerm
set line /dev/cuaa2
set modem hayes
dial 1
.......

   ...this is a hack, of course. The idea is to attempt several single pings
so that packet loss and short-term (few-second) network problems don't
result in a machine reset. The second-to-last ping command on the .82 address
was added recently: it's the IP address of the Cisco on the other end of the
fast ethernet cable. I was having a problem with wcarchive getting reset
every time our ISP rebooted their router (rare, but it happened more than
once in the last 2 years)...so now I make sure that the remote console gets
a response from the router before assuming that wcarchive is down. Ideally,
I'd have the remote console and wcarchive connect to a switch and not involve
any other hardware; this is planned in the future. After resetting the
machine, it waits 15 minutes before resuming the watchdog - this gives the
machine plenty of time to do filesystem checks and come back online. Of
course, I have to disable the watchdog whenever I want to reboot the machine
for maintenance purposes.
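
   For a setup where ICMP is answered by a load balancer rather than the
real host, the same retry logic could be built around a TCP connect test
instead of ping. What follows is only a rough sketch, not the script I
actually run: it assumes nc(1) is installed and supports -z and -w, that
the real host listens on port 23, and the function names and the
watchdog.disabled flag file (a way to switch the watchdog off for
maintenance) are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the port, flag path, and function names are assumptions.

TARGET=${TARGET:-165.113.121.81}    # host to watch (address from the script above)
ROUTER=${ROUTER:-165.113.121.82}    # router that must answer before a failure is trusted
PORT=${PORT:-23}                    # assumed: a TCP port the real host listens on
FLAG=${FLAG:-./watchdog.disabled}   # hypothetical flag file for maintenance windows

# tcp_up HOST PORT: succeed if a TCP connection completes within 5 seconds.
tcp_up() {
        nc -z -w 5 "$1" "$2" > /dev/null 2>&1
}

# icmp_up HOST: succeed if at least one of 5 small pings is answered.
icmp_up() {
        ping -s 8 -c 5 "$1" > /dev/null 2>&1
}

# all_fail N CMD [ARGS...]: succeed only if CMD fails N times in a row.
all_fail() {
        tries=$1 ; shift
        i=0
        while [ "$i" -lt "$tries" ] ; do
                if "$@" ; then return 1 ; fi
                i=$((i + 1))
        done
        return 0
}

# should_reset: succeed only when the watchdog is enabled, the target has
# failed repeated probes, and the router still answers.
should_reset() {
        [ ! -e "$FLAG" ] &&
        all_fail 3 tcp_up "$TARGET" "$PORT" &&
        icmp_up "$ROUTER" &&
        all_fail 1 tcp_up "$TARGET" "$PORT"
}
```

   The main loop would then be something like
"while true ; do should_reset && ./resetwc-wait ; sleep 60 ; done", and
touching the flag file before a planned reboot keeps the watchdog quiet
until the file is removed again.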

>>    It's never been necessary to power cycle the machine, and considering how
>> much hardware is involved, that's a good thing. :-)
>
>I know it is a bad thing. I could only think of a good swift power cycle
>when machines are not responding.

   I understand. I've actually had a fair number of problems getting all
the hardware up to speed after a power failure, so reset is definitely
preferred over power cycling if it can be arranged.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project