From owner-freebsd-isp Wed Jan 7 00:00:48 1998
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.7/8.8.7) id AAA24317 for isp-outgoing; Wed, 7 Jan 1998 00:00:48 -0800 (PST) (envelope-from owner-freebsd-isp)
Received: from implode.root.com (implode.root.com [198.145.90.17]) by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id AAA24311; Wed, 7 Jan 1998 00:00:42 -0800 (PST) (envelope-from root@implode.root.com)
Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.5/8.8.5) with ESMTP id AAA08854; Wed, 7 Jan 1998 00:00:05 -0800 (PST)
Message-Id: <199801070800.AAA08854@implode.root.com>
To: Dave Smith
cc: freebsd-questions@freebsd.org, freebsd-isp@freebsd.org
Subject: Re: Remote power cycle
In-reply-to: Your message of "Tue, 06 Jan 1998 23:05:59 PST."
From: David Greenman
Reply-To: dg@root.com
Date: Wed, 07 Jan 1998 00:00:05 -0800
Sender: owner-freebsd-isp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>together for me so I can set it up. I hope it works. I scoured the list
>and it was either Digiboard (or whatever it is called, starts with Digi)
>or Cyclades.

I use a Cyclades ISA multiport here to connect to my various development machines, but the Digiboard should work nicely as well.

>> a laptop. The remote console machine also has an internal modem that I hacked
>> to function as a reset switch whenever it goes off hook to dial - I use this
>> to reset wcarchive if necessary. I have a watchdog process running on the
>> remote console which pings wcarchive every minute or so, and if the pings
>> start failing, wcarchive is automatically reset (I have a kermit dialout
>> script for this :-)).
>> ...
>
>This I do not do, and I would like to know more about the watchdog process
>and how you do the auto reset. And how did you hack an internal modem to
>act as a reset switch?

It was quite easy for me to do, but the level of difficulty will depend on your ability to use a soldering iron and knowledge of electronics.
What I did was reconfigure the hook relay so that the switch simply shorts the tip/ring wires rather than connecting them to the isolation transformer. It took about 10 minutes, most of which was spent trying to figure out where the traces ran on the multilayer circuit board. Configured this way, I just plug a regular RJ11 modular cable in as the 'phone line', with the other end of the cable hooked up to the reset contacts on wcarchive's motherboard (actually, it's wired in parallel with the reset switch so that it can be manually reset as well).

>Pings would be difficult for me because of our configuration it looks like
>machines are up all the time. We use a product called BigIP from
>f5.com. It does fancy load-balancing like the Cisco LocalDirector.
>
>However some sort of ping to port 23 would work for me.

Hmmm. Can you configure the thing to pass ICMP echo request/reply through to the real host? If not, then you'd need to write something to do a TCP or UDP test on a special port. In any case, my script on the remote console is:

(Warning: I'm not a shell programmer! Looking at the following may cause permanent brain damage! :-))

.......the watchdog
#!/bin/sh
while true ; do
    if (! ping -s 8 -c 1 165.113.121.81 > /dev/null 2>&1) &&
       (! ping -s 8 -c 1 165.113.121.81 > /dev/null 2>&1) &&
       (! ping -s 8 -c 1 165.113.121.81 > /dev/null 2>&1) &&
       (! ping -s 8 -c 15 165.113.121.81 > /dev/null 2>&1) &&
       (ping -s 8 -c 5 165.113.121.82 > /dev/null 2>&1) &&
       (! ping -s 8 -c 5 165.113.121.81 > /dev/null 2>&1) ; then
        ./resetwc-wait ;
    fi
    sleep 60
done
.......resetwc-wait
#!/bin/sh
echo `date` Resetting wcarchive >> log
kermit reset.kerm < /dev/null > /dev/null
sleep 900
.......and the kermit script, reset.kerm
set line /dev/cuaa2
set modem hayes
dial 1
.......

...this is a hack, of course. The idea is to attempt several single pings so that packet loss and short-term (a few seconds) network problems don't result in a machine reset.
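
For anyone adapting this to a setup where ICMP is answered by a load balancer rather than the real host, the same decision logic can be sketched with the probe factored out, so that only one function has to change. This is an illustrative sketch, not the script running on the remote console: the names TARGET, ROUTER, probe and should_reset are made up for the example, and the nc command mentioned in the comment is an assumption about what is installed on your machine.

```shell
#!/bin/sh
# Sketch only: the watchdog's decision logic with the probe factored out.
# TARGET, ROUTER, probe and should_reset are illustrative names.
TARGET=${TARGET:-165.113.121.81}   # host being watched
ROUTER=${ROUTER:-165.113.121.82}   # router on the far end, used as a sanity check

# probe HOST COUNT: succeeds if HOST answers COUNT-packet ping.
# Assumption: where ICMP never reaches the real host, the body could be
# swapped for a TCP test such as: nc -z -w 5 "$1" 23
probe() {
    ping -s 8 -c "$2" "$1" > /dev/null 2>&1
}

# should_reset: succeeds only if the target misses every probe while the
# router still answers, mirroring the chain of pings in the watchdog.
should_reset() {
    for count in 1 1 1 15; do
        probe "$TARGET" "$count" && return 1   # target answered: no reset
    done
    probe "$ROUTER" 5 || return 1              # router silent: blame the network
    probe "$TARGET" 5 && return 1              # one last chance for the target
    return 0
}
```

With these two functions defined, the body of the watchdog loop reduces to "if should_reset ; then ./resetwc-wait ; fi" followed by the usual "sleep 60".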
The second-to-last ping command, on the .82 address, was added recently: it's the IP address of the Cisco on the other end of the fast ethernet cable. I was having a problem with wcarchive getting reset every time our ISP rebooted their router (rare, but it happened more than once in the last 2 years)...so now I make sure that the remote console gets a response from the router before assuming that wcarchive is down. Ideally, I'd have the remote console and wcarchive connect to a switch and not involve any other hardware; this is planned for the future.

After resetting the machine, it waits 15 minutes before resuming the watchdog - this gives the machine plenty of time to do filesystem checks and come back online. Of course, I have to disable the watchdog whenever I want to reboot the machine for maintenance purposes.

>> It's never been necessary to power cycle the machine, and considering how
>> much hardware is involved, that's a good thing. :-)
>
>I know it is a bad thing. I could only think of a good swift power cycle
>when machines are not responding.

I understand. I've actually had a fair number of problems getting all the hardware up to speed after a power failure, so reset is definitely preferred over power cycling if it can be arranged.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project