Date: Wed, 29 Apr 1998 11:15:46 +0800
From: Greg Lehey <grog@lemis.com>
To: Hans Huebner <hans@artcom.de>, freebsd-hackers@FreeBSD.ORG
Subject: Re: FreeBSD HA configuration / Ethernet address takeover
Message-ID: <19980429111546.54200@papillon.lemis.com>
In-Reply-To: <Pine.BSF.3.96.980425145610.11665A-100000@transrapid.artcom.de>; from Hans Huebner on Sat, Apr 25, 1998 at 03:11:21PM +0200
References: <Pine.BSF.3.96.980425145610.11665A-100000@transrapid.artcom.de>

On Sat, 25 April 1998 at 15:11:21 +0200, Hans Huebner wrote:
> Hello there,
>
> we're running some of our critical LAN services (NIS, DNS, mail etc.) on
> FreeBSD. The systems are quite stable, but from time to time we need to
> take a system down for maintenance purposes. Also, hardware problems can
> cause unplanned down times.
>
> I'm currently looking for a solution to configure a PC as a warm-standby
> fallback server for the most important services (NIS and DNS). To make a
> failover to the fallback server as transparent to the users as possible,
> it would be best if the fallback system could take over the ethernet
> address of the failed server. I've seen this work with certain
> (expensive) Solaris configurations, and I'd like to do something similar
> with FreeBSD.
>
> I tried to implement DNS failover by moving our name service IP address to
> another machine, but this resulted in severe client problems (most clients
> fail to renegotiate the MAC address with ARP within finite time).
>
> Looking at the ifconfig manpage, I could not find a general way to set an
> Ethernet card's MAC address. Is there a documented solution to this
> problem? If not, would adding such functionality be problematic?
>
> Any pointers, hints or suggestions are greatly appreciated. I'd also be
> interested in any reports on running two FreeBSD systems on one shared
> SCSI bus. I suppose the disk driver would need to be changed quite a bit
> to make use of the RESERVE UNIT SCSI command to prevent access collisions.

Sorry for my late entry in this discussion: I'm currently connected to
the net only once every day or two, though this will change back to
normal by the weekend.

Tandem Computers ("a Compaq company") has been addressing these
problems for years. I know that a number of points have been discussed
already, but it might be interesting to consider how Tandem does it,
and how a PC solution could approximate it.

A big difference between the environment you're considering and the
Tandem environment is that the Tandem environments are logically a
single system which doesn't fail, whereas you're looking at separate
systems, one of which may fail. A significant problem is to determine
when the primary machine fails (what, you don't get a reply from the
machine? Maybe *your* Ethernet board has failed). This problem has
caused Tandem headaches for decades, and I'm not going to discuss it in
this message.

1. Reliable Ethernet

Tandem's Reliable Ethernet product does pretty much what you suggest:
it has one board waiting as a hot standby, and if the first fails, the
second will take its MAC address and carry on as if nothing had
happened. The main concern is determining when the first board has
failed.

If you have a board which can change its MAC address, this obviously
makes sense. It's an omission in FreeBSD not to have the facility. The
fact that some boards can't do it is no argument against the function:
if the board can't do it, the ioctl should return an appropriate error
indication.

In the case of a board which can't change its MAC address, the
alternative of assuming its IP address and sending a couple of pings to
the broadcast address sounds like a good workaround. Certainly it will
normalize things faster than waiting for the application layer to try
an alternative IP.

2. SCSI takeover

Tandem has had a number of strategies. None use two host adaptors on a
string. The one used by the (now defunct) S2 range of triple modular
redundant machines is closest to what you suggest: it uses a dual
ported host adaptor, but only one IO processor controls the host
adaptor at any one time. Since the system as a whole doesn't fail,
there's no need to perform an fsck on the disks at takeover.

I can't see a good solution in using two host adaptors on two different
machines connected to a single string.
As long as the second machine doesn't have access to the first
machine's buffer cache, data can get lost, and a takeover must involve
an fsck. The overhead of fsck could go into several minutes, much
longer than the time that the application layer takes to try another IP
address. I don't think that this would make much sense from an
availability standpoint, though it obviously makes sense to recover the
file systems and make them available on another machine if the first
machine is going to be out of commission for any length of time.

What makes more sense is to replicate the data across multiple systems.
Possibly a software layer like the vinum volume manager would be able
to perform this function: put one copy of the data on the local
machine, another on one or two other machines via NFS or some other
protocol, and always read from the local machine. As long as the write
rate is not too high, this should allow for higher availability.

Greg

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
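
The replication idea in the last paragraph of the message (write every
copy, read only the local one) can be illustrated with a minimal
sketch. This is not how vinum works: vinum operates on disk blocks,
whereas this example works on whole files, and the class name
MirroredStore, its methods, and the use of plain directories to stand
in for NFS mounts of other machines are all assumptions made for
illustration.

```python
import os


class MirroredStore:
    """Hypothetical file-level mirror: write to all replicas, read locally."""

    def __init__(self, local_dir, remote_dirs):
        # remote_dirs are assumed to be mount points (e.g. NFS) exported
        # by other machines; here they are just ordinary directories.
        self.local_dir = local_dir
        self.replica_dirs = [local_dir] + list(remote_dirs)

    def write(self, name, data):
        """Store data on every replica and return the replicas that failed."""
        failed = []
        for directory in self.replica_dirs:
            try:
                with open(os.path.join(directory, name), "wb") as f:
                    f.write(data)
                    f.flush()
                    os.fsync(f.fileno())  # push the copy to stable storage
            except OSError:
                # An unreachable replica does not block the write; the
                # caller decides how to resynchronize it later.
                failed.append(directory)
        return failed

    def read(self, name):
        """Always read from the local copy, as the message suggests."""
        with open(os.path.join(self.local_dir, name), "rb") as f:
            return f.read()
```

Every write costs one synchronous copy per replica, which is why the
approach only pays off while the write rate stays low. Reads never
leave the local machine, so a surviving machine can carry on serving
its own copy without an fsck of a shared disk.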