Date:      Thu, 29 Aug 2002 14:42:04 -0700 (PDT)
From:      Anshuman Kanwar <akanwar@engineering.ucsb.edu>
To:        <bill@carracing.com>, <friar_josh@webwarrior.net>
Cc:        <freebsd-questions@FreeBSD.ORG>
Subject:   RE: network link failover
Message-ID:  <Pine.LNX.4.33.0208291436330.30604-100000@ecipc056.engr.ucsb.edu>


Hmm, here is a round wheel, if not exactly a *polished* round one ;)

We needed failover functionality to guard against switch failure
on our servers. Unable to find anything for FreeBSD, I wrote a Perl daemon
(appended below). It is quite specific (and exhaustive) to our setup and
might require some hacking to work with yours. I've replaced all IPs with
xxx's.

You may want to look at the do_switch function for the meat of the
failover.

Please feel free to ask me about the details.
Ansh Kanwar.


---------------begin


#!/usr/bin/perl
# Version     : 0.7
# Release Date: Aug 23 2002
# Author      : Anshuman Kanwar

# ver 0.7 modifications
#  - works with FreeBSD 4.6
#  - do_switch() is cleaner, with much of the
#    4.4-specific code now gated on uname
#  - intelligent route deletes
#    (if it's not there, don't try to delete it)
#  - The way to specify an alias has changed to
#     ifconfig_fxp0_alias0="inet xxx.xxx.xxx.xxx/16"
#     4.6 does not support the "alias" at the end anymore
#

# ver 0.6 additions
#  - now works with vlan aliased configs
#  - also with plain aliases


#-Summary---------------------------------------------
#
# Detects network interface failure and migrates
# connectivity to the standby interface.
#
#
# Functional Overview:
# ---------------------
# 1. Figure out what is the active interface
# 2. Ping a known host in order to detect packet loss
# 3. If we detect packet loss, switch to other interface
#
# Important properties:
#   - This script uses very little CPU when there are no
#     networking problems. This is important since it runs
#     together with important server processes that may need
#     100% CPU of the machine.
#   - During an outage, the CPU usage is a little higher,
#     but still low. We have measured a 1.6% peak on an
#     850 MHz Dell 350 server.
#   - Typically the script will detect an interface outage
#     in less than 10 seconds.
#   - Eventually the script will give up and exit if after
#     $MAX_FLAPS interface switches we do not restore connectivity.
#   - When this script exits abnormally it tries to restore the
#     network interfaces to a "safe" state
#
# WARNING: Be very careful when changing the parameters
#          defined below. You should make sure the resulting
#          script behavior conforms to the properties above.
#
#-----------------------------------------------------

use strict;

#-----------define config variables here-------

my $DEBUG          = 1;              # debug(1) or quiet (0)
my $DOIT           = 1;              # do a test run (0) real thing (1)
my $IS_DAEMON      = 0;              # run as a daemon (1) foreground process (0)
my $DO_MINUS_ALIAS = 1;              # required if you are running aliases (NOT VLAN aliases)

# keep the MINUS VLAN = 0, will eventually be dropped
my $DO_MINUS_VLAN  = 0;


#
# OS and machine specific parameters
#
my $FILE_RC_CONF  = "/etc/rc.conf"; # Where are the configs. OS specific
my $INT_TYPE      = "fxp";          # Type of interface. Machine specific
my $INTF_0        = "fxp0";         # Interface zero. Machine specific
my $INTF_1        = "fxp1";         # Interface one. Machine specific


#---------------------------------------------
# Data-center specific parameters.
#
#
# Headquarters values
#

#############
my $LOCATION_HQ =  "HQ";
my $LOCATION_SNV = "SNV";

my %HASH_DEF_ROUTE =
  (
   $LOCATION_HQ   => "10.4.xx.xx",
   $LOCATION_SNV  => "xx.xxx.xxx.xxx",
  );

my %HASH_HOST1_TO_PING =
  (
   $LOCATION_HQ   => "10.4.xx.xxx",
   $LOCATION_SNV  => "xxx.xx.xxx.xx",
  );

my %HASH_HOST2_TO_PING =
  (
   $LOCATION_HQ   => "xx.xxx.xx.xx",
   $LOCATION_SNV  => "xx.xxx.xxx.x",
  );
###################


my $OUR_LOCATION = $LOCATION_HQ;

my $DEF_ROUTE     = $HASH_DEF_ROUTE{$OUR_LOCATION};      # Default route if no explicit entry in conf
my $HOST_TO_PING  = $HASH_HOST1_TO_PING{$OUR_LOCATION};  # Router's IP (to ping)
my $HOST2_TO_PING = $HASH_HOST2_TO_PING{$OUR_LOCATION};  # backup router

unless (defined $DEF_ROUTE)    { log_die("Define default route\n"); }
unless (defined $HOST_TO_PING) { log_die("Define Host to ping\n");  }
unless (defined $HOST2_TO_PING){ log_die("Define Host2 to ping\n"); }

#
#---------------------------------------------

#
# Algorithm parameters.
#
my $MAX_FLAPS     = 2000;    # Limit before give up
my $WAIT_N_LOSSES = 3;       # On ping loss: losses tolerated before switching


my $HOSTNAME = `hostname`;
chomp($HOSTNAME);


my $OS_VER=`uname -r`;
chomp($OS_VER);



#---                                        --#
#You should not be required to edit below this#
#---                                        --#


#sub-------------log_die----------------------#
#                                             #
# Log critical error and die                  #
#---------------------------------------------#
sub log_die {
    my ($message) = @_;

    system("logger", "-p", "local0.error", $message);  # list form: $message is not parsed by a shell

    #
    # TODO: restore a safe state
    #

    die($message);
}


#sub-------------ping_host--------------------#
#                                             #
# Pings $HOST_TO_PING 1 time                  #
# Returns  number of pings received           #
#---------------------------------------------#
sub ping_host {
  my (
      $host      # host to be pinged
     ) = @_;

  my $pingval = undef; # Number of successful pings

  # Call ping and capture its output
  my @pingin = ();
  my $pingLine = `ping -c 1 -q -t 2 $host`;
  #
  # TODO: handle pingLine == undef. If needed
  #
  @pingin = split(/\n/, $pingLine);


  # Parse Ping output to find out packet loss
  foreach $pingLine (@pingin){
      chomp($pingLine);
      $pingLine =~ s/\s+$//;
      if($pingLine =~ /^\d packets transmitted, (\d) packets received,/){
          $pingval = $1;
      }
  }

  unless(defined $pingval){log_die(" x DYING:Unexpected ping output\n");}
  return $pingval;
} # ping_host


#sub------------parse_rc_file---------------------#
#                                                 #
# Parse /etc/rc.conf and return the commands that #
# need to be issued to failover.                  #
# All vlan alias and inet commands are included   #
#                                                 #
# It extracts commands from the config file and   #
# replaces all occurrences of $old_intf with      #
# $new_intf                                       #
#                                                 #
#-------------------------------------------------#
sub parse_rc_file {
    my (
        $conf_file, # file to parse
        $old_intf,  # interface failed
        $new_intf,  # new interface
        $int_type   # type of int.
       ) = @_;

    # Accumulate result in those two arrays
    my @if_cmds_in = ();        # buffer for rc.conf output
    my @vlan_cmds_in = ();      # buffer for vlan part of rc.conf

    open (PARSERC, $conf_file) || log_die (" x DIED:no rc file");
    while (my $line = <PARSERC>) {
        next if($line =~ /^\s*$/);      # Delete blank lines
        next if($line =~ /^\#.*/);
        next if ($line =~ /^ifconfig_vlan\d_alias\d/);
        next if ($line =~ /^ifconfig.*alias/);

        if($line =~ /^ifconfig_$int_type/) {
            $line =~ s/(_|=|\")/ /g;      # strip punctuation
            $line =~ s/$old_intf/$new_intf/g; # replace interface
            push(@if_cmds_in, $line);     # all ifconfig !vlan
        }

        if($line =~ /^ifconfig_vlan/) {
            $line =~ s/(_|=|\")/ /g;      #strip punctuation
            $line =~ s/$old_intf/$new_intf/g; #replace interface
            push(@vlan_cmds_in, $line);
        }
    } # while
    close(PARSERC);

    push (@if_cmds_in, @vlan_cmds_in);
    return @if_cmds_in;
} # parse_rc_file


#sub------------vlan_rc_file--------------------
#
# Parse rc.conf and return the inet addresses of
# all ifconfig_vlanN_aliasN entries
#
#------------------------------------------------
sub vlan_rc_file {
    my (
        $conf_file, # file to parse
       ) = @_;

    # Accumulate VLAN alias addresses here
    my @alias_list = ();        # VLAN alias IPs found in rc.conf

    open (PARSERC, $conf_file) || log_die (" x DIED:no rc file");
    while (my $line = <PARSERC>) {
        next if($line =~ /^\s*$/);      # Delete blank lines
        next if($line =~ /^\#.*/);

       if ($line =~ /ifconfig_vlan\d_alias\d/){
        # Dots are escaped so that only a dotted-quad address matches
        if($line =~ /inet\s+(\d+\.\d+\.\d+\.\d+).*vlan/) {
            my $ipadd=$1;
            push(@alias_list, $ipadd);     # VLAN alias address
        }
       }
    } # while
    close(PARSERC);
    return @alias_list;
} # vlan_rc_file

#sub------------alias_rc_file--------------------
#
# Parse rc.conf and return the addresses of all
# plain (non-VLAN) ifconfig alias entries
#
#------------------------------------------------
sub alias_rc_file {
    my (
        $conf_file, # file to parse
       ) = @_;

    # Accumulate alias addresses here
    my @alias_list = ();        # alias IPs found in rc.conf

    open (PARSERC, $conf_file) || log_die (" x DIED:no rc file");
    while (my $line = <PARSERC>) {
        next if($line =~ /^\s*$/);      # Delete blank lines
        next if($line =~ /^\#.*/);
        next if ($line =~ /ifconfig_vlan/);

        # The trailing .[0-9]+ appears intended to capture the /prefix
        # along with the address (e.g. xxx.xxx.xxx.xxx/16)
        if($line =~ /^ifconfig.*alias.*\s([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+.[0-9]+)/) {
            my $ipadd=$1;
            push(@alias_list, $ipadd);     # non-VLAN alias address
        }
    } # while
    close(PARSERC);
    return @alias_list;
} # alias_rc_file

#sub------------parse_ifconf--------------------
#
# execute ifconfig -u to list UP interfaces
# parse output to determine the number of match
# -ing interfaces and their status
#
# Returns:
#   $sys_int_total_up - number of "up" interfaces
#   $intf             - name of last "up" interface
#   $stat             - status of $intf
#-----------------------------------------------
sub parse_ifconf {
    my (
        $int_type # Interface type
       ) = @_;

    my $intf;             # return interface
    my $stat;             # return status
    my $sys_int_total_up; # total interfaces up

    # Get ifconfig information
    #--------------------------
    my @ifconin = ();
    my $if_in_line = `ifconfig -u`;
    @ifconin = split(/\n/, $if_in_line);

    # Extract info from ifconf output
    #--------------------------------
    foreach my $ifconLine (@ifconin) {
        chomp($ifconLine);
        $ifconLine =~ s/\s+$//;     # Delete trailing blanks

        if($ifconLine =~ /^$int_type[0-9]:/){
            my @fields  = split(/:\s/, $ifconLine);
            $intf = $fields[0];
            $sys_int_total_up++;     # number of up interfaces
        }

        if($ifconLine =~ /\s+status:\s/){
            my @fields  = split(/:\s/, $ifconLine);
            $stat = $fields[1];   # Status information
        }
    } # for each

    return ($sys_int_total_up, $intf, $stat)
} # parse_ifconf



#sub----------------do_switch---------------------  #
#                                                   #
# Switches from $old_intf to alternative interface  #
#                                                   #
# Has these distinct parts                          #
#    1. parse $conf_file, typically /etc/rc.conf    #
#    2. bring failed interface down                 #
#    3. bring new interface up                      #
#                                                   #
# Returns:                                          #
#   - name of the new interface                     #
#-------------------------------------------------  #
sub do_switch {
    my (
        $old_intf,  # Interface
        $conf_file, # file to parse
        $int_type   # interface type
       ) = @_;

    my $ret;              # temp variable for return values of system() calls

    # Based on present interface decide new interface.
    my $new_intf = undef;
    if ($old_intf eq $INTF_0) {
        $new_intf = $INTF_1;
    } elsif ($old_intf eq $INTF_1) {
        $new_intf = $INTF_0;
    } else {
        log_die(" x DYING:Interface is invalid\n");
    }

    # Always log attempts to switch
    # These will be picked up by logsurfer
    system("logger -p local0.info SWITCH ATTEMPTED: [$old_intf] to [$new_intf]");

    # Get ifconfig commands from config file
    my @if_cmds_in = parse_rc_file($conf_file, $old_intf, $new_intf,
$int_type);
    if ($DEBUG) { print(@if_cmds_in); }


    if ($DOIT){

        if ( $OS_VER =~ /4\.4/){
            # HACK  because FreeBSD 4.4 does not understand -ifp
            # option for route. Not needed for 4.5 and up

            if ($DO_MINUS_ALIAS) {
                system("ifconfig $new_intf 192.168.0.200 -alias");
                my (@ret)= alias_rc_file("/etc/rc.conf");
                foreach my $ip(@ret){
                # system ("ifconfig $old_intf $ip -alias");
                # print $ip." deleted as alias\n";
                }
            }

            $ret = system("ifconfig $old_intf inet 192.168.0.200");
            if($ret != 0) { log_die (" x DYING:hack failed\n"); }
        }


        if ($DO_MINUS_VLAN) {
            system("ifconfig vlan0 -vlandev $old_intf");
        }

        # Special parsing is required for alias IPs
        # delete them from the old interface
        my (@ret)= alias_rc_file("/etc/rc.conf");
        foreach my $ip(@ret){
        print $ip."----------\n";
          system ("ifconfig $old_intf inet $ip -alias");
        }

        # Bring failed interface down
        $ret = system("ifconfig $old_intf down");
        if ($ret != 0) { log_die(" x DYING:ifdown failed\n"); }

        # Clear ARP cache
        $ret = system("arp -a -d");
        if ($ret != 0) { log_die (" x DYING:arp failed\n"); }

        # to guard against border cases
        # check if default route is in routing table
        # only then delete it
        my $netstat_out=`netstat -rn`;
        if ( $netstat_out =~ "default ")
        {
            # Delete old route
            # not required in 4.6 as it intelligently deletes the route
            # when the interface is downed
            $ret = system("route delete default");
            if ($ret != 0) { log_die(" x DYING:Route Delete failed\n");}
        }

        # Bring new interface up
        $ret = system("ifconfig $new_intf up");
        if($ret != 0) { log_die (" x DYING:ifup failed\n"); }

        # Fail Over to Other Interface including VLANs
        foreach my $execline (@if_cmds_in) {
            $ret = system($execline);
            if ($ret != 0) { log_die (" x DYING:ifconfig failed\n"); }
        }

        if ($DO_MINUS_VLAN) {
            my (@ret_vlan)= vlan_rc_file("/etc/rc.conf");
            foreach my $vlan_ip(@ret_vlan){
            system ("ifconfig vlan0 inet $vlan_ip/32 alias");
               print $vlan_ip."\n";
            }
        }

        my (@ret)= alias_rc_file("/etc/rc.conf");
        foreach my $ip(@ret){
        print $ip."----------\n";
          system ("ifconfig $new_intf inet $ip alias");
        }

        # Add new route
        $ret = system("route add -ifp $new_intf default $DEF_ROUTE");
        if($ret != 0) { log_die(" x Route Add Failed\n");}

    }
    return $new_intf;
} # do_switch

#-[daemonize]----------------------------------------------------
#
# Makes this process independent of any terminal
# Runs as a daemon in the background
#----------------------------------------------------------------

sub daemonize {
    #chdir '/'                 or die "Can't chdir to /: $!";
    open STDIN, '/dev/null'    or die "Can't read /dev/null: $!";
    open STDOUT, '>>/dev/null' or die "Can't write to /dev/null: $!";
    open STDERR, '>>/dev/null' or die "Can't write to /dev/null: $!";
    defined(my $pid = fork)    or die "Can't fork: $!";
    exit if $pid;
    umask 0;
}


#-[-MAIN-]---------------------
{
    my $SLEEP_TIME_NORMAL     = 3;  # 3 seconds between probes when nothing is wrong
    my $SLEEP_TIME_AGGRESSIVE = 1;  # 1 second between probes when we detect an outage


    my $nFlipFlops = 0;                   # Number of consecutive flip-flops between interfaces
    my $total_iter = 0;                   # total iterations so far
    my $sleeptime  = $SLEEP_TIME_NORMAL;  # time between iterations
    my $nPingLoss  = 0;                   # Number of back-to-back ping tests that had packet loss

    daemonize if ($IS_DAEMON);

    system (" echo $$ > /tmp/FAIL_PIDFILE\n");

    # Check if the config file exists, if not, then die
    die(" x Cannot start, config file missing\n")
      unless (-e $FILE_RC_CONF);


    #----begin main loop---------#
    MAIN: while (1) {
        # Maintain Counters
        $nFlipFlops++;
        if ($DEBUG) { $total_iter++;}

        if ($nFlipFlops > $MAX_FLAPS) {
            system("logger -p local0.info MAX FLAPS met: QUITTING");
            log_die(" x DYING:Max Flaps met\n");
        }

        if ($DEBUG) {
            print("[$HOSTNAME\@$OS_VER]___[Stuckiter:", $nFlipFlops -
1,"]___[Iter:", $total_iter, "]\n");
        }

        # Step 1: get ifconfig status
        my (
            $sys_int_total_up,            # number of "up" interfaces
            $cur_intf,                    # last "up" interface
            $stat_cur_intf                # state of $cur_intf
           ) =  parse_ifconf($INT_TYPE);

        if (defined($sys_int_total_up) && $sys_int_total_up > 1) {   # numeric compare, not string "gt"
            if ($DEBUG) { print(" x MORE than ONE interface is UP: Waiting\n");}
            sleep ($SLEEP_TIME_NORMAL * 2);
            next MAIN;
        } elsif (!defined($sys_int_total_up)){
            if ($DEBUG) { print (" x No Interface is up: Waiting\n");}
            sleep ($SLEEP_TIME_NORMAL * 2);
            next MAIN;
        }

        if ($DEBUG) {
            print ("+ EXACTLY One interface is up\n");
        }

        # Step 2: ping the  $HOST_TO_PING

        my $pingval1 = ping_host($HOST_TO_PING);
        if ($DEBUG) {
            print ("+ PING  returned ", $pingval1, "\n");
        }

        my  $pingval2 = ping_host($HOST2_TO_PING);
        if ($DEBUG) {
            print ("+ PING2 returned ", $pingval2, "\n");
        }

        # Discard the lesser value of number of pings answered
        # assuming that the host/router could have gone down.
        my $pingval= ($pingval1 < $pingval2) ? $pingval2 : $pingval1;

        if ($pingval < 1){
            # Packet loss! Make a log entry
            if ($DEBUG) {  print("  - Ping Loss\n"); }
            system("logger -p local0.info PING LOSS");
        } else {
            # No packet loss. Clear error counters
            if ($DEBUG) { print("  | No ping loss\n"); }
            $nFlipFlops  =  0;
            $nPingLoss   =  0;
            $sleeptime   = $SLEEP_TIME_NORMAL;
        }

        if (($stat_cur_intf eq "active") && ($pingval < 1)) {
            # The interface status is "active" and we have packet loss

            # Get aggressive (reduce sleeptime).
            $sleeptime = $SLEEP_TIME_AGGRESSIVE;
            $nPingLoss++;
            if ($DEBUG) { print("  - aggressive\n"); }

            # If we do not recover within $WAIT_N_LOSSES iterations, then
            # we call do_switch() to switch to a different interface
            if ($nPingLoss > $WAIT_N_LOSSES ) {
                do_switch($cur_intf, $FILE_RC_CONF, $INT_TYPE);
                if ($DEBUG) { print("  - SWITCHING PINGLOSS from $cur_intf\n"); }
            }

        } elsif ($stat_cur_intf ne "active"){
            # The current interface is not active.
            # Maybe cable broken or switch died
            #
            # Future feature: to find out the nature of the failure
            # and act accordingly

            if ($DEBUG) { print("  - LINK LOSS: SWITCHING IMMEDIATELY from $cur_intf\n"); }
            system("logger -p local0.info LINK LOSS: SWITCHING IMMEDIATELY from $cur_intf");


            # Switch to a different interface in the hope that it works
            do_switch($cur_intf, $FILE_RC_CONF, $INT_TYPE);

            # We don't know if the new interface will work.
            # So, we stay aggressive until we know for sure
            $sleeptime = $SLEEP_TIME_AGGRESSIVE;
        }

        if ($DEBUG) { print("  | sleep for $sleeptime sec\n\n"); }

        # Hardcoded guard against thrashing. Never sleep for less than one second
        if ($sleeptime < 1) {
            sleep(1);
        } else {
            sleep($sleeptime);
        }

    } # while (1)
}




#---------------end
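
The decision the main loop above makes on each iteration can be pulled out into a pure function, which makes it testable without touching any interface. This is only a minimal sketch, not part of the script: the name next_action, its argument order, and the 'ok'/'wait'/'switch' labels are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script).
# Returns ($action, $new_loss_count), where $action is
# 'ok', 'wait' or 'switch'.
sub next_action {
    my ($link_status, $pings_received, $loss_count, $wait_n_losses) = @_;

    # Any answered probe clears the loss counter, as in the main loop.
    $loss_count = 0 if $pings_received >= 1;

    # Link not active: switch at once (the LINK LOSS branch).
    return ('switch', $loss_count) if $link_status ne 'active';

    # Link active and a probe was answered: nothing to do.
    return ('ok', $loss_count) if $pings_received >= 1;

    # Link active but probes lost: count it, and switch only
    # once the count exceeds the threshold.
    $loss_count++;
    return ($loss_count > $wait_n_losses ? 'switch' : 'wait', $loss_count);
}

# With a threshold of 3, the fourth consecutive loss triggers the switch.
my $losses = 0;
for my $probe (1 .. 4) {
    (my $action, $losses) = next_action('active', 0, $losses, 3);
    print "probe $probe: $action\n";
}
```

With $WAIT_N_LOSSES = 3 and one-second aggressive probing, this matches the script's behaviour of tolerating three lost probes before failing over.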

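ping_host() above matches only single-digit packet counts, which is fine for "ping -c 1" but breaks if the count is ever raised. A slightly more tolerant parser might look like the sketch below. The function name packets_received and the sample output string are assumptions for illustration; check the exact summary line printed by the ping on your system.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# Returns the number of replies reported in ping's summary line,
# or undef if no summary line is found.
sub packets_received {
    my ($ping_output) = @_;
    return undef unless defined $ping_output;

    # (\d+) accepts multi-digit counts; "packets " before "received"
    # is optional because some ping implementations omit it.
    if ($ping_output =~ /^\d+ packets transmitted, (\d+) (?:packets )?received/m) {
        return $1;
    }
    return undef;
}

# Illustrative sample only; the address and wording are made up.
my $sample = "--- 10.0.0.1 ping statistics ---\n"
           . "1 packets transmitted, 1 packets received, 0% packet loss\n";
my $got = packets_received($sample);
print defined $got ? "received: $got\n" : "no summary line\n";
```

Returning undef instead of dying leaves it to the caller to decide whether unparseable output should count as packet loss or as a fatal error.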

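The core of parse_rc_file() is a string rewrite that turns an rc.conf entry into an ifconfig command line for the standby interface. The sketch below shows that transformation on one line. The helper name rc_line_to_cmd and the sample address are assumptions for illustration, and the whitespace cleanup is an addition the original does not perform.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): convert one
# rc.conf line into an ifconfig command for the new interface.
sub rc_line_to_cmd {
    my ($line, $old_intf, $new_intf) = @_;
    chomp($line);
    $line =~ s/[_="]/ /g;                  # strip rc.conf punctuation
    $line =~ s/\Q$old_intf\E/$new_intf/g;  # swap the interface name
    $line =~ s/\s+/ /g;                    # collapse repeated blanks
    $line =~ s/^\s+|\s+$//g;               # trim both ends
    return $line;
}

# Sample rc.conf entry with a made-up address.
my $rc = 'ifconfig_fxp0="inet 10.0.0.5 netmask 255.255.0.0"';
print rc_line_to_cmd($rc, 'fxp0', 'fxp1'), "\n";
# prints: ifconfig fxp1 inet 10.0.0.5 netmask 255.255.0.0
```

Note that, as in the original, every underscore is replaced, so an rc.conf value that itself contains an underscore would be altered.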

-----Original Message-----
From: Josh Paetzel [mailto:friar_josh@webwarrior.net]
Sent: Tuesday, August 20, 2002 12:49 PM
To: W. Desjardins
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: network link failover


On Tue, 2002-08-20 at 18:56, W. Desjardins wrote:
> Hi,
>
> I couldnt find anything in the archives specific to this question.
>
> I was wondering if there are any network card drivers that support link
> failover between either 2 cards, or 2 ports on the same card? basically, I
> am looking to have a server hooked to 2 switches and have the link
> failover (while maintaining IP address) failover to the new port in the
> event of a dead switch.
>
> I currently use this functionality in solaris with a daemon called
> in.mpathd that uses interface aliases as floating ip's between network
> interfaces. Solaris will failover and back, any links that fail for any
> reason. Its nice in that besides load balancing to all interfaces in a
> group, I can use any number of interfaces on any card as a group.
>
> I know some of the multiport ethernet cards such as the intel dual-port
> and dlink quad-port claim failover capabilities, but I suspect that is
> only for windows. is this correct? Is there any ability in the drivers for
> these cards to accomodate failover?
>
> Oh...and I know I can write a script in an hour or so to perform this duty
> and unless I find any new info here, is what I will be doing. I just dont
> care to reinvent a sub-standard wheel if a nice round one already exists
> ;)
>
> Thanks,
>
> Bill

Nope.  No wheels here to reinvent.  FreeBSD is basically designed to be
a one way shot to the internet routing platform.  I don't know for a
fact that the intel multiport cards are capable of what you want, but in
6 years of using FreeBSD I've never heard of one being used in that
capacity.  (Take it for what it's worth)

Josh



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message




