Date: Wed, 15 Feb 2006 21:03:26 -0500 (EST)
From: "Brian A. Seklecki"
To: freebsd-questions@freebsd.org
Message-ID: <20060215210248.P47621@arbitor.digitalfreaks.org>
Subject: Re: ng_one2many v.s. AFT (NIC Fault Tolerance/Fail Over/Redundancy Revisited)

FYI, to bring this thread back to the list

---------- Forwarded message ----------
Date: Wed, 15 Feb 2006 20:53:59 -0500 (EST)
From: Brian A. Seklecki
To: Jonathan Donaldson, glebius@freebsd.org, glebius@cell.sick.ru
Cc: jks@clickcom.com, Brian J. Creasy, Chad Ziccardi, Danny Howard, Brad Bendy
Subject: Re: ng_one2many v.s.
AFT (NIC Fault Tolerance/Fail Over/Redundancy Revisited) (fwd)

On Wed, 15 Feb 2006, Jonathan Donaldson wrote:

> Take a look here:
>
> http://www.freebsd.org/cgi/getmsg.cgi?fetch=607312+0+/usr/local/www/db/text/2004/cvs-all/20041128.cvs-all
>

Yea, I see it now.  Sorry.  I'm CC'ing the developer who committed the
changes and the MFC.  The man page needs to be updated, and it should
mention your caveat.

I got caught by your caveat with the one-link-down-at-boot.  However, the
code begins to work once the down link is brought up, just as it would if
both links had been active at boot, which is good.

Where I got tripped up was this quote: "The node listens to flow control
message from many hooks, and considers link failed if NGM_LINK_IS_DOWN is
received."  I interpreted "flow control messages" as something on the
wire, like an STP (802.1D) BPDU.

Apparently, it's really an in-kernel event related to the new Ethernet
link-state code in 6.x, or maybe just glorified poll()'ing.

Either way, it works well.  Sorry for jumping the gun.

~lava

P.S., in 7.0-CURRENT, there appears to be an import of the OpenBSD
bridge(4) to replace the old-school "options BRIDGE" code.  This one is
802.1D STP aware.  When 7.x becomes a production release, I suspect I'll
end up using that instead, since it works so well with NetBSD/OpenBSD for
HA Ethernet, plus I'd rather have a PVST+ Cisco switch make the packet
forwarding decisions >:}

~lava

> and then look here:
>
> http://fxr.watson.org/fxr/source/netgraph/ng_one2many.h?v=RELENG6
>
>
> /* Algorithms for detecting link failure (XXX only one so far) */
> #define NG_ONE2MANY_FAIL_MANUAL         1       /* use enabledLinks[] array */
> #define NG_ONE2MANY_FAIL_NOTIFY         2       /* listen to flow control msgs */
>
>
> so set your fail alg to 2 and see if you see the messages and failover...
>
>
>
> On Feb 15, 2006, at 8:11 PM, Brian A. Seklecki wrote:
>
>> On Thu, 12 Jan 2006, Brian J.
Creasy wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Brian A. Seklecki wrote:
>>> |
>>> | Jonathan's comments suggest that we may need to move to 6.x on the
>>> | production cluster.
>>> |
>>> | 6.x has been upgraded from a technology release to stable, and our goal
>>> | is stability.
>>> |
>>> | Brian: What are your thoughts so far on the 6.x experience?
>>>
>>> no complaints here.. though, i have it running only on my laptop and
>>
>> ....Okay.
>>
>> | As of FreeBSD 6_0 (which is at RC1 now), the NG_ONE2MANY does
>> | support the failure of a link which does not end up with 50% packet
>> | loss. There is new code in the One2Many module that xmits a layer 2 "I'm
>> | alive" broadcast out all links, as long as this is picked up on the
>> | other links, then all interfaces are considered alive. If one of the
>> | packets is not received, then after 2 x heartbeat duration that link is
>> | considered "down". I have tested this in the 6.0 code and it works with
>> | one caveat. When the server is brought up, both interfaces must be
>> | connected and live, or for some reason, the failure algorithm never
>> | seems to kick in. I saw exactly what you saw in 5.4 and newer with
>> | regards to the 50% packet loss.
>>
>> Jonathan:
>>
>> I'm not sure where you got the info about this.  According to the
>> NG_ONE2MANY(4) page in CVS -rHEAD (-CURRENT):
>>
>> "Currently, the valid settings for the xmitAlg field are
>> NG_ONE2MANY_XMIT_ROUNDROBIN (default) or NG_ONE2MANY_XMIT_ALL.  The only
>> valid setting for failAlg is NG_ONE2MANY_FAIL_MANUAL; this is also the
>> default setting."
>>
>> I have 6.1-BETA1 on a box right now and I've got my config set up for
>> NG_ONE2MANY_XMIT_ROUNDROBIN + NG_ONE2MANY_FAIL_NOTIFY and I don't see any
>> layer 2 heartbeat-related traffic (watching via tcpdump(8) on another
>> machine in the same segment)
>>
>> Can you share what you saw?
>>
>> ~lava
>>
>>> |> mission critical environment).
>>> |> - Xmit-All causes twice as much load to be placed on the switch
>>> |>   /fabric and switch CPU.
>>> |>
>>> |
>>> | As of FreeBSD 6_0 (which is at RC1 now), the NG_ONE2MANY does
>>> | support the failure of a link which does not end up with 50% packet
>>> | loss. There is new code in the One2Many module that xmits a layer 2 "I'm
>>> | alive" broadcast out all links, as long as this is picked up on the
>>> | other links, then all interfaces are considered alive. If one of the
>>> | packets is not received, then after 2 x heartbeat duration that link is
>>> | considered "down". I have tested this in the 6.0 code and it works with
>>> | one caveat. When the server is brought up, both interfaces must be
>>> | connected and live, or for some reason, the failure algorithm never
>>> | seems to kick in. I saw exactly what you saw in 5.4 and newer with
>>> | regards to the 50% packet loss.
>>> |
>>> |
>>> |> What ng_one2many needs is an "Active-Standby" XMIT algorithm (STP BOFH's
>>> |> will think BLOCKING/FORWARDING).  It could even be used on top of
>>> |> other NetGraph nodes like ng_fec or possibly (hopefully) ng_802.3ad >:}
>>> |>
>>> |
>>>
>>> - --
>>> Brian J. Creasy
>>> Collaborative Fusion, Inc.
>>> 412.422.3463 x4020 bcreasy@collaborativefusion.com
>>>
>>> pgp public key:
>>> ~ http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x5F94E004
>>>
>>> ****************************************************************
>>> IMPORTANT: This message contains confidential information
>>> and is intended only for the individual named. If the reader of
>>> this message is not an intended recipient (or the individual
>>> responsible for the delivery of this message to an intended
>>> recipient), please be advised that any re-use, dissemination,
>>> distribution or copying of this message is prohibited. Please
>>> notify the sender immediately by e-mail if you have received
>>> this e-mail by mistake and delete this e-mail from your system.
>>> E-mail transmission cannot be guaranteed to be secure or
>>> error-free as information could be intercepted, corrupted, lost,
>>> destroyed, arrive late or incomplete, or contain viruses. The
>>> sender therefore does not accept liability for any errors or
>>> omissions in the contents of this message, which arise as a
>>> result of e-mail transmission.
>>> ****************************************************************
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2 (FreeBSD)
>>> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>>>
>>> iD8DBQFDxmXvDgwDm1+U4AQRAr3GAJ42+HcJFO595aZvljztWCkd+NWgvACeMQiu
>>> ILXLchBGR90TZTZHjn6DVCY=
>>> =68DY
>>> -----END PGP SIGNATURE-----
>>>
>>
>> l8*
>> -lava
>>
>> x.25 - minix - bitnet - plan9 - 110 bps - ASR 33 - base8
>>
>
> Thanks,
> Jonathan
> -------------------------------------------------------------
> Jonathan Donaldson
> Technical Lead
>
> Cisco Systems - CV2BU
> 4690 E. Fulton St C-210
> Ada, MI 49301
>
> Office: +1-972-813-5251
> Cell: +1-616-301-4277
> eMail: donaldson@cisco.com
>
>

l8*
-lava

x.25 - minix - bitnet - plan9 - 110 bps - ASR 33 - base8
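
For anyone reading this in the archive: the "set your fail alg to 2"
suggestion quoted in this thread might look roughly like the following
with ngctl(8).  This is an untested sketch adapted from the example
layout in ng_one2many(4), not the exact configuration anyone in the
thread ran.  The interface names fxp0 and fxp1 and the two-link setup
are assumptions; xmitAlg=1 is assumed to correspond to
NG_ONE2MANY_XMIT_ROUNDROBIN, and failAlg=2 is NG_ONE2MANY_FAIL_NOTIFY
per the header excerpt quoted in the thread.

```shell
# Assumed interfaces: fxp0 (primary) and fxp1; adjust to your hardware.
# Attach a one2many node above fxp0 and hang both NICs off its "many" hooks.
ngctl mkpeer fxp0: one2many upper one
ngctl connect fxp0: fxp0:upper lower many0
ngctl connect fxp1: fxp0:upper lower many1

# Let fxp1 send and receive frames that carry fxp0's MAC address.
ngctl msg fxp1: setpromisc 1
ngctl msg fxp1: setautosrc 0

# Round-robin transmit (xmitAlg=1) with link-state notification (failAlg=2).
ngctl msg fxp0:upper setconfig "{ xmitAlg=1 failAlg=2 enabledLinks=[ 1 1 ] }"

# Read the configuration back to confirm the node accepted it.
ngctl msg fxp0:upper getconfig
```

Given the caveat reported in the thread, it is probably safest to run
this with both links connected and up, then pull one cable and watch
whether traffic keeps flowing on the remaining link.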