From owner-freebsd-stable@FreeBSD.ORG Thu Sep 29 12:57:53 2011
Date: Thu, 29 Sep 2011 14:57:51 +0200
From: Damien Fleuriot
To: "freebsd-stable@freebsd.org"
Subject: Re: CARP interfaces and mastership issue
In-Reply-To: <4E84627B.2050609@my.gd>
References: <4E71C059.5060404@hi-media.com> <4E84627B.2050609@my.gd>

On 29 September 2011 14:20, Damien Fleuriot wrote:
>
> On 9/15/11 11:07 AM, Damien FLEURIOT wrote:
>> Hello list,
>>
>> TLDR: carp interface becomes MASTER for a split second after being
>> created, even if another MASTER exists on the network with faster
>> advertisements. Breaks connections. HOWTO prevent ?
>>
>> We've been experiencing this double mastership problem with CARP interfaces.
>>
>> Allow me to put some context here:
>>
>> 2 firewalls, PF1, PF2, each with 2 VLANs (for example, some have more)
>> on a lagg device (link aggregation).
>> These firewalls then share virtual IPs through CARP interfaces, let us
>> assume the following:
>>
>> PF1:
>> - vlan13
>> - vlan410
>> - carp13 (advskew 50)
>> - carp410 (advskew 50)
>>
>> PF2:
>> - vlan13
>> - vlan410
>> - carp13 (advskew 100)
>> - carp410 (advskew 100)
>>
>> CARP preemption is turned on, so that if vlan13 should fail on PF1, PF2
>> would assume mastership on both CARP interfaces.
>> Sysctls below:
>> net.inet.carp.allow: 1
>> net.inet.carp.preempt: 1
>> net.inet.carp.log: 1
>> net.inet.carp.arpbalance: 0
>> net.inet.carp.suppress_preempt: 0
>>
>> The problem we have is, say for example we reboot PF2.
>> When it comes back up, it will, even for a split second, assume CARP
>> mastership for its interfaces, at the same time as PF1.
>>
>> This breaks existing sessions, openvpn tunnels and new client connections.
>>
>> While I acknowledge the home-made daemons should be built to support tiny
>> network outages, this doesn't solve our main problem.
>>
>> We have the same issue when destroying/creating said CARP interfaces.
>>
>> Recently we upgraded some switches' IOS version in our backup datacenter
>> (which also has 2 PF boxes, sharing the CARP IPs with the 2 PFs in our
>> production DC).
>> To prevent anything nasty happening, we forbade production VLANs on the
>> switches' uplink ports and only allowed management traffic to allow us
>> to perform the upgrade.
>>
>> Things went smoothly but when we brought the production VLANs up again
>> at layer 2 on the switches, once spanning-tree converged we again had a
>> double MASTER problem.
>>
>> I understand I could have avoided it by destroying/recreating the CARP
>> interfaces, but even in this case there is a split second during which
>> both firewalls are CARP MASTER.
>>
>> Is there any way to force CARP to assume INIT state for some time when
>> coming up, and only after X seconds either become MASTER or BACKUP ?
>>
>> Any other idea how to solve this, guys ?
>
> Hello List,
>
> This is a follow-up to my original email quoted above.
>
> It seems that there is an existing bug in the CARP implementation of
> OpenBSD 3.8 and earlier which causes CARP interfaces to skip the INIT
> state altogether and start as MASTER if preempt is enabled.
>
> Source:
> https://calomel.org/pf_carp.html
>
> Quote:
> INIT : All CARP interfaces start in this state. Also, when a CARP
> interface is admin down, i.e. "ifconfig em0 down", it is put into this
> state. When a CARP interface is admin up, it immediately transitions to
> BACKUP. Note that in OpenBSD 3.8 and earlier, a bug exists which will
> cause the host to transition to MASTER right away if preempt is enabled.
>
> I have been able to verify and reproduce this behavior on boxes running
> both FreeBSD 8.1 and 8.2.
>
> Does anyone know what version of OpenBSD's CARP implementation we're
> running on FreeBSD 8.x ?
>
> It seems like this is the same bug, to me.
>

Quick follow-up again.

This is the code for sys/netinet/ip_carp.c on FreeBSD 8.2, OpenBSD 3.8 and
OpenBSD 3.9, in function carp_setrun(struct carp_softc *sc, sa_family_t af)


FREEBSD 8.2-PRERELEASE with init + preempt => auto MASTER bug
Function starts at line 1371.
---
    switch (sc->sc_state) {
    case INIT:
        if (carp_opts[CARPCTL_PREEMPT] && !carp_suppress_preempt) {
            carp_send_ad_locked(sc);
            carp_send_arp(sc);
#ifdef INET6
            carp_send_na(sc);
#endif /* INET6 */
            CARP_LOG("%s: INIT -> MASTER (preempting)\n",
                SC2IFP(sc)->if_xname);
            carp_set_state(sc, MASTER);
            carp_setroute(sc, RTM_ADD);
        } else {
            CARP_LOG("%s: INIT -> BACKUP\n", SC2IFP(sc)->if_xname);
            carp_set_state(sc, BACKUP);
            carp_setroute(sc, RTM_DELETE);
            carp_setrun(sc, 0);
        }
        break;
---


OPENBSD 3.8 with init + preempt => auto MASTER bug
Function starts at line 1293.
---
    case INIT:
        if (carp_opts[CARPCTL_PREEMPT] && !carp_suppress_preempt) {
            carp_set_state(sc, MASTER);
            carp_setroute(sc, RTM_ADD);
            carp_send_ad(sc);
            carp_send_arp(sc);
#ifdef INET6
            carp_send_na(sc);
#endif /* INET6 */
        } else {
            carp_set_state(sc, BACKUP);
            carp_setroute(sc, RTM_DELETE);
            carp_setrun(sc, 0);
        }
        break;
---


OPENBSD 3.9 with bug fixed
Function starts at line 1348.
---
    switch (sc->sc_state) {
    case INIT:
        carp_set_state(sc, BACKUP);
        carp_setroute(sc, RTM_DELETE);
        carp_setrun(sc, 0);
        break;
---


It looks like the root cause is there.

I'll rebuild and test, keep you updated.
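

To make the difference between the three excerpts easier to see, here is a
minimal user-space sketch of just the INIT decision. It is not kernel code
and not a patch: the enum and the names init_transition_old,
init_transition_new and state_name are made up for illustration, and the two
int arguments are an assumed stand-in for carp_opts[CARPCTL_PREEMPT] and
carp_suppress_preempt. Only the branch logic is taken from the excerpts
above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's CARP states (illustration only). */
enum carp_state { INIT, BACKUP, MASTER };

/*
 * INIT handling as in the FreeBSD 8.2 and OpenBSD 3.8 excerpts:
 * with preemption enabled and not suppressed, the interface goes
 * straight to MASTER without listening for an existing master.
 */
static enum carp_state
init_transition_old(int preempt, int suppress_preempt)
{
    if (preempt && !suppress_preempt)
        return MASTER;
    return BACKUP;
}

/*
 * INIT handling as in the OpenBSD 3.9 excerpt: always go to BACKUP
 * first, whatever the preempt settings are.
 */
static enum carp_state
init_transition_new(int preempt, int suppress_preempt)
{
    (void)preempt;          /* unused: the settings no longer matter here */
    (void)suppress_preempt;
    return BACKUP;
}

/* Printable name for a state, for logging in a small test program. */
static const char *
state_name(enum carp_state s)
{
    switch (s) {
    case INIT:   return "INIT";
    case BACKUP: return "BACKUP";
    case MASTER: return "MASTER";
    }
    return "UNKNOWN";
}
```

Called from a small main() with preempt=1 and suppress_preempt=0 (the sysctl
values quoted earlier), the old variant returns MASTER and the new one
returns BACKUP, which is the behaviour difference I expect to see after
rebuilding.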