FreeBSD Mail Archives

Date:      Mon, 5 Jun 2006 21:18:07 -0400
From:      "Scott Ullrich" <sullrich@gmail.com>
To:        "David DeSimone" <fox@verio.net>
Cc:        freebsd-pf@freebsd.org
Subject:   Re: pfsync after reboot does not synchronize
Message-ID:  <d5992baf0606051818y249b759fp25d5f4b77311c2ef@mail.gmail.com>
In-Reply-To: <20060605234031.GA4787@verio.net>
References:  <20060605234031.GA4787@verio.net>

On 6/5/06, David DeSimone <fox@verio.net> wrote:
> I tried posting some messages about PF to the freebsd-net mailing list,
> but they seemed to be ignored.  So I thought I would try sending my
> questions here.
>
> I am trying to figure out why pfsync does not seem to work correctly
> when one of my cluster nodes reboots.
>
> When I reboot one of the cluster members, the state tables do appear to
> synchronize, sort of, and populate with some of the same connection
> states, but not all of them.
>
> That is, "pfctl -ss" shows a different number of state entries on each
> cluster member, and a vastly different number if the new member has
> only been up for a minute or two.
>
> In particular, long-lived, extant connections (such as IRC server
> connections) seem to never show up in the rebooted member's state table,
> even though the connections continue to update their state on the
> current carp master.
>
> I figured that doing ifconfig down/up would send some sort of "full
> sync" message between the two members, to cause the entire state table
> to be sent in bulk.  Eventually I learned that the method to do this is
> to use "ifconfig syncdev" to force a bulk update:
>
>     ifconfig pfsync0 syncdev fxp0   # $pfsync_syncdev
>
> When I perform the above command, I see the following debug output (when
> PF is configured at "misc" or "loud" debug level):
>
>     On the cluster member receiving the requests:
>
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>         pfsync: received bulk update request
>
>     On the cluster member making the request (where syncdev was just
>     ifconfig'd):
>
>         pfsync: requesting bulk update
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: received bulk update start
>         pfsync: failed to receive bulk update status
>
> After performing this manual action, I find the state table is much
> better populated, and the two firewalls appear to be synchronized.
> However, the messages above bother me.  It looks to me like the cluster
> member making the request repeats it over and over again, and finally
> gives up after PFSYNC_MAX_BULKTRIES (12) attempts.  Shouldn't that be
> something that only happens in exceptional conditions?  Yet, I can make
> it happen every time, even on a test cluster with no traffic (and thus
> an almost empty state table).
>
> Does anyone have any insight as to why I see these problems?
>
> 1.  Why does pfsync synchronize the state tables when I use the
>     "ifconfig syncdev" trick to force a bulk update, yet it does
>     not do this when the system is booting up?
>
> 2.  Why does pfsync keep repeating the bulk update request and then give
>     up?  What message is not getting through?
>
>
> The two cluster members have a direct cross-cable between them.  My PF
> policy has these settings:
>
>     set skip on pfsync0
>
>     pass quick on fxp0 proto pfsync     # $pfsync_syncdev
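
For anyone wanting to reproduce the "pfctl -ss" comparison described in
the quoted message, here is a rough sketch of how the two state counts
could be compared.  It assumes "pfctl -ss" prints one state per line
(true without -v).  The helper names count_states and state_gap, and the
peer host name in the usage note, are made up for illustration.

```shell
#!/bin/sh
# Count state entries from "pfctl -ss" output read on stdin.
# Assumption: one state per line (no -v), so non-empty lines are counted.
count_states() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Print the absolute difference between two state counts.
state_gap() {
    a="$1"
    b="$2"
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}
```

Possible usage, with "peer-fw" standing in for the other cluster member
(a hypothetical host name reachable over ssh):

    mine=$(pfctl -ss | count_states)
    theirs=$(ssh peer-fw pfctl -ss | count_states)
    state_gap "$mine" "$theirs"

A gap that stays large well after boot would match the symptom described
in the quoted message.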

I have also seen this problem with pfSense.  To work around it, I set
advskew to 200 on the host and wait 30 seconds to give everything time
to sync.  I am not sure what is causing it, but it may be related to
the pfsync hold-down timer.  At any rate, we worked around the problem,
and I planned to revisit it after our 1.0 release.  I am glad someone
else is also seeing the problem.
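
As a rough sketch only, the workaround above might look something like
the following at boot.  The interface name carp0, the normal advskew of
0, and the function names are assumptions for illustration, not what
pfSense actually ships.  Only the advskew of 200 and the 30 second wait
come from the description above.

```shell
#!/bin/sh
# Sketch: hold this node at a high advskew so it does not win the CARP
# election, wait for pfsync to catch up, then restore the normal value.
CARP_IF="${CARP_IF:-carp0}"        # assumed CARP interface name
NORMAL_SKEW="${NORMAL_SKEW:-0}"    # assumed normal advskew for this node
HOLD_SKEW="${HOLD_SKEW:-200}"      # advskew used while waiting
WAIT_SECS="${WAIT_SECS:-30}"       # seconds to wait for the sync

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

carp_hold_then_release() {
    run ifconfig "$CARP_IF" advskew "$HOLD_SKEW"
    run sleep "$WAIT_SECS"
    run ifconfig "$CARP_IF" advskew "$NORMAL_SKEW"
}
```

Running it with DRY_RUN=1 set prints the three commands without touching
the interface, which is a cheap way to check the values first.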

Let me know if anyone needs more information.

Scott

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d5992baf0606051818y249b759fp25d5f4b77311c2ef>
