Date: Mon, 5 Jun 2006 21:18:07 -0400
From: "Scott Ullrich" <sullrich@gmail.com>
To: "David DeSimone" <fox@verio.net>
Cc: freebsd-pf@freebsd.org
Subject: Re: pfsync after reboot does not synchronize
Message-ID: <d5992baf0606051818y249b759fp25d5f4b77311c2ef@mail.gmail.com>
In-Reply-To: <20060605234031.GA4787@verio.net>
References: <20060605234031.GA4787@verio.net>

On 6/5/06, David DeSimone <fox@verio.net> wrote:
> I tried posting some messages about PF to the freebsd-net mailing list,
> but they seemed to be ignored. So I thought I would try sending my
> questions here.
>
> I am trying to figure out why pfsync does not seem to work correctly
> when one of my cluster nodes reboots.
>
> When I reboot one of the cluster members, the state tables do appear to
> synchronize, sort of, and populate with some of the same connection
> states, but not all of them.
>
> That is "pfctl -ss" on both cluster members will show a different number
> of state entries. Vastly different if the new member has only been up
> for a minute or two.
>
> In particular, long-lived, extant connections (such as IRC server
> connections) seem to never show up in the rebooted member's state table,
> even though the connections continue to update their state on the
> current carp master.
>
> I figured that doing ifconfig down/up would send some sort of "full
> sync" message between the two members, to cause the entire state table
> to be sent in bulk. Eventually I learned that the method to do this is
> to use "ifconfig syncdev" to force a bulk update:
>
>     ifconfig pfsync0 syncdev fxp0   # $pfsync_syncdev
>
> When I perform the above command, I see the following debug output (when
> PF is configured at "misc" or "loud" debug level):
>
> On the cluster member receiving the requests:
>
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>     pfsync: received bulk update request
>
> On the cluster member making the request (where syncdev was just
> ifconfig'd):
>
>     pfsync: requesting bulk update
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: received bulk update start
>     pfsync: failed to receive bulk update status
>
> After performing this manual action, I find the state table is much
> better populated, and the two firewalls appear to be synchronized.
> However, the messages above bother me. It looks to me like the cluster
> member making the request repeats it over and over again, and finally
> gives up after PFSYNC_MAX_BULKTRIES (12) attempts. Shouldn't that be
> something that only happens in exceptional conditions? Yet, I can make
> it happen every time, even on a test cluster with no traffic (and thus
> an almost empty state table).
>
> Does anyone have any insight as to why I see these problems?
>
> 1. Why does pfsync synchronize the state tables when I use the
>    "ifconfig syncdev" trick to force a bulk update, yet it does
>    not do this when the system is booting up?
>
> 2. Why does pfsync keep repeating the bulk update request and then give
>    up? What message is not getting through?
>
>
> The two cluster members have a direct cross-cable between them. My PF
> policy has these settings:
>
>     set skip on pfsync0
>
>     pass quick on fxp0 proto pfsync   # $pfsync_syncdev

I have also seen this problem with pfSense. To get around the problem
I set the advskew to 200 on the host and wait 30 seconds to give
everything time to sync. I am really not sure what is causing it but
it may be related to the pfsync hold down timer?

At any rate we worked around the problem and I wanted to readdress it
after our 1.0 release. I am glad someone else is also seeing the
problem. Let me know if anyone needs more information.

Scott
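
For anyone who wants to try the workaround above, here is a rough sketch of how it might look as a boot-time script. It is only an illustration, not what pfSense actually runs: the interface name carp0, the "normal" advskew of 0, the step that lowers advskew again after the wait, and the dry-run wrapper are all assumptions. The syntax shown is for the carp pseudo-interface style of this era; on releases where CARP is configured per vhid on the physical interface, the ifconfig arguments would differ.

```shell
#!/bin/sh
# Sketch of the workaround from this thread: come up with a high advskew so
# this node does not take over as master, wait for pfsync to settle, then
# restore the usual advskew. Names and defaults below are assumptions.

CARP_IF="${CARP_IF:-carp0}"        # assumed CARP interface name
HOLD_SKEW="${HOLD_SKEW:-200}"      # value mentioned in this thread
NORMAL_SKEW="${NORMAL_SKEW:-0}"    # assumed usual advskew for this host
SETTLE_SECS="${SETTLE_SECS:-30}"   # wait time mentioned in this thread
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Raise advskew, wait, then put it back (the last step is an assumption).
hold_then_release() {
    run ifconfig "$CARP_IF" advskew "$HOLD_SKEW"
    run sleep "$SETTLE_SECS"
    run ifconfig "$CARP_IF" advskew "$NORMAL_SKEW"
}
```

Calling hold_then_release with the defaults only prints the three commands; setting DRY_RUN=0 (as root) would actually run them. Comparing "pfctl -ss" counts on both members before the final step would be a reasonable sanity check.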