Date: Sat, 8 Jul 2006 10:43:43 +0200
From: Daniel Hartmeier <daniel@benzedrine.cx>
To: "Douglas K. Rand" <rand@meridian-enviro.com>
Cc: mcbride@openbsd.org, freebsd-pf@freebsd.org
Subject: Re: pfsync & carp problems
Message-ID: <20060708084343.GA32262@insomnia.benzedrine.cx>
In-Reply-To: <87zmfl466d.fsf@delta.meridian-enviro.com>
References: <87ejwx1edf.wl%rand@meridian-enviro.com> <87zmfl466d.fsf@delta.meridian-enviro.com>

On Fri, Jul 07, 2006 at 01:32:26PM -0500, Douglas K. Rand wrote:

> Some more information after I discovered the -x loud option to
> pfctl. When the master firewall goes down and the already established
> TCP session hangs, I get these messages on the slave:
>
> pf: BAD state: TCP 67.134.74.224:52173 67.134.74.224:52173 204.152.184.134:80 [lo=2943781408 high=2943846943 win=33304 modulator=0 wscale=1] [lo=3255565389 high=3255629101 win=65535 modulator=0 wscale=0] 4:4 A seq=3255634893 ack=2943781408 len=1448 ackskew=0 pkts=21109:24835 dir=in,rev
> pf: State failure on: 1 |

This means the web server is trying to send data to the client that is
out of (what pf thinks is legal for) its window.

The last ACK from the client that pf's state saw was 3255562493
(advertising th_win 33304 wscale factor 2^1), hence the upper boundary
of what the client accepts is 3255562493 + 2*33304 == seqhi 3255629101.

The packet's end, th_seq 3255634893 + len 1448 == 3255636341 is larger
than the client's seqhi 3255629101 (by 7240, which is 5*1448). Hence it
is blocked.

The fact that the server retransmits the same segment over and over
without going back to older segments probably means that it has gotten
an ACK from the client for 3255634893.

So how can the server have received an ACK up to 3255634893 when pf's
state has only seen an ACK for 3255562493?

I guess this depends on how you shut down the master in the first place.
For instance, if its kernel would, for a brief period of time, continue
to forward packets while pf is no longer seeing packets, this would be
possible.

Also, there's a certain latency between pf updating its state entry
based on a passing packet and pfsync actually transmitting that update
to the slave. If an update was lost because the box was shutting down
precisely in that moment, I guess there is a chance for such a race.

How are you disconnecting the master? Does this occur when you
physically disconnect the ethernet cable towards the server first?
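
The window arithmetic above can be double-checked with a small sketch. This is only an illustration of the upper-bound test using the numbers from the log; the function names are made up for this example and are not pf's identifiers, and pf's real sequence tracking performs several more checks than this one.

```python
# Illustrative only: a simplified upper-window check using the values
# from the "BAD state" log line. Function names are hypothetical and
# do not correspond to pf's source code.

MASK32 = 0xFFFFFFFF  # TCP sequence numbers are 32-bit and wrap around


def seq_high(last_ack: int, win: int, wscale: int) -> int:
    """Upper boundary the receiver accepts: last ACK plus scaled window."""
    return (last_ack + (win << wscale)) & MASK32


def overshoot(seq: int, length: int, high: int) -> int:
    """Signed distance by which the segment's end exceeds `high`.

    Positive means the segment ends beyond the window, negative or zero
    means it fits. The difference is taken modulo 2^32 and reinterpreted
    as signed so that sequence-number wraparound is handled.
    """
    end = (seq + length) & MASK32
    diff = (end - high) & MASK32
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff


if __name__ == "__main__":
    # Last client ACK seen by the state, th_win 33304, wscale factor 2^1.
    client_high = seq_high(3255562493, 33304, 1)
    # Blocked segment: th_seq 3255634893, len 1448.
    excess = overshoot(3255634893, 1448, client_high)
    print(client_high)      # 3255629101
    print(excess)           # 7240
    print(excess // 1448)   # 5
```

With the logged values the segment ends 7240 bytes (five full 1448-byte segments) past the boundary, which matches the numbers worked out by hand.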

I'm not sure if there's any code that should try to prevent this
scenario in a normal shutdown/reboot case (like disabling forwarding or
taking down interfaces in a certain order first).

Ryan, do we address this, or is it just a rare but expected case that
this might occur? Or did I miss anything and this shouldn't occur for
some reason?

Daniel