Date:      Wed, 11 Jul 2007 10:04:23 +0400
From:      Eygene Ryabinkin <rea-fbsd@codelabs.ru>
To:        Mike Silbersack <silby@silby.com>
Cc:        Andre Oppermann <andre@freebsd.org>, Robert Watson <rwatson@freebsd.org>, current@freebsd.org, net@freebsd.org
Subject:   Re: FreeBSD 7 TCP syncache fix: request for testers
Message-ID:  <20070711060423.GV1038@void.codelabs.ru>
In-Reply-To: <20070710202028.I34890@odysseus.silby.com>
References:  <20070709234401.S29353@odysseus.silby.com> <20070710132253.GJ1038@void.codelabs.ru> <20070710202028.I34890@odysseus.silby.com>

Mike, good day.

Tue, Jul 10, 2007 at 08:29:14PM -0500, Mike Silbersack wrote:
> The fact that you're still getting the syncache_expand message tells me that 
> there's another bug which I have not yet fixed still present.
> 
> My suspicion is that the "Segment failed SYNCOOKIE authentication" message is 
> the aftereffect of FreeBSD 7 randomly dropping TCP connections, and not the 
> problem itself.  My theory is that the connection is silently dropped, without
> the other endpoint knowing.  That other endpoint then sends an ACK packet, 
> which is then believed to be a syncookie.  Since it is not, it obviously fails
> the verification.

OK, I may have something related to this bug.  It provokes a
different message, 'Spurious RST', but it fits your theory.  When
one side (running -CURRENT) closes the connection and releases the
socket while the other side is still pushing data through it, we
get 'Spurious RST' messages.  This happens because we check
'so->so_state' for the 'SS_NOFDREF' flag (tcp_input.c, version
1.361, line 1581) and drop such connections with an RST.  But the
connection was already closed from that side (it was sitting in
the FIN-WAIT-2 state, to be precise), which is what provokes the
debug message.

If you're interested, I have a tcpdump trace and the relevant
dmesg output for such a session:
    http://codelabs.ru/fbsd/session-with-close.tar.bz2
It was produced on lo0 with a client connecting to an Apache
instance and calling close() on the socket after some (but not all)
bytes of the HTTP reply had been received.
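
For anyone who wants to try this without the tarball, the client
side can be sketched roughly as below.  This is only an illustration
in Python, not the program that produced the trace; the function
name, the byte count and the timeout are invented for the example.

```python
import socket


def partial_read_then_close(host, port, request, want):
    """Send `request`, read at most `want` bytes of the reply, then close.

    The socket is closed while the server may still be sending, which is
    the situation described above.  All names here are hypothetical.
    """
    got = b""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(request)
        while len(got) < want:
            chunk = s.recv(want - len(got))
            if not chunk:
                # Server finished early; nothing more to read.
                break
            got += chunk
    # Leaving the with-block calls close() on the socket, possibly with
    # reply data still in flight from the server.
    return got
```

Something like partial_read_then_close("127.0.0.1", 80,
b"GET / HTTP/1.0\r\n\r\n", 100) against a reply larger than 100
bytes should leave the server pushing data at a socket that is
already gone.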

> >But the patch received only half a day of testing, so I will continue
> >the tests and will inform you if some other information will be
> >available.  Up to date I don't see problems that had appeared without
> >the patch, but they tend to show up after a midnight ;))
> 
> Thanks for testing,

You're welcome ;))

> I look forward to hearing how things work for you.

My problem, as usual, showed up after midnight -- sockets in a
weird state:
-----
tcp4       0      0  127.0.0.1.*            127.0.0.1.40001        CLOSED
tcp4       0      0  127.0.0.1.*            127.0.0.1.40001        CLOSED
-----
These used to be real connections to the service on port 40001,
but they lost their client-side port association and are now stuck
in the CLOSED state.  The effect is that I can no longer connect to
the service listening on 127.0.0.1:40001; only a service restart
helps.  Perhaps that gives you a clue.  Or perhaps not: it may be
totally unrelated to the syncache issues :((

This is also documented in the thread
    http://lists.freebsd.org/pipermail/freebsd-net/2007-June/014406.html
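
A rough way to watch for such sockets is to filter netstat output
for TCP entries that are in CLOSED and whose local port is a
wildcard.  The sketch below assumes the six-column layout shown in
the listing above; the function name is hypothetical and the
matching rule is only my guess at what identifies the stuck
entries.

```python
def stuck_closed_sockets(netstat_output):
    """Return (local, foreign) pairs for TCP entries that look stuck.

    An entry counts as stuck when its state is CLOSED and its local
    address ends in '.*', i.e. the local port is no longer known.
    Assumes lines of the form:
        proto recv-q send-q local foreign state
    """
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, non-TCP lines and anything with a different shape.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, _recvq, _sendq, local, foreign, state = fields
        if state == "CLOSED" and local.endswith(".*"):
            stuck.append((local, foreign))
    return stuck
```

Feeding it the captured output of 'netstat -an' from a cron job
would at least show when the entries first appear.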

Thank you!
-- 
Eygene


