Date: Wed, 9 Jan 2013 21:12:53 +0400 From: Gleb Smirnoff <glebius@FreeBSD.org> To: Steve Read <steve.read@netasq.com> Cc: freebsd-net@FreeBSD.org, andre@FreeBSD.org, rwatson@FreeBSD.org, Vijay Singh <vijju.singh@gmail.com> Subject: Re: panics in soabort with so_count != 0, one possible solution to one cause. Message-ID: <20130109171253.GO66284@FreeBSD.org> In-Reply-To: <50ED8B27.8070708@netasq.com> References: <50ED8B27.8070708@netasq.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Jan 09, 2013 at 04:22:15PM +0100, Steve Read wrote: S> Context for this message: S> http://www.freebsd.org/cgi/query-pr.cgi?pr=145825&cat=kern S> <http://www.freebsd.org/cgi/query-pr.cgi?pr=145825&cat=kern> S> kern/145825: [panic] panic: soabort: so_count S> S> AND S> S> http://www.freebsd.org/cgi/query-pr.cgi?pr=159621 S> kern/159621: [tcp] [panic] panic: soabort: so_count S> S> The two PRs are essentially reporting the same thing, and I have seen S> evidence of people reporting this panic against kernels as old as 6.2. S> S> == Scenario == S> The basic scenario is: S> 1. There is a local listening TCP socket. A userland thread is waiting S> on a kqueue, and will eventually call accept() on this socket. S> 2. A new TCP connection arrives that matches this TCP socket. Syncache S> hangs on to the connection until the three-way handshake is complete S> (i.e. the ACK arrives). S> 3. At this point, syncache_socket() calls sonewconn() and passes S> SS_ISCONNECTED. sonewconn() as a result hands the new socket off to the S> accept queue and wakes up the userland thread (marks the listening S> socket "readable", sends a kqueue notification, etc.). S> 4. Something goes wrong during the rest of syncache_socket(), as a S> result of which it calls soabort(). S> S> == Consequence == S> On a single-CPU machine, the netisr thread that called syncache_socket() S> blocks out the userland thread until it has finished, so so_count of the S> new connected socket is still zero when syncache_socket() calls S> soabort(). (It's not absolutely guaranteed, as there are calls to S> locking functions along the way, but it usually happens.) S> S> On a multi-CPU machine of any sort, the userland thread resumes S> immediately that it is woken up, and it is possible (but not guaranteed) S> for it to grab the socket and increment its so_count before S> syncache_socket() calls soabort(). S> S> I have a core which shows the netisr thread hitting the panic in S> soabort(), while the expected userland thread (on a different CPU) is S> still in the kernel, churning through the post-pickup part of accept(). S> S> == Proposed solution == S> My proposed solution to this issue is: S> 1. Replace SS_ISCONNECTED with 0 in the call to sonewconn() to prevent S> it from waking up the listening thread. S> 2. At the "end" of syncache_socket(), call soisconnected(), passing the S> new socket. This will issue the wakeup after syncache_socket() has S> finished preparing itself, and in particular after the last possible S> call to soabort(). S> S> I'm concerned, of course, that this may cause some unobvious fallout S> somewhere, but I can't see for the moment what it would be. Any advice S> would be welcome. S> S> == Patch that applies the proposed solution == S> A patch that would apply to kernel 8.3 (the basic scenario appears to S> still be feasible with HEAD, and the code is very similar): S> S> ====== S> --- netinet/tcp_syncache.c.orig 2013-01-09 13:18:05.000000000 +0000 S> +++ netinet/tcp_syncache.c 2013-01-09 14:03:54.000000000 +0000 S> @@ -638,7 +638,7 @@ S> * connection when the SYN arrived. If we can't create S> * the connection, abort it. S> */ S> - so = sonewconn(lso, SS_ISCONNECTED); S> + so = sonewconn(lso, 0); S> if (so == NULL) { S> /* S> * Drop the connection; we will either send a RST or S> @@ -831,6 +831,8 @@ S> S> INP_WUNLOCK(inp); S> S> + soisconnected(so); S> + S> TCPSTAT_INC(tcps_accepts); S> return (so); AFAIU, in head this race was fixed by r243627. Can Vijay and Andre comment? -- Totus tuus, Glebius.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130109171253.GO66284>