From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 11 08:24:01 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2E047D00 for ; Fri, 11 Jan 2013 08:24:01 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) by mx1.freebsd.org (Postfix) with ESMTP id A32E896A for ; Fri, 11 Jan 2013 08:23:59 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.5/8.14.5) with ESMTP id r0B8NwgM046532; Fri, 11 Jan 2013 12:23:58 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.5/8.14.5/Submit) id r0B8Nww0046531; Fri, 11 Jan 2013 12:23:58 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Fri, 11 Jan 2013 12:23:58 +0400 From: Gleb Smirnoff To: Jacques Fourie Subject: Re: Inconsistent TCP state in tcp_do_segment() Message-ID: <20130111082358.GH82815@FreeBSD.org> References: MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Hackers freeBSD X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jan 2013 08:24:01 -0000 Jacques, On Fri, Jan 11, 2013 at 09:18:07AM +0200, Jacques Fourie wrote: J> I've been using the kernel socket API and noticed that every once in a J> while the data received at the remote end of a TCP connection doesn't match J> the data sent locally using sosend(). I tracked it down to the piece of J> code in tcp_do_segment() starting at line 2665: (svn rev 245286) J> J> if (acked > so->so_snd.sb_cc) { J> tp->snd_wnd -= so->so_snd.sb_cc; J> sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc); J> ourfinisacked = 1; J> } else { J> sbdrop_locked(&so->so_snd, acked); J> tp->snd_wnd -= acked; J> ourfinisacked = 0; J> } J> sowwakeup_locked(so); J> /* Detect una wraparound. */ J> if (!IN_RECOVERY(tp->t_flags) && J> SEQ_GT(tp->snd_una, tp->snd_recover) && J> SEQ_LEQ(th->th_ack, tp->snd_recover)) J> tp->snd_recover = th->th_ack - 1; J> /* XXXLAS: Can this be moved up into cc_post_recovery? */ J> if (IN_RECOVERY(tp->t_flags) && J> SEQ_GEQ(th->th_ack, tp->snd_recover)) { J> EXIT_RECOVERY(tp->t_flags); J> } J> tp->snd_una = th->th_ack; J> if (tp->t_flags & TF_SACK_PERMIT) { J> if (SEQ_GT(tp->snd_una, tp->snd_recover)) J> tp->snd_recover = tp->snd_una; J> } J> if (SEQ_LT(tp->snd_nxt, tp->snd_una)) J> tp->snd_nxt = tp->snd_una; J> J> The issue is that sowwakeup_locked() is called before all the tcpcb fields J> are updated to account for the current ACK processing as can be seen from J> the tp->snd_una = th->th_ack in line 2686. My code that uses the kernel J> socket API calls sosend() in the upcall path resulting from J> sowwakeup_locked() which in turn can lead to tcp_output() being called with J> inconsistent TCP state. If I move the call to sowwakeup_locked() to after J> the 'if (SEQ_LT(tp->snd_nxt, tp->snd_una))' line in the code snippet above J> the data corruption issue seems to be fixed. Again I think that your analysis is correct, thanks! I have added Robert Watson to Cc of this email, so that he can review your suggestion. However, it seems to me that it isn't safe to call sosend() from the upcall directly. I suppose that if you run your code with WITNESS it will report lock order reversals. The correct way for your module is to run sosend() in separate context, for example using taskqueue(9) API, or running separate thread for that. -- Totus tuus, Glebius.