Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 11 Jan 2013 12:23:58 +0400
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Jacques Fourie <jacques.fourie@gmail.com>
Cc:        Hackers freeBSD <freebsd-hackers@FreeBSD.org>
Subject:   Re: Inconsistent TCP state in tcp_do_segment()
Message-ID:  <20130111082358.GH82815@FreeBSD.org>
In-Reply-To: <CALX0vxAKNMdP6ZvXf0mvk5MKdRL45D82AYBy8Ras-zRDmixa3Q@mail.gmail.com>
References:  <CALX0vxAKNMdP6ZvXf0mvk5MKdRL45D82AYBy8Ras-zRDmixa3Q@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
  Jacques,

On Fri, Jan 11, 2013 at 09:18:07AM +0200, Jacques Fourie wrote:
J> I've been using the kernel socket API and noticed that every once in a
J> while the data received at the remote end of a TCP connection doesn't match
J> the data sent locally using sosend(). I tracked it down to the piece of
J> code in tcp_do_segment() starting at line 2665: (svn rev 245286)
J> 
J>                if (acked > so->so_snd.sb_cc) {
J>                         tp->snd_wnd -= so->so_snd.sb_cc;
J>                         sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc);
J>                         ourfinisacked = 1;
J>                 } else {
J>                         sbdrop_locked(&so->so_snd, acked);
J>                         tp->snd_wnd -= acked;
J>                         ourfinisacked = 0;
J>                 }
J>                 sowwakeup_locked(so);
J>                 /* Detect una wraparound. */
J>                 if (!IN_RECOVERY(tp->t_flags) &&
J>                     SEQ_GT(tp->snd_una, tp->snd_recover) &&
J>                     SEQ_LEQ(th->th_ack, tp->snd_recover))
J>                         tp->snd_recover = th->th_ack - 1;
J>                 /* XXXLAS: Can this be moved up into cc_post_recovery? */
J>                 if (IN_RECOVERY(tp->t_flags) &&
J>                     SEQ_GEQ(th->th_ack, tp->snd_recover)) {
J>                         EXIT_RECOVERY(tp->t_flags);
J>                 }
J>                 tp->snd_una = th->th_ack;
J>                 if (tp->t_flags & TF_SACK_PERMIT) {
J>                         if (SEQ_GT(tp->snd_una, tp->snd_recover))
J>                                 tp->snd_recover = tp->snd_una;
J>                 }
J>                 if (SEQ_LT(tp->snd_nxt, tp->snd_una))
J>                         tp->snd_nxt = tp->snd_una;
J> 
J> The issue is that sowwakeup_locked() is called before all the tcpcb fields
J> are updated to account for the current ACK processing as can be seen from
J> the tp->snd_una = th->th_ack in line 2686. My code that uses the kernel
J> socket API calls sosend() in the upcall path resulting from
J> sowwakeup_locked() which in turn can lead to tcp_output() being called with
J> inconsistent TCP state. If I move the call to sowwakeup_locked() to after
J> the 'if (SEQ_LT(tp->snd_nxt, tp->snd_una))' line in the code snippet above
J> the data corruption issue seems to be fixed.

Again I think that your analysis is correct, thanks! I have added Robert
Watson to Cc of this email, so that he can review your suggestion.

However, it seems to me that it isn't safe to call sosend() from the
upcall directly. I suppose that if you run your code with WITNESS it will
report lock order reversals. The correct way for your module is to run
sosend() in separate context, for example using taskqueue(9) API, or
running separate thread for that.

-- 
Totus tuus, Glebius.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130111082358.GH82815>