From owner-svn-src-head@FreeBSD.ORG Thu Jun 11 03:53:26 2009 Return-Path: Delivered-To: svn-src-head@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1593A106566C; Thu, 11 Jun 2009 03:53:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id 8CB4F8FC0A; Thu, 11 Jun 2009 03:53:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-159-184.carlnfd1.nsw.optusnet.com.au (c122-106-159-184.carlnfd1.nsw.optusnet.com.au [122.106.159.184]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n5B3rFBX012716 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 11 Jun 2009 13:53:18 +1000 Date: Thu, 11 Jun 2009 13:53:16 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: John Baldwin In-Reply-To: <200906101827.n5AIRFoR022115@svn.freebsd.org> Message-ID: <20090611124334.A21109@delplex.bde.org> References: <200906101827.n5AIRFoR022115@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r193941 - head/sys/netinet X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Jun 2009 03:53:26 -0000 On Wed, 10 Jun 2009, John Baldwin wrote: > Log: > Change a few members of tcpcb that store cached copies of ticks to be ints > instead of unsigned longs. This fixes a few overflow edge cases on 64-bit > platforms. Specifically, if an idle connection receives a packet shortly I think the variables should still be unsigned (ints now). Otherwise there is at best benign undefined behaviour when the variables overflow at INT_MAX, while the code is apparently designed to work with unsigned values (it casts to int to look at differences between the unsigned values). > before 2^31 clock ticks of uptime (about 25 days with hz=1000) and the keep > alive timer fires after 2^31 clock ticks, the keep alive timer will think > that the connection has been idle for a very long time and will immediately > drop the connection instead of sending a keep alive probe. > > Reviewed by: silby, gnn, lstewart > MFC after: 1 week > > Modified: > head/sys/netinet/tcp_input.c > head/sys/netinet/tcp_usrreq.c > head/sys/netinet/tcp_var.h > > Modified: head/sys/netinet/tcp_input.c > ============================================================================== > --- head/sys/netinet/tcp_input.c Wed Jun 10 18:26:02 2009 (r193940) > +++ head/sys/netinet/tcp_input.c Wed Jun 10 18:27:15 2009 (r193941) > @@ -1778,7 +1778,7 @@ tcp_do_segment(struct mbuf *m, struct tc > TSTMP_LT(to.to_tsval, tp->ts_recent)) { > > /* Check to see if ts_recent is over 24 days old. */ > - if ((int)(ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) { > + if ((ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) { The variables are now ints, and there is now also overflow in the subtraction. E.g., INT_MAX - INT_MIN now overflows. On 2's complement machines it normally overflows to -2, which is probably the wrong value (the large unsigned value 0xFFF...FEU is probably correct), but it is the same value as is given by casting the difference of unsigned values to int ((int)0xFFF...FEU = -2). This is because, when representing times mod UINT_MAX+1, all time differences up to UINT_MAX can be represented, but only if we keep track of which operand is older (here we probably do know that ts_recent_age is older); if we don't keep track then we normally assume that (unsigned) differences smaller than INT_MAX mean that the difference is nonnegative while differences larger than INT_MAX mean that the difference is negative. The signed interpreatation works well if we know that the nonnegative differences never exceed INT_MAX. Style bug: without the cast, the above has excessive parentheses. > /* > * Invalidate ts_recent. If this segment updates > * ts_recent, the age will be reset later and ts_recent > > Modified: head/sys/netinet/tcp_usrreq.c > ============================================================================== > --- head/sys/netinet/tcp_usrreq.c Wed Jun 10 18:26:02 2009 (r193940) > +++ head/sys/netinet/tcp_usrreq.c Wed Jun 10 18:27:15 2009 (r193941) > @@ -1823,7 +1823,7 @@ db_print_tcpcb(struct tcpcb *tp, const c > tp->snd_recover); > > db_print_indent(indent); > - db_printf("t_maxopd: %u t_rcvtime: %lu t_startime: %lu\n", > + db_printf("t_maxopd: %u t_rcvtime: %u t_startime: %u\n", > tp->t_maxopd, tp->t_rcvtime, tp->t_starttime); > > db_print_indent(indent); You still print all the times as unsigned. Since the variables are now signed, this is now a printf format error, which gcc still doesn't detect but I usually do. > Modified: head/sys/netinet/tcp_var.h > ============================================================================== > --- head/sys/netinet/tcp_var.h Wed Jun 10 18:26:02 2009 (r193940) > +++ head/sys/netinet/tcp_var.h Wed Jun 10 18:27:15 2009 (r193941) > @@ -139,8 +139,8 @@ struct tcpcb { > > u_int t_maxopd; /* mss plus options */ > > - u_long t_rcvtime; /* inactivity time */ > - u_long t_starttime; /* time connection was established */ > + int t_rcvtime; /* inactivity time */ > + int t_starttime; /* time connection was established */ > int t_rtttime; /* round trip time */ > tcp_seq t_rtseq; /* sequence number being timed */ > > @@ -167,7 +167,7 @@ struct tcpcb { > u_char rcv_scale; /* window scaling for recv window */ > u_char request_r_scale; /* pending window scaling */ > u_int32_t ts_recent; /* timestamp echo data */ > - u_long ts_recent_age; /* when last updated */ > + int ts_recent_age; /* when last updated */ > u_int32_t ts_offset; /* our timestamp offset */ > > tcp_seq last_ack_sent; > @@ -175,7 +175,7 @@ struct tcpcb { > u_long snd_cwnd_prev; /* cwnd prior to retransmit */ > u_long snd_ssthresh_prev; /* ssthresh prior to retransmit */ > tcp_seq snd_recover_prev; /* snd_recover prior to retransmit */ > - u_long t_badrxtwin; /* window for retransmit recovery */ > + int t_badrxtwin; /* window for retransmit recovery */ > u_char snd_limited; /* segments limited transmitted */ > /* SACK related state */ > int snd_numholes; /* number of holes seen by sender */ > Should all be changed to u_int? There still seem to be some very nice sign extension/overflow bugs. E.g., `ticks' is (bogusly) signed. After changing the above variables to int to be bug for bug compatible with `ticks', the semantics of expressions not directly touched in this commit is changed too. E.g., there is the expression `ticks < tp->t_badrxtwin'. Changing the signedness of the type of tp_t_badrxtwin stops promotion of `ticks' to unsigned in this expression. However, this expression seems to have been just broken before, and the change only moves the bug slightly (from times near where `ticks' wraps around at to to times near where `ticks' overflows at INT_MAX). To handle wraparound and avoid overflow, such expressions should be written as (int)(ticks - tp->t_badrxtwin) < 0 where the variables have unsigned type. This seems to have already been done for all the other variables changed in this commit, except the cast to int is missing in some cases. Wrapararound used to occur after 497 days when HZ was 100 and variables were unsigned 32 bits. Now overflow occurs after 24 days when HZ is 1000 and variables are signed 32 bits. Larger misconfigurations of HZ for polling can make the bugs occur after only a week of uptime. It might be a practical exercise to change the type of the time variables to uint16_t so that the wraparound bugs are seen after only a few seconds of uptime and type size mismatches with ints through u_longs affect all machines. The type size mismatches shouldn't matter if times are always clamped compatibly. E.g., expressions like `tp->t_badrxtwin = ticks + adj' depend on t_badrxtwin wrapping around in the same way as `ticks'. This could be written as something like `tp->t_badrxtwin = (ticks + adj) & 0xFFFF' to get compatible wrapping with 16-bit values. Bruce