Message-Id: <200401142204.i0EM4JQX087048@NetworkPhysics.COM>
To: richard@wendland.org.uk
In-reply-to: Your message of "Wed, 14 Jan 2004 12:50:34 GMT."
             <200401141250.MAA00294@starburst.demon.co.uk>
Date: Wed, 14 Jan 2004 14:04:19 -0800
From: Tom Pavel
cc: freebsd-isp@freebsd.org
cc: freebsd-net@freebsd.org
cc: Adrian Penisoara
cc: sten.daniel.sorsdal@wan.no
Subject: Re: Handling 100.000 packets/sec or more
Reply-To: pavel@alum.mit.edu

>>>>> On Wed, 14 Jan 2004, Richard Wendland writes:

> > device polling(8) really does help _a lot_ for packet floods/storms.
> > For device polling to work properly (imho) you would need to set HZ
> > to 1000.
> > I don't recommend any higher HZ on a PIII.
>
> Incidentally, setting HZ > 1000 would cause FreeBSD TCP to not comply
> with RFC1323, as it would make the TCP timestamp option clock tick faster
> than 1ms.  RFC1323 4.2.2 specifies the clock rate to be in the range
> 1 ms to 1 sec per tick.
>
> Really the TCP timestamp option clock should be divorced from HZ before
> too long, as a time will come when people will want HZ > 1000.
>
> Actually a bit faster tick-rate is unlikely to run into much trouble in
> practice, but it will cause the PAWS algorithm to stop a long-running
> TCP connection, see 4.2.3 of RFC1323.
>
> Richard

The PAWS problem is real.  If you crank HZ up too far, idle SSH or
telnet connections can easily get hosed by timestamp wraparound.  We ran
into exactly this at Network Physics.

I had been meaning to submit a PR about this (and probably several other
things as well) for quite a while, but I always got distracted by some
other urgent matter.  Given the prod, though, I was able to dig up the
fix we used for this particular problem: it drives the RFC1323 timestamp
clock from getmicrouptime() in fixed 10 ms units instead of from ticks,
so the timestamp rate no longer depends on HZ.  I'm fairly sure these
diffs will not apply cleanly, even to -stable, but the gist of the idea
should be clear enough.  Hopefully this can save someone some work in
getting a fix into the tree.
Tom Pavel
Network Physics
pavel@networkphysics.com / pavel@alum.mit.edu


Index: tcp_input.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_input.c,v
retrieving revision 1.41
retrieving revision 1.42
diff -u -r1.41 -r1.42
--- tcp_input.c	2 Apr 2002 23:27:33 -0000	1.41
+++ tcp_input.c	3 Apr 2002 22:24:24 -0000	1.42
@@ -1185,7 +1185,7 @@
 	 */
 	if ((to.to_flag & TOF_TS) != 0 &&
 	    SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
-		tp->ts_recent_age = ticks;
+		GETCURTS(tp->ts_recent_age);
 		tp->ts_recent = to.to_tsval;
 	}
 
@@ -1228,9 +1228,12 @@
 		    && ((!(sack_check(tp))) || to.to_tsecr)
 #endif
-		    )
-			tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
-		else {
+		    ) {
+			u_long cur_ts, rtt_ticks;
+			GETCURTS(cur_ts);
+			rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
+			tcp_xmit_timer(tp, rtt_ticks + 1);
+		} else {
 #ifdef LTSTMP
 			tcp_xmit_timer(tp, tp->t_rtttime);
 #else
@@ -1941,9 +1944,11 @@
 	 */
 	if ((to.to_flag & TOF_TS) != 0 && tp->ts_recent &&
 	    TSTMP_LT(to.to_tsval, tp->ts_recent)) {
+		u_long cur_ts;
 
 		/* Check to see if ts_recent is over 24 days old.  */
-		if ((int)(ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) {
+		GETCURTS(cur_ts);
+		if ((int)(cur_ts - tp->ts_recent_age) > TCP_PAWS_IDLE) {
 			/*
 			 * Invalidate ts_recent.  If this segment updates
 			 * ts_recent, the age will be reset later and ts_recent
@@ -2120,7 +2125,7 @@
 	 */
 	if ((to.to_flag & TOF_TS) != 0 &&
 	    SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
-		tp->ts_recent_age = ticks;
+		GETCURTS(tp->ts_recent_age);
 		tp->ts_recent = to.to_tsval;
 	}
 
@@ -2754,9 +2759,12 @@
 		    /* bug fix from Mark Allman */
 		    && ((!sack_check(tp)) || to.to_tsecr)
 #endif
-		    )
-			tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
-		else {
+		    ) {
+			u_long cur_ts, rtt_ticks;
+			GETCURTS(cur_ts);
+			rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
+			tcp_xmit_timer(tp, rtt_ticks + 1);
+		} else {
 #ifdef LTSTMP
 			/* use local timestamp */
 			tcp_xmit_timer(tp, tp->t_rtttime);
@@ -3293,7 +3301,7 @@
 		if (th->th_flags & TH_SYN) {
 			tp->t_flags |= TF_RCVD_TSTMP;
 			tp->ts_recent = to->to_tsval;
-			tp->ts_recent_age = ticks;
+			GETCURTS(tp->ts_recent_age);
 		}
 		break;
 
Index: tcp_output.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_output.c,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -r1.32 -r1.33
--- tcp_output.c	3 Apr 2002 01:55:20 -0000	1.32
+++ tcp_output.c	3 Apr 2002 22:24:24 -0000	1.33
@@ -616,7 +616,9 @@
 		/* Form timestamp option as shown in appendix A of RFC 1323. */
 		*lp++ = htonl(TCPOPT_TSTAMP_HDR);
-		*lp++ = htonl(ticks);
+		GETCURTS(*lp);
+		*lp = htonl(*lp);	/* convert in place, then advance lp separately */
+		lp++;
 		*lp = htonl(tp->ts_recent);
 		optlen += TCPOLEN_TSTAMP_APPA;
 	}
Index: tcp_seq.h
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_seq.h,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- tcp_seq.h	16 Jul 2001 18:18:44 -0000	1.2
+++ tcp_seq.h	3 Apr 2002 22:24:24 -0000	1.3
@@ -88,8 +88,20 @@
 	(tp)->iss
 #endif
 
-#define TCP_PAWS_IDLE	(24 * 24 * 60 * 60 * hz)
-					/* timestamp wrap-around time */
+/* clock macros for RFC1323 timestamps */
+#define TSTMP_UNITS	(10)	/* in ms (RFC1323 says 1-1000 ms) */
+#define GETCURTS(ts) \
+	do { \
+		struct timeval tv; \
+		getmicrouptime(&tv); \
+		/* do the math in 64 bits and divide before narrowing */ \
+		(ts) = (u_long)(((u_int64_t)tv.tv_sec * 1000 + \
+		    tv.tv_usec / 1000) / TSTMP_UNITS); \
+	} while (0)
+#define TSTMPTOTICK(ts)	(((int64_t)(ts))*hz*TSTMP_UNITS/1000)
+
+#define TCP_PAWS_IDLE	(24 * 24 * 60 * 60 * 1000/TSTMP_UNITS)
+				/* timestamp wrap-around time (24 days in 10ms units) */
 
 #ifdef _KERNEL
 extern tcp_cc	tcp_ccgen;		/* global connection count */