Date: Wed, 14 Jan 2004 14:04:19 -0800
From: Tom Pavel <pavel@NetworkPhysics.COM>
To: richard@wendland.org.uk
Cc: sten.daniel.sorsdal@wan.no
Subject: Re: Handling 100.000 packets/sec or more
Message-ID: <200401142204.i0EM4JQX087048@NetworkPhysics.COM>
In-Reply-To: Your message of "Wed, 14 Jan 2004 12:50:34 GMT." <200401141250.MAA00294@starburst.demon.co.uk>
>>>>> On Wed, 14 Jan 2004, Richard Wendland <richard@starburst.demon.co.uk> writes:

> > device polling(8) really does help _a lot_ for packet floods/storms.
> > For device polling to work properly (imho) you would need to set HZ
> > to 1000.
> > I don't recommend any higher HZ on a PIII.
>
> Incidentally, setting HZ > 1000 would cause FreeBSD TCP to not comply
> with RFC1323, as it would make the TCP timestamp option clock tick faster
> than 1ms. RFC1323 4.2.2 specifies the clock rate to be in the range
> 1 ms to 1 sec per tick.
>
> Really the TCP timestamp option clock should be divorced from HZ before
> too long, as a time will come when people will want HZ > 1000.
>
> Actually a bit faster tick-rate is unlikely to run into much trouble in
> practice, but it will cause the PAWS algorithm to stop a long-running
> TCP connection, see 4.2.3 of RFC1323.
>
> Richard

The PAWS problem is real. Idle SSH or telnet connections can easily get
hosed by timestamp wraparound if you crank HZ up too far. We ran into
this at Network Physics.

I had been meaning to submit a PR about this (and probably several
others) for quite a while, but I kept getting distracted by some other
urgent matter. Given the prod, though, I dug up the fix we used for this
particular problem. I'm fairly sure these diffs will not apply cleanly,
even to -stable, but the gist of the idea should be clear enough.
Hopefully this saves someone some work in getting a fix into the tree.
Tom Pavel

Network Physics
pavel@networkphysics.com / pavel@alum.mit.edu


Index: tcp_input.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_input.c,v
retrieving revision 1.41
retrieving revision 1.42
diff -u -r1.41 -r1.42
--- tcp_input.c	2 Apr 2002 23:27:33 -0000	1.41
+++ tcp_input.c	3 Apr 2002 22:24:24 -0000	1.42
@@ -1185,7 +1185,7 @@
 		 */
 		if ((to.to_flag & TOF_TS) != 0 &&
 		    SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
-			tp->ts_recent_age = ticks;
+			GETCURTS(tp->ts_recent_age);
 			tp->ts_recent = to.to_tsval;
 		}
 
@@ -1228,9 +1228,12 @@
 				    && ((!(sack_check(tp))) || to.to_tsecr)
 #endif
-				    )
-					tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
-				else {
+				    ) {
+					u_long cur_ts, rtt_ticks;
+					GETCURTS(cur_ts);
+					rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
+					tcp_xmit_timer(tp, rtt_ticks + 1);
+				} else {
 #ifdef LTSTMP
 					tcp_xmit_timer(tp, tp->t_rtttime);
 #else
@@ -1941,9 +1944,11 @@
 	 */
 	if ((to.to_flag & TOF_TS) != 0 && tp->ts_recent &&
 	    TSTMP_LT(to.to_tsval, tp->ts_recent)) {
+		u_long cur_ts;
 
 		/* Check to see if ts_recent is over 24 days old.  */
-		if ((int)(ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) {
+		GETCURTS(cur_ts);
+		if ((int)(cur_ts - tp->ts_recent_age) > TCP_PAWS_IDLE) {
 			/*
 			 * Invalidate ts_recent.  If this segment updates
 			 * ts_recent, the age will be reset later and ts_recent
@@ -2120,7 +2125,7 @@
 	 */
 	if ((to.to_flag & TOF_TS) != 0 &&
 	    SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
-		tp->ts_recent_age = ticks;
+		GETCURTS(tp->ts_recent_age);
 		tp->ts_recent = to.to_tsval;
 	}
 
@@ -2754,9 +2759,12 @@
 			/* bug fix from Mark Allman */
 			    && ((!sack_check(tp)) || to.to_tsecr)
 #endif
-			    )
-				tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
-			else {
+			    ) {
+				u_long cur_ts, rtt_ticks;
+				GETCURTS(cur_ts);
+				rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
+				tcp_xmit_timer(tp, rtt_ticks + 1);
+			} else {
 #ifdef LTSTMP
 				/* use local timestamp */
 				tcp_xmit_timer(tp, tp->t_rtttime);
@@ -3293,7 +3301,7 @@
 			if (th->th_flags & TH_SYN) {
 				tp->t_flags |= TF_RCVD_TSTMP;
 				tp->ts_recent = to->to_tsval;
-				tp->ts_recent_age = ticks;
+				GETCURTS(tp->ts_recent_age);
 			}
 			break;
 
Index: tcp_output.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_output.c,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -r1.32 -r1.33
--- tcp_output.c	3 Apr 2002 01:55:20 -0000	1.32
+++ tcp_output.c	3 Apr 2002 22:24:24 -0000	1.33
@@ -616,7 +616,9 @@
 
 			/* Form timestamp option as shown in appendix A of RFC 1323. */
 			*lp++ = htonl(TCPOPT_TSTAMP_HDR);
-			*lp++ = htonl(ticks);
+			GETCURTS(*lp);
+			*lp = htonl(*lp);
+			lp++;
 			*lp = htonl(tp->ts_recent);
 			optlen += TCPOLEN_TSTAMP_APPA;
 		}
Index: tcp_seq.h
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_seq.h,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- tcp_seq.h	16 Jul 2001 18:18:44 -0000	1.2
+++ tcp_seq.h	3 Apr 2002 22:24:24 -0000	1.3
@@ -88,8 +88,19 @@
 	(tp)->iss
 #endif
 
-#define	TCP_PAWS_IDLE	(24 * 24 * 60 * 60 * hz)
-					/* timestamp wrap-around time */
+/* clock macros for RFC1323 timestamps */
+#define	TSTMP_UNITS	(10)	/* in ms (RFC1323 says 1-1000 ms) */
+#define	GETCURTS(ts) \
+	do { \
+		struct timeval tv; \
+		getmicrouptime(&tv); \
+		(ts) = (u_long)tv.tv_sec * 1000 + tv.tv_usec / 1000; \
+		(ts) /= TSTMP_UNITS; \
+	} while (0)
+#define	TSTMPTOTICK(ts)	(((int64_t)(ts))*hz*TSTMP_UNITS/1000)
+
+#define	TCP_PAWS_IDLE	(24 * 24 * 60 * 60 * 1000/TSTMP_UNITS)
+					/* timestamp wrap-around time (24 days in 10ms units) */
 
 #ifdef _KERNEL
 extern tcp_cc	tcp_ccgen;		/* global connection count */