Date: Wed, 12 Jan 2022 17:01:51 -0500
From: George Neville-Neil <gnn@neville-neil.com>
To: Gleb Smirnoff <glebius@freebsd.org>
Cc: net@freebsd.org
Subject: Re: compressed TIME-WAIT to be decomissioned
Message-ID: <C3A1E39F-B8A9-43D9-8813-A96227712B6F@neville-neil.com>
In-Reply-To: <Yd8im/VkTU1zdvOi@FreeBSD.org>
References: <Yd8im/VkTU1zdvOi@FreeBSD.org>
Removed current@ given your comment below.

On 12 Jan 2022, at 13:48, Gleb Smirnoff wrote:

> Hi!
>
> [crossposted to current@, but let's keep discussion at net@]
>
> I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and
> Igor Sysoev (author of nginx). Now posting for wider discussion.
>
> TLDR: struct tcptw shall be decommissioned
>
> Longer version covers three topics: why does tcptw exist? why is it no
> longer necessary? what would we get by removing it?
>
> Why does struct tcptw exist?
>
> When a TCP connection goes to TIME-WAIT state, it can only retransmit
> the very last ACK, and thus doesn't need all of the control data in the
> kernel. However, we are required to keep it in memory for a certain
> amount of time (2*MSL). So, let's save memory: free the socket, free the
> tcpcb and leave only the inpcb, pointing at a small tcptw (much smaller
> than a tcpcb) that holds enough info to retransmit the last ACK. This
> was done in early 2003, see 340c35de6a2.
>
> What was different in 2003 compared to 2022?
>
> * First of all, internet servers were running i386 with only 2 GB of
>   KVA space. They were memory constrained in the first place, not CPU
>   bound as they are today.
>
> * Many HTTP connections were made by older browsers, which were not
>   able to use persistent HTTP connections. Those browsers that could
>   would recycle connections more often than today. Default timeouts in
>   Apache for persistent connections were short. So, the ratio of
>   connections in TIME-WAIT to live connections was much bigger than
>   today. Here is sample data from 2008 provided to me by Igor Sysoev:
>
>   ITEM      SIZE    LIMIT    USED    FREE   REQUESTS  FAILURES
>   tcpcb:     728,  163840,  22938,  72722,  13029632,        0
>   tcptw:      88,  163842,  10253,  72949,   2447928,        0
>
>   We see that TIME-WAITs are ~50% of live connections.
>
>   Today I see that TIME-WAITs are ~1% of connections.
>   My data is biased here, since I'm looking at servers that do mostly
>   video streaming. I'd be grateful if anybody replies to this email with
>   some other modern data on the ratio between tcpcb and tcptw
>   allocations.
>
> * Internet bandwidth was lower and thus the average HTTP object was much
>   smaller. That made the average send socket buffer much smaller than
>   today. Note that TCP socket buffer autosizing only came in 2009. This
>   means that today the most significant portion of kernel memory
>   consumed by an average TCP connection is the send socket buffer, and
>   socket+inpcb+tcpcb is just a fraction of that. Thus, by swapping tcpcb
>   for tcptw we are saving a fraction of a fraction of the memory consumed
>   by an average connection.
>
> * Who said that 2*MSL (60 seconds) is an adequate time to keep
>   TIME-WAIT? In 71d2d5adfe1 I added some stats on usage of tcptw and
>   experimented a bit with lowering net.inet.tcp.msl. It appeared that
>   lowering it by a factor of three doesn't have a statistically
>   significant effect on the TIME-WAIT usage stats. This means that the
>   already minuscule number of TIME-WAIT connections on a modern HTTP
>   server can be lowered three times more. Feel free to lower
>   net.inet.tcp.msl and do your own measurements with
>   'netstat -sp tcp | grep TIME-WAIT'. I'd be glad to see your results.

The origin of the 2*MSL is pretty old and from a different type of
network, but my understanding is that your proposal does not change this
value anyway; is that correct? The removal of tcptw is a separate issue,
if I understand you correctly.

> Ok, now what would removal give us?
>
> * One fewer alloc/free during socket lifetime (immediately).
> * Reduced code complexity. inp->inp_ppcb can always be dereferenced as
>   tcpcb. Lots of checks for inp->inp_flags & INP_TIMEWAIT go away
>   (eventually).
> * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS.
>   The reason for that is obvious: compressed TIME-WAIT.
>   An HPTS-driven connection may transition to TIME-WAIT, so we can't
>   keep that data in tcpcb. Now we would be able to. So, for non-TCP
>   connections the memory footprint shrinks (with follow-up changes).
> * Embedding inpcb into the protocol's cb. An inpcb becomes one piece of
>   memory with tcpcb. One more alloc/free removed from socket lifetime.
>   Reduced code complexity, since now inpcb == tcpcb (follow-up changes).
>
> How much memory are we going to lose?
>
> (kgdb) p tcpcb_zone->uz_keg->uk_rsize
> $5 = 1064
> (kgdb) p tcptw_zone->uz_keg->uk_rsize
> $6 = 72
> (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize
> $8 = 424
>
> After the change, a connection in TIME-WAIT would consume 424+1064
> bytes instead of 424+72. Multiply that by the expected number of
> connections in TIME-WAIT on your machine.
>
> Comments welcome.

This all seems fine and I'm interested to see the proposed patch. Even
the smallest embedded machines that FreeBSD runs on without modification
(i.e. just install/run) have plenty of memory at this point. If someone
really wants to create a very small, FreeBSD-based web server then
they'll care, but they can probably come up with another way to handle
their memory needs.

Best,
George
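A minimal Python sketch of the memory arithmetic discussed in the thread, assuming the kgdb zone sizes quoted above (inpcb 424, tcpcb 1064, tcptw 72 bytes, valid only for that particular kernel build) and that the socket itself is already freed in TIME-WAIT, as in the 424+1064 vs. 424+72 figures. The count of 10,000 TIME-WAIT connections is a hypothetical example; the ratio uses the USED columns of the 2008 sample.

```python
# Zone item sizes in bytes, as printed by kgdb (uk_rsize) in the thread.
# These are specific to one kernel build and architecture (assumption).
INPCB_SIZE = 424
TCPCB_SIZE = 1064
TCPTW_SIZE = 72


def time_wait_bytes(compressed: bool) -> int:
    """Bytes held by one TIME-WAIT connection (socket already freed)."""
    return INPCB_SIZE + (TCPTW_SIZE if compressed else TCPCB_SIZE)


def extra_bytes(n_time_wait: int) -> int:
    """Extra memory if n_time_wait connections keep a full tcpcb instead of a tcptw."""
    return n_time_wait * (time_wait_bytes(False) - time_wait_bytes(True))


def time_wait_ratio(tcptw_used: int, tcpcb_used: int) -> float:
    """TIME-WAIT connections as a fraction of live ones (UMA USED columns)."""
    return tcptw_used / tcpcb_used


if __name__ == "__main__":
    print(time_wait_bytes(True), time_wait_bytes(False))    # 496 1488
    # Hypothetical server with 10,000 connections sitting in TIME-WAIT.
    print(f"{extra_bytes(10_000) / 2**20:.1f} MiB")          # 9.5 MiB
    # 2008 sample from Igor Sysoev: tcptw USED 10253, tcpcb USED 22938.
    print(f"{time_wait_ratio(10253, 22938):.1%}")            # 44.7%
```

With these zone sizes the per-connection difference is 992 bytes, so the expected TIME-WAIT count on a given machine is the only other input needed for the estimate.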