Date: Wed, 12 Jan 2022 17:01:51 -0500 From: George Neville-Neil <gnn@neville-neil.com> To: Gleb Smirnoff <glebius@freebsd.org> Cc: net@freebsd.org Subject: Re: compressed TIME-WAIT to be decomissioned Message-ID: <C3A1E39F-B8A9-43D9-8813-A96227712B6F@neville-neil.com> In-Reply-To: <Yd8im/VkTU1zdvOi@FreeBSD.org> References: <Yd8im/VkTU1zdvOi@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Removed current@ given your comment below. On 12 Jan 2022, at 13:48, Gleb Smirnoff wrote: > Hi! > > [crossposted to current@, but let's keep discussion at net@] > > I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and > Igor Sysoev (author of nginx). Now posting for wider discussion. > > TLDR: struct tcptw shall be decomissioned > > Longer version covers three topics: why does tcptw exist? why is it no > longer necessary? what would we get removing it? > > Why does struct tcptw exist? > > When TCP connection goes to TIME-WAIT state, it can only retransmit > the very last ACK, thus doesn't need all of the control data in the kernel. > However, we are required to keep it in memory for certain amount of time > (2*MSL). So, let's save memory: free the socket, free the tcpcb and > leave only inpcb that will point at small tcptw (much smaller than tcpcb) > that holds enough info to retransmit the last ACK. This was done in > early 2003, see 340c35de6a2. > > What was different in 2003 compared to 2022? > > * First of all, internet servers were running i386 with only 2 Gb of KVA > space. Unlike today, they were memory constrained in the first place, not > CPU bound like they are today. > > * Many of HTTP connections were made by older browsers, which were not able > to use persistent HTTP connections. Those browsers that could, would > recycle connections more often, then today. Default timeouts in Apache > for persistent connections were short. So, the ratio of connections > in TIME-WAIT compared to live connections was much bigger than today. > Here is sample data from 2008 provided to me by Igor Sysoev: > > ITEM SIZE LIMIT USED FREE REQUESTS FAILURES > tcpcb: 728, 163840, 22938, 72722, 13029632, 0 > tcptw: 88, 163842, 10253, 72949, 2447928, 0 > > We see that TIME-WAITs are ~ 50% of live connections. > > Today I see that TIME-WAITs are ~ 1% of connections. My data is biased > here, since I'm looking at servers that do mostly video streaming. I'd > be grateful if anybody replies to this email with some other modern data > on ratio between tcpcb and tcptw allocations. > > * The Internet bandwidth was lower and thus average size of HTTP object > much smaller. That made the average send socket buffer size much smaller > than today. Note that TCP socket buffers autosizing came in 2009 only. > This means that today most significant portion of kernel memory consumed > by an average TCP connection is the send socket buffer, and > socket+inpcb+tcpcb is just a fraction of that. Thus, swapping tcpcb to > tcptw we are saving a fraction of a fraction of memory consumed by average > connection. > > * Who told that 2*MSL (60 seconds) is adequate time to keep TIME-WAIT? > In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a bit > with lowering net.inet.tcp.msl. It appeared that lowering it down three > times doesn't have statistically significant effect on TIME-WAIT use stats. > This means that the already miniscule number of TIME-WAIT connection on a > modern HTTP server can be lowered 3 times more. Feel free to lower > net.inet.tcp.msl and do your own measurements with > 'netstat -sp tcp | grep TIME-WAIT'. I'd be glad to see your results. The origin of the 2*MSL is pretty old and from a different type of network, but, my understanding of your proposal is not a change to this value anyway, is that correct? The removal of tcptw is a separate issue, if I understand you correctly. > Ok, now what would removal give us? > > * One less alloc/free during socket lifetime (immediately). > * Reduced code complexity. inp->inp_ppcb always can be dereferenced as tcpcb. > Lot's of checking for inp->inp_flags & INP_TIMEWAIT goes away (eventually). > * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS. > Reason for that is obvious - compressed TIME-WAIT. A HPTS-driven connection > may transition to TIME-WAIT, so we can't use tcpcb. Now we would be able to. > So, for non TCP connections memory footprint shrinks (with following changes). > * Embedding inpcb into protocols cb. An inpcb becomes one piece of memory with > tcpcb. One more less alloc/free during socket lifetime. Reduced code > complexity, since now inpcb == tcpb (following changes). > > How much memory are we going to lose? > > (kgdb) p tcpcb_zone->uz_keg->uk_rsize > $5 = 1064 > (kgdb) p tcptw_zone->uz_keg->uk_rsize > $6 = 72 > (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize > $8 = 424 > > After change a connection in TIME-WAIT would consume 424+1064 bytes instead > of 424+72. Multiply that by expected number of connections in TIME-WAIT on > your machine. > > Comments welcome. This all seems fine and I'm interested to see the proposed patch. Even the smallest embedded machines that FreeBSD runs on without modification (i.e. just install/run) have plenty of memory at this point. If someone really wants to create a very small, FreeBSD based, web server then they'll care but they can probably come up with another way to handle their memory needs. Best, George
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?C3A1E39F-B8A9-43D9-8813-A96227712B6F>
