From: George Neville-Neil
To: Gleb Smirnoff
Cc: net@freebsd.org
Subject: Re: compressed TIME-WAIT to be decomissioned
Date: Wed, 12 Jan 2022 17:01:51 -0500
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net

Removed current@ given your comment below.

On 12 Jan 2022, at 13:48, Gleb Smirnoff wrote:

> Hi!
>
> [crossposted to current@, but let's keep discussion at net@]
>
> I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and
> Igor Sysoev (author of nginx). Now posting for wider discussion.
>
> TLDR: struct tcptw shall be decommissioned
>
> Longer version covers three topics: why does tcptw exist? why is it no
> longer necessary? what would we get removing it?
>
> Why does struct tcptw exist?
>
> When a TCP connection goes to TIME-WAIT state, it can only retransmit
> the very last ACK, thus doesn't need all of the control data in the kernel.
> However, we are required to keep it in memory for a certain amount of time
> (2*MSL). So, let's save memory: free the socket, free the tcpcb and
> leave only the inpcb that will point at a small tcptw (much smaller than
> tcpcb) that holds enough info to retransmit the last ACK. This was done in
> early 2003, see 340c35de6a2.
>
> What was different in 2003 compared to 2022?
>
> * First of all, internet servers were running i386 with only 2 GB of KVA
>   space. Unlike today, they were memory constrained in the first place, not
>   CPU bound like they are today.
>
> * Many HTTP connections were made by older browsers, which were not able
>   to use persistent HTTP connections. Those browsers that could would
>   recycle connections more often than today. Default timeouts in Apache
>   for persistent connections were short. So, the ratio of connections
>   in TIME-WAIT compared to live connections was much bigger than today.
>   Here is sample data from 2008 provided to me by Igor Sysoev:
>
>   ITEM      SIZE    LIMIT    USED    FREE   REQUESTS  FAILURES
>   tcpcb:     728,  163840,  22938,  72722,  13029632,        0
>   tcptw:      88,  163842,  10253,  72949,   2447928,        0
>
>   We see that TIME-WAITs are ~ 50% of live connections.
>
>   Today I see that TIME-WAITs are ~ 1% of connections. My data is biased
>   here, since I'm looking at servers that do mostly video streaming. I'd
>   be grateful if anybody replies to this email with some other modern data
>   on ratio between tcpcb and tcptw allocations.
>
> * The Internet bandwidth was lower and thus the average size of an HTTP
>   object much smaller. That made the average send socket buffer size much
>   smaller than today. Note that TCP socket buffer autosizing came only in
>   2009.
>   This means that today the most significant portion of kernel memory
>   consumed by an average TCP connection is the send socket buffer, and
>   socket+inpcb+tcpcb is just a fraction of that. Thus, swapping tcpcb to
>   tcptw we are saving a fraction of a fraction of the memory consumed by an
>   average connection.
>
> * Who told that 2*MSL (60 seconds) is adequate time to keep TIME-WAIT?
>   In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a bit
>   with lowering net.inet.tcp.msl. It appeared that lowering it down three
>   times doesn't have a statistically significant effect on TIME-WAIT use
>   stats. This means that the already minuscule number of TIME-WAIT
>   connections on a modern HTTP server can be lowered 3 times more. Feel free
>   to lower net.inet.tcp.msl and do your own measurements with
>   'netstat -sp tcp | grep TIME-WAIT'. I'd be glad to see your results.

The origin of the 2*MSL value is pretty old and comes from a different
type of network, but my understanding is that your proposal does not
change this value anyway. Is that correct? The removal of tcptw is a
separate issue, if I understand you correctly.

> Ok, now what would removal give us?
>
> * One less alloc/free during socket lifetime (immediately).
> * Reduced code complexity. inp->inp_ppcb always can be dereferenced as tcpcb.
>   Lots of checking for inp->inp_flags & INP_TIMEWAIT goes away (eventually).
> * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS.
>   Reason for that is obvious - compressed TIME-WAIT. A HPTS-driven connection
>   may transition to TIME-WAIT, so we can't use tcpcb. Now we would be able to.
>   So, for non TCP connections memory footprint shrinks (with following
>   changes).
> * Embedding inpcb into protocols cb. An inpcb becomes one piece of memory with
>   tcpcb. One more less alloc/free during socket lifetime. Reduced code
>   complexity, since now inpcb == tcpb (following changes).
>
> How much memory are we going to lose?
>
> (kgdb) p tcpcb_zone->uz_keg->uk_rsize
> $5 = 1064
> (kgdb) p tcptw_zone->uz_keg->uk_rsize
> $6 = 72
> (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize
> $8 = 424
>
> After the change a connection in TIME-WAIT would consume 424+1064 bytes
> instead of 424+72. Multiply that by the expected number of connections in
> TIME-WAIT on your machine.
>
> Comments welcome.

This all seems fine and I'm interested to see the proposed patch. Even
the smallest embedded machines that FreeBSD runs on without modification
(i.e. just install/run) have plenty of memory at this point. If someone
really wants to create a very small, FreeBSD-based web server then
they'll care, but they can probably come up with another way to handle
their memory needs.

Best,
George
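
For a rough sense of scale, the "multiply that by the number of
connections" step quoted above can be worked through in a few lines. This
is only a sketch: the three sizes come from the kgdb output in the quoted
message, the function names are made up for illustration, and the
connection count is simply the tcptw USED figure from the 2008 sample, not
a measurement from a current system.

```python
# Per-object sizes in bytes, taken from the kgdb output quoted above.
INPCB_SIZE = 424   # tcpcbstor->ips_zone
TCPCB_SIZE = 1064  # tcpcb_zone
TCPTW_SIZE = 72    # tcptw_zone


def time_wait_bytes(connections: int, compressed: bool) -> int:
    """Memory held by `connections` sockets in TIME-WAIT.

    compressed=True models today's inpcb + tcptw layout,
    compressed=False models inpcb + full tcpcb after tcptw is removed.
    """
    per_conn = INPCB_SIZE + (TCPTW_SIZE if compressed else TCPCB_SIZE)
    return connections * per_conn


def extra_bytes(connections: int) -> int:
    """Additional memory needed once compressed TIME-WAIT is gone."""
    return time_wait_bytes(connections, False) - time_wait_bytes(connections, True)


if __name__ == "__main__":
    n = 10253  # illustrative only: tcptw USED count from the 2008 sample
    print(f"with tcptw:    {time_wait_bytes(n, True)} bytes")
    print(f"without tcptw: {time_wait_bytes(n, False)} bytes")
    print(f"difference:    {extra_bytes(n)} bytes")
```

The per-connection difference is 1064 - 72 = 992 bytes, so even with the
2008-era count of about ten thousand TIME-WAIT connections the extra cost
is roughly 10 MB. Substitute your own count to see what it means for your
machine.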