Date:      Wed, 12 Jan 2022 17:01:51 -0500
From:      George Neville-Neil <gnn@neville-neil.com>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        net@freebsd.org
Subject:   Re: compressed TIME-WAIT to be decomissioned
Message-ID:  <C3A1E39F-B8A9-43D9-8813-A96227712B6F@neville-neil.com>
In-Reply-To: <Yd8im/VkTU1zdvOi@FreeBSD.org>
References:  <Yd8im/VkTU1zdvOi@FreeBSD.org>

Removed current@ given your comment below.

On 12 Jan 2022, at 13:48, Gleb Smirnoff wrote:

>   Hi!
>
> [crossposted to current@, but let's keep discussion at net@]
>
> I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and
> Igor Sysoev (author of nginx).  Now posting for wider discussion.
>
> TLDR: struct tcptw shall be decommissioned
>
> Longer version covers three topics: why does tcptw exist? why is it no
> longer necessary? what would we get removing it?
>
> Why does struct tcptw exist?
>
> When TCP connection goes to TIME-WAIT state, it can only retransmit
> the very last ACK, thus doesn't need all of the control data in the kernel.
> However, we are required to keep it in memory for certain amount of time
> (2*MSL). So, let's save memory: free the socket, free the tcpcb and
> leave only inpcb that will point at small tcptw (much smaller than tcpcb)
> that holds enough info to retransmit the last ACK. This was done in
> early 2003, see 340c35de6a2.
>
> What was different in 2003 compared to 2022?
>
> * First of all, internet servers were running i386 with only 2 GB of KVA
>   space. Unlike today, they were memory constrained in the first place, not
>   CPU bound like they are today.
>
> * Many of HTTP connections were made by older browsers, which were not able
>   to use persistent HTTP connections.  Those browsers that could, would
>   recycle connections more often than today.  Default timeouts in Apache
>   for persistent connections were short.  So, the ratio of connections
>   in TIME-WAIT compared to live connections was much bigger than today.
>   Here is sample data from 2008 provided to me by Igor Sysoev:
>
>   ITEM         SIZE     LIMIT      USED      FREE  REQUESTS  FAILURES
>   tcpcb:        728,   163840,    22938,    72722, 13029632,        0
>   tcptw:         88,   163842,    10253,    72949,  2447928,        0
>
>   We see that TIME-WAITs are ~ 50% of live connections.
>
>   Today I see that TIME-WAITs are ~ 1% of connections. My data is biased
>   here, since I'm looking at servers that do mostly video streaming. I'd
>   be grateful if anybody replies to this email with some other modern data
>   on ratio between tcpcb and tcptw allocations.
>
> * The Internet bandwidth was lower and thus average size of HTTP object
>   much smaller.  That made the average send socket buffer size much smaller
>   than today.  Note that TCP socket buffers autosizing came in 2009 only.
>   This means that today most significant portion of kernel memory consumed
>   by an average TCP connection is the send socket buffer, and
>   socket+inpcb+tcpcb is just a fraction of that.  Thus, swapping tcpcb to
>   tcptw we are saving a fraction of a fraction of memory consumed by average
>   connection.
>
> * Who told that 2*MSL (60 seconds) is adequate time to keep TIME-WAIT?
>   In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a bit
>   with lowering net.inet.tcp.msl. It appeared that lowering it down three
>   times doesn't have statistically significant effect on TIME-WAIT use stats.
>   This means that the already minuscule number of TIME-WAIT connection on a
>   modern HTTP server can be lowered 3 times more.  Feel free to lower
>   net.inet.tcp.msl and do your own measurements with
>   'netstat -sp tcp | grep TIME-WAIT'.  I'd be glad to see your results.

The origin of the 2*MSL is pretty old and from a different type of
network, but my understanding is that your proposal does not change this
value anyway, is that correct?  The removal of tcptw is a separate issue,
if I understand you correctly.
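
For anyone who wants to reply with the tcpcb/tcptw ratio requested above,
here is a minimal sketch of the calculation, using the 2008 sample quoted
above as input.  The helper names are hypothetical, and the parser assumes
exactly the six columns shown in that sample; other systems may print a
different layout.

```python
# Zone statistics copied from the 2008 sample quoted above. Columns:
# ITEM, SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES. The six-column
# layout is an assumption taken from that sample.
SAMPLE = """\
tcpcb:        728,   163840,    22938,    72722, 13029632,        0
tcptw:         88,   163842,    10253,    72949,  2447928,        0
"""

COLUMNS = ["size", "limit", "used", "free", "requests", "failures"]


def parse_zones(text):
    """Return {zone_name: {column: int}} for each non-empty line."""
    zones = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, rest = line.partition(":")
        values = [int(field) for field in rest.split(",")]
        zones[name.strip()] = dict(zip(COLUMNS, values))
    return zones


def time_wait_ratio(zones):
    """Ratio of tcptw entries in use to tcpcb entries in use."""
    return zones["tcptw"]["used"] / zones["tcpcb"]["used"]


zones = parse_zones(SAMPLE)
print(f"TIME-WAIT / live: {time_wait_ratio(zones):.1%}")
```

On the 2008 sample this gives roughly 45%, in line with the "~ 50%"
estimate in the quoted text.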

> Ok, now what would removal give us?
>
> * One less alloc/free during socket lifetime (immediately).
> * Reduced code complexity. inp->inp_ppcb always can be dereferenced as tcpcb.
>   Lots of checking for inp->inp_flags & INP_TIMEWAIT goes away (eventually).
> * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS.
>   Reason for that is obvious - compressed TIME-WAIT. A HPTS-driven connection
>   may transition to TIME-WAIT, so we can't use tcpcb. Now we would be able to.
>   So, for non TCP connections memory footprint shrinks (with following changes).
> * Embedding inpcb into protocols cb. An inpcb becomes one piece of memory with
>   tcpcb. One more less alloc/free during socket lifetime. Reduced code
>   complexity, since now inpcb == tcpcb (following changes).
>
> How much memory are we going to lose?
>
> (kgdb) p tcpcb_zone->uz_keg->uk_rsize
> $5 = 1064
> (kgdb) p tcptw_zone->uz_keg->uk_rsize
> $6 = 72
> (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize
> $8 = 424
>
> After change a connection in TIME-WAIT would consume 424+1064 bytes instead
> of 424+72. Multiply that by expected number of connections in TIME-WAIT on
> your machine.
>
> Comments welcome.

This all seems fine and I'm interested to see the proposed patch.  Even
the smallest embedded machines that FreeBSD runs on without modification
(i.e. just install/run) have plenty of memory at this point.  If someone
really wants to create a very small, FreeBSD-based web server then
they'll care, but they can probably come up with another way to handle
their memory needs.
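
To put a number on "multiply that by expected number of connections", here
is a rough sketch of the arithmetic using the three zone item sizes from
the kgdb output quoted above.  The connection counts are hypothetical,
chosen only for illustration, and the function names are made up for this
example.

```python
# Item sizes in bytes, taken from the kgdb output quoted above.
INPCB_SIZE = 424
TCPCB_SIZE = 1064
TCPTW_SIZE = 72


def time_wait_bytes(connections, compressed):
    """Memory held by `connections` TIME-WAIT entries.

    compressed=True models the current scheme (inpcb + tcptw);
    compressed=False models the proposal (inpcb + full tcpcb).
    """
    per_conn = INPCB_SIZE + (TCPTW_SIZE if compressed else TCPCB_SIZE)
    return connections * per_conn


def extra_bytes(connections):
    """Additional memory used once compressed TIME-WAIT is removed."""
    return (time_wait_bytes(connections, compressed=False)
            - time_wait_bytes(connections, compressed=True))


# Hypothetical TIME-WAIT populations, not measurements.
for n in (10_000, 100_000):
    print(f"{n:>7} connections: +{extra_bytes(n) / 1e6:.1f} MB")
```

The difference is 992 bytes per TIME-WAIT connection, so even a
hypothetical 100,000 such connections would cost about 99 MB more.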

Best,
George


