Date: Tue, 5 Jun 2001 15:20:28 +0300
From: Ruslan Ermilov
To: Jesper Skriver, Jonathan Lemon
Cc: freebsd-net@FreeBSD.org
Subject: Re: control TCP send/recieve window size based on port numbers ? and a bug(?) in sendpipe/recvpipe handling ...
Message-ID: <20010605152028.A12215@sunbay.com>
References: <20010526213442.A95985@skriver.dk> <20010527000854.B98021@skriver.dk>
In-Reply-To: <20010527000854.B98021@skriver.dk>; from jesper@skriver.dk on Sun, May 27, 2001 at 12:08:54AM +0200

On Sun, May 27, 2001 at 12:08:54AM +0200, Jesper Skriver wrote:
> On Sat, May 26, 2001 at 09:34:42PM +0200, Jesper Skriver wrote:
> > Hi,
> >
> > I'm currently looking at ways to tune a ftp server, and when
> > tuning net.inet.tcp.sendspace/net.inet.tcp.recvspace and
> > NMBCLUSTERS, I came to think that in a ftp server role, half the
> > TCP sessions will be control sessions, which doesn't transfer much
> > data, so there is no reason to reserve the same number of buffers
> > for sendspace/recvspace for these, compared to the data sessions.
> >
> > I was thinking of adding 3 new sysctl's
> >
> > net.inet.tcp.override_sendspace
> > net.inet.tcp.override_recvspace
> > net.inet.tcp.override_ports
> >
> > The latter controls which (if any) src/dst ports, trigger the
> > session to get the overridden send and recv-space applied.
> >
> > Does this make any sense ?
>
> As Mike Silbersack has educated me, the sendspace and recvspace is
> only the upper limit per session, and it's not static allocated,
> so this is not a problem, and thus this patch doesn't give us
> anything.
>
> So the only thing remaining is the bug where the sendpipe/recvpipe
> doesn't have any effect.
>
It does, but only if the pipesize from the rtentry is greater than
the mss.  IOW, buffer sizes never fall below MSS.  I wonder if this
was intentional though.  The code for rmx_recvpipe suggests it was.

: 	/*
: 	 * If there's a pipesize, change the socket buffer
: 	 * to that size.  Make the socket buffers an integral
: 	 * number of mss units; if the mss is larger than
: 	 * the socket buffer, decrease the mss.
: 	 */
: #ifdef RTV_SPIPE
: 	if ((bufsize = rt->rt_rmx.rmx_sendpipe) == 0)
: #endif
: 		bufsize = so->so_snd.sb_hiwat;
: 	if (bufsize < mss)
: 		mss = bufsize;
: 	else {
: 		bufsize = roundup(bufsize, mss);
: 		if (bufsize > sb_max)
: 			bufsize = sb_max;
: 		(void)sbreserve(&so->so_snd, bufsize, so, NULL);
: 	}
: 	tp->t_maxseg = mss;
:
: #ifdef RTV_RPIPE
: 	if ((bufsize = rt->rt_rmx.rmx_recvpipe) == 0)
: #endif
: 		bufsize = so->so_rcv.sb_hiwat;
: 	if (bufsize > mss) {
: 		bufsize = roundup(bufsize, mss);
: 		if (bufsize > sb_max)
: 			bufsize = sb_max;
: 		(void)sbreserve(&so->so_rcv, bufsize, so, NULL);
: 	}

Also, there is the related PR kern/11966 which complains about
this code overriding user-set buffer sizes.
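One way to watch for this kind of override from user space is to read
SO_RCVBUF on a TCP socket before and after connect().  Below is a minimal
sketch, assuming an ordinary BSD sockets environment.  The helper name
rcvbuf_before_after() and the STANDALONE guard are made up for
illustration; this is not a program from this thread, and the numbers it
reports depend on the kernel and its settings.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the receive buffer size of socket s; returns -1 on error. */
static int
get_rcvbuf(int s)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &val, &len) == -1)
		return (-1);
	return (val);
}

/*
 * Hypothetical helper: connect to a throwaway listener on 127.0.0.1
 * and store the client's SO_RCVBUF before and after connect().
 * Returns 0 on success, -1 on any failure.
 */
int
rcvbuf_before_after(int *before, int *after)
{
	struct sockaddr_in sin;
	socklen_t slen = sizeof(sin);
	int ls, cs, rc = -1;

	ls = socket(AF_INET, SOCK_STREAM, 0);
	cs = socket(AF_INET, SOCK_STREAM, 0);
	if (ls == -1 || cs == -1)
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(ls, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(ls, 1) == -1 ||
	    getsockname(ls, (struct sockaddr *)&sin, &slen) == -1)
		goto out;

	*before = get_rcvbuf(cs);
	if (connect(cs, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		goto out;
	*after = get_rcvbuf(cs);
	rc = (*before == -1 || *after == -1) ? -1 : 0;
out:
	if (cs != -1)
		close(cs);
	if (ls != -1)
		close(ls);
	return (rc);
}

#ifdef STANDALONE
/* Small driver; build with -DSTANDALONE to get an executable. */
int
main(void)
{
	int before, after;

	if (rcvbuf_before_after(&before, &after) == -1) {
		fprintf(stderr, "could not measure buffer sizes\n");
		return (1);
	}
	printf("rcv. buffer size before connect(): %d bytes\n", before);
	printf("rcv. buffer size after connect(): %d bytes\n", after);
	return (0);
}
#endif
```

The listener never calls accept(); a completed handshake on the listen
queue is enough for connect() to return, which is all the comparison
needs.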
The problem can be demonstrated with a loopback connection
(through lo0), for which lortrequest() always sets the send and
receive pipes to 3 * LOMTU = 49152:

: # sysctl net.inet.tcp.recvspace net.inet.tcp.rfc1323
: net.inet.tcp.recvspace: 65535
: net.inet.tcp.rfc1323: 1
:
: # route -n get 127.1
:    route to: 127.0.0.1
: destination: 127.0.0.1
:   interface: lo0
:       flags:
:  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
:     49152     49152         0         0         0         0    16384          0
:
: # ./tcp
: rcv. buffer size before connect(): 65535 bytes
: rcv. buffer size after connect(): 57344 bytes

where:

mss = rounddown(mtu - 40, MCLBYTES) = rounddown(16384 - 40, 2048) = 14336
rcvbuf = roundup(recvpipe, mss) = roundup(49152, 14336) = 57344

In the rfc1323=1 case, this is even worse.  The user initially sets
a large receive buffer, which is then announced via the window scale
option, and this code then shrinks the receive buffer to the smaller
size.

The attached patch fixes this by changing the buffer size only when
the new value is greater than the current one.  The impact of this
patch should be low, as (by default) only routes through the loopback
interface have these routing metrics set.

Please review.
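The arithmetic above, together with the grow-only rule the patch
introduces, can be checked with a small user-space model.  This is only
a sketch: sketch_mss() and sketch_rcvbuf() are hypothetical names, not
kernel functions; the rounding macros are spelled out locally; and the
sb_max value of 262144 is an assumed limit picked for illustration.
The mss formula is the simplified one quoted above, not the full
tcp_mss() logic, and it assumes mtu > 40.

```c
#include <stdio.h>

/* Local copies of the usual integer rounding macros. */
#define SK_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))
#define SK_ROUNDDOWN(x, y)	(((x) / (y)) * (y))

#define SK_MCLBYTES	2048UL		/* cluster size used in the mail */
#define SK_SB_MAX	262144UL	/* assumed limit, illustration only */

/* mss = rounddown(mtu - 40, MCLBYTES), as in the mail; needs mtu > 40. */
unsigned long
sketch_mss(unsigned long mtu)
{
	return (SK_ROUNDDOWN(mtu - 40, SK_MCLBYTES));
}

/*
 * Model of the receive-side block: returns the resulting high-water
 * mark.  hiwat is the current buffer size, recvpipe the route metric
 * (0 means "not set").  With patched != 0 the buffer may only grow.
 */
unsigned long
sketch_rcvbuf(unsigned long hiwat, unsigned long recvpipe,
    unsigned long mss, int patched)
{
	unsigned long bufsize;

	bufsize = (recvpipe != 0) ? recvpipe : hiwat;
	if (bufsize > mss) {
		bufsize = SK_ROUNDUP(bufsize, mss);
		if (bufsize > SK_SB_MAX)
			bufsize = SK_SB_MAX;
		if (!patched || bufsize > hiwat)
			hiwat = bufsize;	/* stands in for sbreserve() */
	}
	return (hiwat);
}

#ifdef STANDALONE
/* Small driver; build with -DSTANDALONE to get an executable. */
int
main(void)
{
	unsigned long mss = sketch_mss(16384);

	printf("mss:               %lu\n", mss);
	printf("rcvbuf, unpatched: %lu\n",
	    sketch_rcvbuf(65535, 49152, mss, 0));
	printf("rcvbuf, patched:   %lu\n",
	    sketch_rcvbuf(65535, 49152, mss, 1));
	return (0);
}
#endif
```

With the numbers from the mail, sketch_mss(16384) is 14336, the
unpatched model returns 57344 (matching the ./tcp output), and the
patched model leaves the buffer at the user's 65535.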

Cheers,
-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age

Attachment (filename: p):

Index: tcp_input.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.107.2.8
diff -u -p -r1.107.2.8 tcp_input.c
--- tcp_input.c	2001/04/18 17:55:23	1.107.2.8
+++ tcp_input.c	2001/06/05 11:55:03
@@ -2786,7 +2786,8 @@ tcp_mss(tp, offer)
 		bufsize = roundup(bufsize, mss);
 		if (bufsize > sb_max)
 			bufsize = sb_max;
-		(void)sbreserve(&so->so_snd, bufsize, so, NULL);
+		if (bufsize > so->so_snd.sb_hiwat)
+			(void)sbreserve(&so->so_snd, bufsize, so, NULL);
 	}
 	tp->t_maxseg = mss;
 
@@ -2798,7 +2799,8 @@ tcp_mss(tp, offer)
 		bufsize = roundup(bufsize, mss);
 		if (bufsize > sb_max)
 			bufsize = sb_max;
-		(void)sbreserve(&so->so_rcv, bufsize, so, NULL);
+		if (bufsize > so->so_rcv.sb_hiwat)
+			(void)sbreserve(&so->so_rcv, bufsize, so, NULL);
 	}
 
 	/*