From nobody Fri May 5 13:35:03 2023
From: Chen Shuo
Date: Fri, 5 May 2023 06:35:03 -0700
Subject: Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
To: Randall Stewart, "Rodney W. Grimes"
Cc: freebsd-net, freebsd-transport@freebsd.org
References: <202305021355.342DtKWj021076@gndrsh.dnsmgr.net> <56338AD8-60B6-4B6B-AE1D-B48ED8D28909@netflix.com>
In-Reply-To: <56338AD8-60B6-4B6B-AE1D-B48ED8D28909@netflix.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-transport

Hi Rodney,

Thanks for bringing this to the correct mailing list.

Hi Randall,

Thanks for your information, I didn't know that middle boxes can do such
things.

Linux effectively sets abc_l_var to +inf, and opens cwnd quicker for
aggregated ACKs. Its receiver also enters "quickack" mode after
establishing a link to "accelerate slow-start". So its cwnd grows much
more aggressively.

My puzzle has been solved.

Regards,
Shuo

On Thu, May 4, 2023 at 11:47 AM Randall Stewart wrote:
>
> Rodney/Chen
>
> This is a real issue in the internet… and it's not just LRO/TSO making this
> all happen. You have cable modem technology that will batch up and keep the
> most recent ack and thus aggregate some number of acks (I have seen up to
> 10 acks eaten this way.. each of those for 2 segments)..
>
> You have other middle boxes as well doing similar things and then there is the
> channel access technology that at least gives you all the acks; the only issue is
> they store them up and release them all at once so forget getting a nice
> ack-clocking coming out of the stack.
>
> The only way to deal with it is to generally raise abc_l_var to a much larger
> value. That way, as you get an aggregated ack your cwnd will open.. down side
> is this lets you be more bursty… pacing can help here but only the bbr and rack
> stacks pace in FreeBSD…
>
> R
>
> On May 2, 2023, at 9:55 AM, Rodney W. Grimes wrote:
>
> Second attempt, first one failed due to not being a member
> of the list :-(.
>
> Adding freebsd-transport@freebsd.org to get that specific group's
> eyes on this issue.
>
> Rod
>
> As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c,
> FreeBSD TCP sender strictly follows RFC 5681 with the RFC 3465 extension.
> That is, during slow-start, when receiving an ACK of 'bytes_acked'
>
> cwnd += min(bytes_acked, abc_l_var * SMSS); // abc_l_var = 2 dflt
>
> As discussed in sec3.2 of RFC 3465, L=2*SMSS bytes exactly balances
> the negative impact of the delayed ACK algorithm.
> RFC 5681 also requires that a receiver SHOULD generate an ACK for at
> least every second full-sized segment, so bytes_acked per ACK is at
> most 2 * SMSS. If both sender and receiver follow it, cwnd should grow
> exponentially during slow-start:
>
> cwnd *= 2 (per RTT)
>
> However, LRO and TSO are widely used today, so the receiver may generate
> far fewer ACKs than it used to. As I observed, both FreeBSD and
> Linux generate at most one ACK per segment assembled by LRO/GRO.
> The worst case is one ACK per 45 MSS, as 45 * 1448 = 65160 < 65535.
>
> Sending 1MB over a link of 100ms delay from FreeBSD 13.2:
>
> 0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
> 0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win 65160, options [mss 1460,sackOK,TS val 563185696 ecr 495212525,nop,wscale 7], length 0
> 0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS val 495212626 ecr 563185696], length 0
> // TSopt omitted below for brevity.
>
> // cwnd = 10 * MSS, sent 10 * MSS
> 0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65, length 14480
>
> // got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
> 0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
> 0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65, length 17376
>
> // got ACK of 12*MSS above, cwnd += 2 * MSS, sent 14 * MSS
> 0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
> 0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65, length 20272
>
> // got ACK of 14*MSS above, cwnd += 2 * MSS, sent 16 * MSS
> 0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
> 0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65, length 21500
> 0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65, length 1448
>
> As a consequence, instead of growing exponentially, cwnd grows
> more-or-less quadratically during slow-start, unless abc_l_var is
> set to a sufficiently large value.
>
> NewReno took more than 20 seconds to ramp up throughput to 100Mbps
> over an emulated 100ms delay link, while Linux took ~2 seconds.
> I can provide the pcap file if anyone is interested.
>
> Switching to CUBIC won't help, because it uses the logic in NewReno
> ack_received() for slow start.
>
> Is this a well-known issue and abc_l_var is the only cure for it?
> https://www.google.com/url?q=https://calomel.org/freebsd_network_tuning.html&source=gmail-imap&ust=1683640529000000&usg=AOvVaw0MoyDmFAOg9MlB5yX3FzJP
>
> Thank you!
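
The growth pattern annotated in the trace above can be reproduced with a small model. This is only a sketch under stated assumptions: the function name `simulate_slow_start` and its parameters are invented for illustration, SMSS is taken as 1448 bytes as in the trace, and the receiver is assumed to return exactly one ACK per `segs_per_ack` segments with no loss and no timers. It is not the FreeBSD implementation, only the update rule quoted in the message.

```python
SMSS = 1448  # bytes per full segment, as seen in the trace (assumption)


def simulate_slow_start(rounds, abc_l_var, segs_per_ack, init_segs=10):
    """Return cwnd, in segments, at the start of each round trip.

    Hypothetical model: every round the sender transmits a full cwnd and
    the receiver returns one ACK per `segs_per_ack` segments. Each ACK
    applies the rule quoted in the message:
        cwnd += min(bytes_acked, abc_l_var * SMSS)
    """
    cwnd = init_segs * SMSS
    history = [cwnd // SMSS]
    for _ in range(rounds):
        remaining = cwnd  # bytes sent this round; all of them get ACKed
        while remaining > 0:
            bytes_acked = min(remaining, segs_per_ack * SMSS)
            cwnd += min(bytes_acked, abc_l_var * SMSS)
            remaining -= bytes_acked
        history.append(cwnd // SMSS)
    return history


if __name__ == "__main__":
    # One aggregated ACK per burst (LRO-like), default abc_l_var = 2
    print(simulate_slow_start(3, abc_l_var=2, segs_per_ack=45))
    # One ACK per two segments (classic delayed ACK), abc_l_var = 2
    print(simulate_slow_start(3, abc_l_var=2, segs_per_ack=2))
    # Aggregated ACKs again, but with abc_l_var raised to 45
    print(simulate_slow_start(3, abc_l_var=45, segs_per_ack=45))
```

Under these assumptions the first call yields 10, 12, 14, 16 segments, matching the progression annotated in the trace, while the other two both yield 10, 20, 40, 80, i.e. doubling per RTT.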
>
> Best,
> Shuo Chen
>
>
>
> --
> Rod Grimes rgrimes@freebsd.org
>
>
> ------
> Randall Stewart
> rrs@netflix.com
>
>
>

From nobody Fri May 12 14:45:02 2023
Date: Fri, 12 May 2023 14:45:02 +0000 (UTC)
From: "Bjoern A. Zeeb"
To: net@freebsd.org, transport@freebsd.org
Subject: LoR tcphash -> in6_ifaddr_lock
Message-ID: <5734ss36-02or-p6o1-6qq6-5s6sno276331@SerrOFQ.bet>

where does this one come from these days?

lock order reversal:
 1st 0xfffffe0002305a10 tcphash (tcphash, sleep mutex) @ /usr/src/sys/netinet/tcp_usrreq.c:1477
 2nd 0xffffffff81a5b9d0 in6_ifaddr_lock (in6_ifaddr_lock, rm) @ /usr/src/sys/netinet6/in6_src.c:305
lock order tcphash -> in6_ifaddr_lock attempted at:
#0 0xffffffff80bc92c3 at witness_checkorder+0xbb3
#1 0xffffffff80b503df at _rm_rlock_debug+0x12f
#2 0xffffffff80d84c2f at in6_selectsrc+0x44f
#3 0xffffffff80d84790 at in6_selectsrc_socket+0x40
#4 0xffffffff80d826f7 at in6_pcbconnect+0x247
#5 0xffffffff80d66813 at tcp6_connect+0xa3
#6 0xffffffff80d641f4 at tcp6_usr_connect+0x304
#7 0xffffffff80c057ff at soconnectat+0xaf
#8 0xffffffff80c0c8f1 at kern_connectat+0xe1
#9 0xffffffff80c0c7e5 at sys_connect+0x75
#10 0xffffffff81051760 at amd64_syscall+0x140
#11 0xffffffff81023e1b at fast_syscall_common+0xf8

--
Bjoern A. Zeeb                                                     r15:7
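
For readers unfamiliar with the term, a lock order reversal report like the one above means two locks were taken in the opposite order from one recorded earlier. The sketch below is a toy checker, not FreeBSD's WITNESS code: the class `LockOrderChecker` and its methods are hypothetical, it tracks only direct pairs, and the earlier path that takes `in6_ifaddr_lock` before `tcphash` is assumed purely for illustration, since the report does not say where that order was established.

```python
class LockOrderChecker:
    """Toy lock-order checker (illustrative only, not WITNESS)."""

    def __init__(self):
        self.order = set()  # (first, second) pairs observed so far
        self.held = []      # locks currently held, in acquisition order

    def acquire(self, name):
        """Record an acquisition; return any reversals it causes."""
        reversals = []
        for held in self.held:
            if (name, held) in self.order:
                # The opposite order was recorded earlier: report it.
                reversals.append((held, name))
            else:
                self.order.add((held, name))
        self.held.append(name)
        return reversals

    def release(self, name):
        self.held.remove(name)


if __name__ == "__main__":
    checker = LockOrderChecker()

    # Assumed earlier path: in6_ifaddr_lock first, then tcphash.
    checker.acquire("in6_ifaddr_lock")
    checker.acquire("tcphash")
    checker.release("tcphash")
    checker.release("in6_ifaddr_lock")

    # Path from the report: tcphash held, then in6_ifaddr_lock.
    checker.acquire("tcphash")
    for first, second in checker.acquire("in6_ifaddr_lock"):
        print(f"lock order reversal: {first} -> {second}")
```

The second path triggers a report for the pair tcphash, in6_ifaddr_lock because the first path recorded the opposite order. Which real code path establishes that earlier order is exactly what the message asks.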