From: "Mike Karels"
To: "Julian Elischer"
Cc: "Gopakumar Pillai", "Bjoern A. Zeeb", "freebsd-net@FreeBSD.org"
Subject: Re: Only last IP frag sent if ARP entry absent
Date: Sat, 19 Aug 2017 11:56:12 -0500
List-Id: Networking and TCP/IP with FreeBSD

On 19 Aug 2017, at 4:00, Julian Elischer wrote:

> On 18/8/17 11:33 am, Mike Karels wrote:
>> Another $.02 (inline):
>>
>> On 17 Aug 2017, at 18:39, Gopakumar Pillai wrote:
>>
>>> Thank You Bjoern. Could you please point me to the RFC?
>>
>> I don’t know if there is anything more recent than RFC1122 on this.
>> IIRC, it requires queuing at least one packet.
>> Queuing one packet is what BSD has done essentially since ARP was
>> implemented.
>
> This asks the question: One physical packet or one logical packet?
> Gopakumar's change effectively changes the queuing from one physical
> packet to the logical one.
> The next question becomes "how much extra work do we do to achieve
> this and does it affect anything else"?

That isn’t the whole question. It’s one physical packet, one logical
packet, or multiple frames?
It makes more sense to me to support multiple frames rather than just
one logical packet.
However, I don’t see a good reason to change from the current code.

>>> If this is not a MUST behavior in RFC, would my fix be good? I agree
>>> that this would affect only ICMP/UDP traffic.
>>
>> People have been asking for queuing of multiple packets for years.
>> That is a more general change. Consider another dumb application
>> that starts out by sending multiple UDP packets back-to-back.
>> However, well-designed application protocols don’t experience
>> problems like this. I’ll quickly note that ping isn’t an
>> application, but a network measuring tool. If you ask the question
>> “what happens if I start off a session with a single large packet
>> and I don’t support retransmission”, ping answers that question
>> correctly.
>>
>> If badly-designed protocols get bad performance, that doesn’t seem
>> like a bug to me, but a feature.
>>
>>> On 8/17/17, 2:40 PM, "Bjoern A. Zeeb" wrote:
>>>
>>> On 17 Aug 2017, at 21:16, Gopakumar Pillai wrote:
>>>
>>> > Hi FreeBSD Networking Gurus,
>>> > I came across an issue with an old version of FreeBSD and looking at
>>> > the latest FreeBSD code, seems it exists even now. I am assuming that
>>> > this issue is not reported.
>>> >
>>> > Observation:
>>> > When a ping was performed with larger payload than MTU, the first ping
>>> > failed when the ARP entry was absent for that IP.
>>>
>>> That is because ping/ICMP has no retransmit.
>>>
>>>
>>> > Noticed on the wire that the last IP fragment was sent for the first
>>> > request and then the subsequent requests were fine.
>>> >
>>> > Root Cause:
>>> > * ip_output fragments the packets and loops through the fragments to
>>> > send them to ether_output.
>>> > * ether_output does an arpresolve and if there is no existing ARP
>>> > entry it'll return EWOULDBLOCK after sending ARP Request.
>>> > * ether_output ignores the error and propagates success to ip_output
>>> > and it continues to send the remaining fragments.
>>> > * llentry keeps only one mbuf and the last fragment is retained when
>>> > the ARP Reply comes and the fragment is sent.
>>>
>>> Yes, according to the spec (RFC) we are supposed to throw the packet
>>> away entirely and simply report that to the next upper layer. However
>>> over the years people realised that this sucks for a TCP SYN packet with
>>> a retransmit timer and hence we store one of them.
>>>
>>> A large UDP packet would btw see the same behaviour to your ping.
>>> There’s no guarantee any of these packets will not be dropped anywhere
>>> on the network, so we can as well.
>>>
>>> Just my 2ct
>>>
>>> /bz
>>
>> Mike
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
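
For anyone who wants to see the sequence from the quoted root-cause list without reading kernel source, here is a rough toy model in Python. It is not FreeBSD code. ToyNeighborEntry, toy_ether_output, toy_ip_output and the hold_limit parameter are all made-up names for illustration, and the rule that a full hold queue discards its oldest frame is an assumption of the model, chosen so that hold_limit=1 reproduces "the last fragment is retained". Raising hold_limit stands in for the "multiple frames" option discussed above.

```python
import errno
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNeighborEntry:
    """Hypothetical stand-in for an unresolved ARP entry (not kernel code)."""
    hold_limit: int = 1            # one held frame, as described in the thread
    resolved: bool = False
    held: List[str] = field(default_factory=list)     # frames waiting for ARP
    wire: List[str] = field(default_factory=list)     # frames actually sent
    dropped: List[str] = field(default_factory=list)  # frames silently lost

    def resolve_or_hold(self, frame: str) -> int:
        """Toy arpresolve: send if resolved, else hold and return EWOULDBLOCK."""
        if self.resolved:
            self.wire.append(frame)
            return 0
        if len(self.held) >= self.hold_limit:
            # Assumption of this model: the oldest held frame makes room.
            self.dropped.append(self.held.pop(0))
        self.held.append(frame)
        return errno.EWOULDBLOCK

    def arp_reply(self) -> None:
        """Toy ARP reply: mark the entry resolved and flush what is held."""
        self.resolved = True
        self.wire.extend(self.held)
        self.held.clear()


def toy_ether_output(entry: ToyNeighborEntry, frame: str) -> int:
    """Toy ether_output: EWOULDBLOCK is swallowed, the caller sees success."""
    entry.resolve_or_hold(frame)
    return 0


def toy_ip_output(entry: ToyNeighborEntry, payload_len: int, mtu_payload: int) -> int:
    """Toy ip_output: split into fragments and hand each one down in order."""
    count = -(-payload_len // mtu_payload)  # ceiling division
    for i in range(count):
        toy_ether_output(entry, f"frag{i + 1}/{count}")
    return count


entry = ToyNeighborEntry(hold_limit=1)
toy_ip_output(entry, payload_len=4000, mtu_payload=1480)  # three fragments
entry.arp_reply()
print(entry.wire)     # ['frag3/3']
print(entry.dropped)  # ['frag1/3', 'frag2/3']
```

With hold_limit=3 the same run puts all three fragments on the wire once the reply arrives, which is the trade-off being argued about: more held frames per unresolved entry versus leaving recovery to protocols that retransmit.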