Date: Thu, 11 Nov 2010 08:10:57 -0800
From: "Kevin Oberman"
To: Kirill Yelizarov
Cc: freebsd-stable@freebsd.org, net@freebsd.org
Subject: Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]
Message-Id: <20101111161057.CA5A71CC0E@ptavv.es.net>
In-reply-to: Your message of "Wed, 10 Nov 2010 23:49:56 PST." <816869.17580.qm@web120510.mail.ne1.yahoo.com>
List-Id: Production branch of FreeBSD source code

> Date: Wed, 10 Nov 2010 23:49:56 -0800 (PST)
> From: Kirill Yelizarov
>
> --- On Thu, 11/11/10, Kevin Oberman wrote:
>
> > From: Kevin Oberman
> > Subject: Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]
> > To: "Wilkinson, Alex"
> > Cc: freebsd-stable@freebsd.org
> > Date: Thursday, November 11, 2010, 8:26 AM
> > > Date: Thu, 11 Nov 2010 13:01:26 +0800
> > > From: "Wilkinson, Alex"
> > > Sender: owner-freebsd-stable@freebsd.org
> > >
> > >     On Wed, Nov 10, 2010 at 04:21:12AM -0800,
> > >     Kirill Yelizarov wrote:
> > >
> > >     >All my em cards running 8.1 stable don't reply to icmp echo request packets larger than 1472 bytes.
> > >     >
> > >     >On stable 7.2 the same hardware works as expected:
> > >     ># ping -s 1500 192.168.64.99
> > >     >PING 192.168.64.99 (192.168.64.99): 1500 data bytes
> > >     >1508 bytes from 192.168.64.99: icmp_seq=0 ttl=63 time=1.249 ms
> > >     >1508 bytes from 192.168.64.99: icmp_seq=1 ttl=63 time=1.158 ms
> > >     >
> > >     >Here is the dump on the em interface:
> > >     >15:06:31.452043 IP 192.168.66.65 > *****: ICMP echo request, id 28729, seq 5, length 1480
> > >     >15:06:31.452047 IP 192.168.66.65 > ****: icmp
> > >     >15:06:31.452069 IP **** > 192.168.66.65: ICMP echo reply, id 28729, seq 5, length 1480
> > >     >15:06:31.452071 IP *** > 192.168.66.65: icmp
> > >     >
> > >     >The same ping from the same source (an 8.1 stable box with an fxp interface) to an em card running 8.1 stable:
> > >     >#pciconf -lv
> > >     >em0@pci0:3:4:0:    class=0x020000 card=0x10798086 chip=0x10798086 rev=0x03 hdr=0x00
> > >     >    vendor     = 'Intel Corporation'
> > >     >    device     = 'Dual Port Gigabit Ethernet Controller (82546EB)'
> > >     >    class      = network
> > >     >    subclass   = ethernet
> > >     >
> > >     ># ping -s 1472 192.168.64.200
> > >     >PING 192.168.64.200 (192.168.64.200): 1472 data bytes
> > >     >1480 bytes from 192.168.64.200: icmp_seq=0 ttl=63 time=0.848 ms
> > >     >^C
> > >     >
> > >     ># ping -s 1473 192.168.64.200
> > >     >PING 192.168.64.200 (192.168.64.200): 1473 data bytes
> > >     >^C
> > >     >--- 192.168.64.200 ping statistics ---
> > >     >4 packets transmitted, 0 packets received, 100.0% packet loss
> > >
> > > works fine for me:
> > >
> > > FreeBSD 8.1-STABLE #0 r213395
> > >
> > > em0@pci0:0:25:0:    class=0x020000 card=0x3035103c chip=0x10de8086 rev=0x02 hdr=0x00
> > >     vendor     = 'Intel Corporation'
> > >     device     = 'Intel Gigabit network connection (82567LM-3 )'
> > >     class      = network
> > >     subclass   = ethernet
> > >
> > > #ping -s 1473 host
> > > PING host(192.168.1.1): 1473 data bytes
> > > 1481 bytes from 192.168.1.1: icmp_seq=0 ttl=253 time=31.506 ms
> > > 1481 bytes from 192.168.1.1: icmp_seq=1 ttl=253 time=31.493 ms
> > > 1481 bytes from 192.168.1.1: icmp_seq=2 ttl=253 time=31.550 ms
> > > ^C
> >
> > The reason the '-s 1500' worked was that the packets were fragmented. If
> > I add the '-D' option, '-s 1473' fails on v7 and v8. Are the V8 systems
> > where you see it failing without the '-D' on the same network segment?
> > If not, it is likely that an intervening device is refusing to fragment
> > the packet. (Some routers deliberately don't fragment ICMP Echo Request
> > packets.)
>
> If I set -D -s 1473, the sender side refuses to ping, and that is
> correct. All the machines mentioned above are behind the same router and
> switch. The same hardware running v7 works while v8 does not, and I
> never saw such problems before. Also, correct me if I'm wrong, but the
> dump shows that the packet arrived. I'll try the driver from head and
> post the results here.

I looked into this a bit more today, and I see that something bogus is
going on. It MAY be the em driver. I tried 1473-data-byte pings without
the DF flag and captured the packets on both ends; the sending system has
a bge (Broadcom GE) card and the responding end has an em (Intel) card.
The system with the em interface received all of the fragmented IP
packets and sent back an ICMP Echo Reply, again fragmented. I saw the
reply on both ends, so both interfaces were able to fragment an oversized
packet, transmit the two pieces, and receive the two pieces.
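For readers following the packet sizes in this thread, here is a minimal Python sketch of the IPv4 arithmetic involved. It assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and an 8-byte ICMP echo header; the helper names (`max_unfragmented_data`, `fragment`, `reassemble`) are hypothetical illustrations, not FreeBSD kernel functions. It shows why 1472 is the largest `ping -s` value that fits in one datagram (and so the largest that can work with `-D`), and how a 1473- or 1500-byte ping splits into a 1480-byte first fragment plus a small second one that the receiver has to reassemble before it can answer.

```python
"""Sketch: how `ping -s N` maps onto IPv4 fragments on a 1500-byte MTU link.

Assumptions (not from the thread): 20-byte IPv4 header with no options,
8-byte ICMP echo header. All helper names are hypothetical.
"""
from typing import List, NamedTuple

IP_HDR = 20    # IPv4 header length without options
ICMP_HDR = 8   # ICMP echo request/reply header length


class Fragment(NamedTuple):
    offset: int           # byte offset in the ICMP message (IP stores offset / 8)
    payload: bytes        # slice of the ICMP message carried by this fragment
    more_fragments: bool  # MF flag: set on every fragment except the last


def max_unfragmented_data(mtu: int = 1500) -> int:
    """Largest `ping -s` value that fits in a single datagram (what -D allows)."""
    return mtu - IP_HDR - ICMP_HDR


def fragment(message: bytes, mtu: int = 1500) -> List[Fragment]:
    """Split an ICMP message into IPv4 fragments.

    Every fragment except the last carries a multiple of 8 bytes, because
    the IPv4 fragment offset field counts 8-byte units (RFC 791).
    """
    step = (mtu - IP_HDR) // 8 * 8
    return [
        Fragment(off, message[off:off + step], off + step < len(message))
        for off in range(0, len(message), step)
    ]


def reassemble(frags: List[Fragment]) -> bytes:
    """Rebuild the ICMP message, rejecting holes, overlaps or a missing tail."""
    ordered = sorted(frags, key=lambda f: f.offset)
    if not ordered or ordered[-1].more_fragments:
        raise ValueError("last fragment missing (MF still set)")
    out = bytearray()
    for frag in ordered:
        if frag.offset != len(out):
            raise ValueError(f"hole or overlap at offset {len(out)}")
        out += frag.payload
    return bytes(out)


if __name__ == "__main__":
    print(max_unfragmented_data())  # 1472
    for data_bytes in (1472, 1473, 1500):
        # ping's "N bytes from" line reports the ICMP message size: data + 8.
        message = bytes(i % 256 for i in range(ICMP_HDR + data_bytes))
        frags = fragment(message)
        # Fragments may arrive in any order; reassembly sorts them by offset.
        assert reassemble(list(reversed(frags))) == message
        print(data_bytes, len(message), [len(f.payload) for f in frags])
```

Run as a script, it prints 1472 and then the fragment sizes for each case: [1480] for `-s 1472`, [1480, 1] for `-s 1473`, and [1480, 28] for `-s 1500`. The message sizes (1481 and 1508) match the "bytes from" lines ping printed above, and the 1480-byte first fragment appears consistent with the `length 1480` entries in the quoted tcpdump output.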
The em device could reassemble them properly, but the bge device does not
seem to reassemble them correctly, or else it has a problem with ICMP
packets bigger than the MTU. When I send from the em system, I see the
packets and fragments all arrive in good form, but the receiving system
never sends out a reply.

Since this is handled in the kernel, it may be a driver problem, but I
suspect it is in the IP stack, since I am seeing the problem with a
Broadcom card and I see all of the data arriving. I think Jack can
probably relax, but some patch to the network stack seems to have broken
at least ICMP processing. And since the bge system was updated to
8-STABLE on October 20 while the em system was updated back on August 9,
I suspect the flaw is not driver-related and was committed between
August 9 and October 20.

I think this needs to go to the network list, where the folks who tinker
with that part of the kernel tend to hang out. Sorry for the cross-post.

-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net          Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751