From: Pyun YongHyeon
Date: Thu, 11 Nov 2010 13:04:36 -0800
To: Kevin Oberman
Cc: freebsd-stable@freebsd.org, Kirill Yelizarov, net@freebsd.org
Subject: Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]
Message-ID: <20101111210436.GD17566@michelle.cdnetworks.com>
In-Reply-To: <20101111161057.CA5A71CC0E@ptavv.es.net>
Reply-To: pyunyh@gmail.com

On Thu, Nov 11, 2010 at 08:10:57AM -0800, Kevin Oberman wrote:
> > Date: Wed, 10 Nov 2010 23:49:56 -0800 (PST)
> > From: Kirill Yelizarov
> >
> > --- On Thu, 11/11/10, Kevin Oberman wrote:
> >
> > > From: Kevin Oberman
> > > Subject: Re: icmp packets on em larger than 1472 [SEC=UNCLASSIFIED]
> > > To: "Wilkinson, Alex"
> > > Cc: freebsd-stable@freebsd.org
> > > Date: Thursday, November 11, 2010, 8:26 AM
> > > > Date: Thu, 11 Nov 2010 13:01:26 +0800
> > > > From: "Wilkinson, Alex"
> > > > Sender: owner-freebsd-stable@freebsd.org
> > > >
> > > >     On Wed, Nov 10, 2010 at 04:21:12AM -0800, Kirill Yelizarov wrote:
> > > >
> > > >     >All my em cards running 8.1 stable don't reply to icmp echo requests packets larger than 1472 bytes.
> > > >     >
> > > >     >On stable 7.2 the same hardware works as expected:
> > > >     ># ping -s 1500 192.168.64.99
> > > >     >PING 192.168.64.99 (192.168.64.99): 1500 data bytes
> > > >     >1508 bytes from 192.168.64.99: icmp_seq=0 ttl=63 time=1.249 ms
> > > >     >1508 bytes from 192.168.64.99: icmp_seq=1 ttl=63 time=1.158 ms
> > > >     >
> > > >     >Here is the dump on em interface
> > > >     >15:06:31.452043 IP 192.168.66.65 > *****: ICMP echo request, id 28729, seq 5, length 1480
> > > >     >15:06:31.452047 IP 192.168.66.65 > ****: icmp
> > > >     >15:06:31.452069 IP **** > 192.168.66.65: ICMP echo reply, id 28729, seq 5, length 1480
> > > >     >15:06:31.452071 IP *** > 192.168.66.65: icmp
> > > >     >
> > > >     >Same ping from same source (it's a 8.1 stable with fxp interface) to em card running 8.1 stable
> > > >     >#pciconf -lv
> > > >     >em0@pci0:3:4:0:    class=0x020000 card=0x10798086 chip=0x10798086 rev=0x03 hdr=0x00
> > > >     >    vendor     = 'Intel Corporation'
> > > >     >    device     = 'Dual Port Gigabit Ethernet Controller (82546EB)'
> > > >     >    class      = network
> > > >     >    subclass   = ethernet
> > > >     >
> > > >     ># ping -s 1472 192.168.64.200
> > > >     >PING 192.168.64.200 (192.168.64.200): 1472 data bytes
> > > >     >1480 bytes from 192.168.64.200: icmp_seq=0 ttl=63 time=0.848 ms
> > > >     >^C
> > > >     >
> > > >     ># ping -s 1473 192.168.64.200
> > > >     >PING 192.168.64.200 (192.168.64.200): 1473 data bytes
> > > >     >^C
> > > >     >--- 192.168.64.200 ping statistics ---
> > > >     >4 packets transmitted, 0 packets received, 100.0% packet loss
> > > >
> > > > works fine for me:
> > > >
> > > > FreeBSD 8.1-STABLE #0 r213395
> > > >
> > > > em0@pci0:0:25:0:    class=0x020000 card=0x3035103c chip=0x10de8086 rev=0x02 hdr=0x00
> > > >     vendor     = 'Intel Corporation'
> > > >     device     = 'Intel Gigabit network connection (82567LM-3 )'
> > > >     class      = network
> > > >     subclass   = ethernet
> > > >
> > > > #ping -s 1473 host
> > > > PING host(192.168.1.1): 1473 data bytes
> > > > 1481 bytes from 192.168.1.1: icmp_seq=0 ttl=253 time=31.506 ms
> > > > 1481 bytes from 192.168.1.1: icmp_seq=1 ttl=253 time=31.493 ms
> > > > 1481 bytes from 192.168.1.1: icmp_seq=2 ttl=253 time=31.550 ms
> > > > ^C
> > >
> > > The reason the '-s 1500' worked was that the packets were fragmented. If
> > > I add the '-D' option, '-s 1473' fails on v7 and v8. Are the V8 systems
> > > where you see it failing without the '-D' on the same network segment?
> > > If not, it is likely that an intervening device is refusing to fragment
> > > the packet. (Some routers deliberately don't fragment ICMP Echo Request
> > > packets.)
> >
> > If I set -D -s 1473, the sender side refuses to ping, and that is
> > correct. All the machines mentioned above are behind the same router and
> > switch. Same hardware running v7 is working while v8 is not. And I
> > never saw such problems before. Also correct me if I'm wrong, but the
> > dump shows that the packet arrived. I'll try the driver from head and
> > will post the results here.
>
> I did a bit more looking at this today and I see that something bogus is
> going on and it MAY be the em driver.
>
> I tried 1473 data byte pings without the DF flag. I then captured the
> packets on both ends (where the sending system has a bge (Broadcom GE)
> and the responding end has an em (Intel) card).
>
> What I saw was the fragmented IP packets all being received by the
> system with the em interface and an ICMP Echo Reply being sent back,
> again fragmented. I saw the reply on both ends, so both interfaces were
> able to fragment an over-sized packet, transmit the two pieces, and
> receive the two pieces. The em device could re-assemble them properly,
> but the bge device does not seem to re-assemble them correctly or else
> has a problem with ICMP packets bigger than MTU size.
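The 1472/1473 boundary in this thread is header arithmetic: 1472 data bytes plus the 8-byte ICMP echo header plus the 20-byte IPv4 header add up to exactly 1500, the standard Ethernet MTU. A minimal Python sketch of that arithmetic and of the resulting IPv4 fragmentation, assuming an IPv4 header without options; `ping_fragments` is a hypothetical helper, and raising `ValueError` only stands in for the send failure the sender reports when DF is set:

```python
# Sketch: how `ping -s N` maps onto IPv4 packets on an Ethernet link.
# Assumptions: IPv4 header without options (20 bytes), 8-byte ICMP echo
# header, and a 1500-byte MTU unless told otherwise.
IP_HEADER_LEN = 20
ICMP_HEADER_LEN = 8


def ping_fragments(data_bytes, mtu=1500, dont_fragment=False):
    """Return the IPv4 total length of each packet or fragment for one echo request."""
    icmp_len = ICMP_HEADER_LEN + data_bytes   # ping's "N bytes from" figure
    if IP_HEADER_LEN + icmp_len <= mtu:
        return [IP_HEADER_LEN + icmp_len]     # fits in a single packet
    if dont_fragment:
        # ping -D sets DF; an oversized datagram cannot be sent at all.
        raise ValueError("datagram exceeds MTU and DF is set")
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER_LEN) // 8 * 8
    sizes = []
    remaining = icmp_len
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(IP_HEADER_LEN + chunk)
        remaining -= chunk
    return sizes


for size in (1472, 1473, 1500):
    print(size, ping_fragments(size))
```

For -s 1472 this yields a single 1500-byte packet; for -s 1473, fragments of 1500 and 21 bytes; for -s 1500, fragments of 1500 and 48 bytes. The `icmp_len` value (1481 for -s 1473) is the "bytes from" figure ping prints. Both fragmented cases depend on IP reassembly at the receiving host.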
>
> When I send from the em system, I see the packets and fragments all
> arrive in good form, but the system never sends out a reply. Since this
> is a kernel function, it may be a driver, but I suspect that it is in
> the IP stack since I am seeing the problem with a Broadcom card and I
> see the data all arriving.
>

Most Ethernet controllers, including bge(4), let the driver specify how
much RX buffer space is allocated to receive a frame. When the controller
receives a frame larger than that RX buffer space, it drops the frame.
Because the oversized frame is silently dropped in the driver layer, the
upper stack never sees it and has no chance to send back an ICMP
"fragmentation needed" response for frames that have the Don't Fragment
bit set. This is where a correct MTU configuration plays an important
role in the driver layer: if you want to handle oversized frames, you
also have to set the interface MTU accordingly. However, all controllers
should be able to receive standard MTU-sized frames including a VLAN
tag, so no special configuration is needed when you handle standard
MTU-sized frames. Some old controllers can't handle VLAN-tagged frames
that exceed the standard frame size, so you would have no way to send or
receive them.

em(4) controllers use different receive logic: a single oversized frame
can be spread across multiple RX buffers, which the driver chains into
one frame. So, up to a limit that depends on the largest jumbo frame the
controller supports, em(4) can receive these oversized frames regardless
of the MTU configuration, with the help of the driver. The chaining is
done in the driver layer and adds overhead (chaining plus multiple mbuf
allocations), but it has its own advantages.

I was not able to reproduce the issue with em(4)/bge(4) on CURRENT;
these drivers worked as expected.

> I think Jack can probably relax, but some patch to the network stack
> seems to have broken at least ICMP processing.
> And, since the bge system was updated to 8-Stable on October 20 while
> the em system was updated back on August 9, I suspect the flaw was not
> driver related and was committed between August 9 and Oct. 20.
>
> I think this needs to go to the network list where the folks who tinker
> with that part of the kernel tend to hang out. Sorry for the cross-post.
> --
> R. Kevin Oberman, Network Engineer
> Energy Sciences Network (ESnet)
> Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
> E-mail: oberman@es.net    Phone: +1 510 486-8634
> Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
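The two receive strategies described in the reply can be illustrated with a toy model. This is a sketch, not driver code: the 2048-byte buffer size (FreeBSD's standard mbuf cluster size, MCLBYTES) and the 9018-byte jumbo limit are assumptions chosen for illustration, and the function names are hypothetical:

```python
# Toy model of the two receive strategies described above; not driver code.
# Assumptions for illustration only: 2048-byte receive buffers (FreeBSD's
# MCLBYTES cluster size) and a hypothetical 9018-byte jumbo-frame limit.
ETHER_HDR_LEN = 14   # destination MAC + source MAC + EtherType
VLAN_TAG_LEN = 4     # 802.1Q tag
ETHER_CRC_LEN = 4    # frame check sequence
RX_CLUSTER = 2048    # assumed size of one receive buffer
MAX_JUMBO = 9018     # hypothetical controller limit; varies by chip


def max_frame_for_mtu(mtu):
    """Largest frame the RX buffer must hold for this MTU, VLAN tag included."""
    return mtu + ETHER_HDR_LEN + VLAN_TAG_LEN + ETHER_CRC_LEN


def single_buffer_accepts(frame_len, mtu):
    """Controller programmed with one RX buffer size derived from the MTU.

    Frames that do not fit are dropped before the IP stack sees them, so no
    ICMP response of any kind can be generated for them.
    """
    return frame_len <= max_frame_for_mtu(mtu)


def chained_buffers_needed(frame_len, max_jumbo=MAX_JUMBO):
    """Controller that spreads one frame over several RX buffers.

    Returns how many buffers the driver would chain together, or 0 when the
    frame exceeds what the controller can receive at all.
    """
    if frame_len > max_jumbo:
        return 0
    return -(-frame_len // RX_CLUSTER)  # ceiling division


for frame_len in (1522, 4000, 10000):
    print(frame_len,
          single_buffer_accepts(frame_len, mtu=1500),
          chained_buffers_needed(frame_len))
```

With a 1500-byte MTU, the single-buffer model accepts frames up to 1522 bytes (the standard maximum with a VLAN tag) and drops a 4000-byte frame, while the chaining model receives the same frame in two buffers and still drops anything above its hardware limit.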