From owner-freebsd-net@FreeBSD.ORG Mon May 12 11:22:08 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2E8A2893; Mon, 12 May 2014 11:22:08 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail-n.franken.de", Issuer "Thawte DV SSL CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B4FD62A37; Mon, 12 May 2014 11:22:07 +0000 (UTC) Received: from [10.225.7.42] (unknown [194.95.73.101]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id 8A48E1C104906; Mon, 12 May 2014 13:22:05 +0200 (CEST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 7.2 \(1874\)) Subject: Re: RX checksum offloading problem From: Michael Tuexen In-Reply-To: <20140512013612.GA4085@michelle.cdnetworks.com> Date: Mon, 12 May 2014 13:22:03 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <72BEDE7B-C1F1-4EE2-BA68-49AED7209E8A@lurchi.franken.de> References: <0EB8F4F6-65C2-4B90-8101-FCC53A15C6F9@lurchi.franken.de> <20140507075612.GA1376@michelle.cdnetworks.com> <36469814-FAC8-4172-A792-487E2AB8ECB9@lurchi.franken.de> <20140507083751.GB1376@michelle.cdnetworks.com> <415C1CB5-3AF9-44E4-943A-74116037980E@lurchi.franken.de> <20140509013556.GA3014@michelle.cdnetworks.com> <20140512013612.GA4085@michelle.cdnetworks.com> To: pyunyh@gmail.com X-Mailer: Apple Mail (2.1874) Cc: FreeBSD Net , "Bjoern A. Zeeb" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 May 2014 11:22:08 -0000 On 12 May 2014, at 03:36, Yonghyeon PYUN wrote: > On Fri, May 09, 2014 at 12:46:48PM +0200, Michael Tuexen wrote: >> On 09 May 2014, at 03:35, Yonghyeon PYUN wrote: >>=20 >>> On Thu, May 08, 2014 at 08:40:22PM +0200, Michael Tuexen wrote: >>>> On 07 May 2014, at 10:37, Yonghyeon PYUN wrote: >>>>=20 >>>>> On Wed, May 07, 2014 at 10:07:09AM +0200, Michael Tuexen wrote: >>>>>> On 07 May 2014, at 09:56, Yonghyeon PYUN = wrote: >>>>>>=20 >>>>>>> On Sat, May 03, 2014 at 11:52:47AM +0200, Michael Tuexen wrote: >>>>>>>> On 02 May 2014, at 16:02, Bjoern A. Zeeb = wrote: >>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>> On 02 May 2014, at 10:22 , Michael Tuexen = wrote: >>>>>>>>>=20 >>>>>>>>>> Dear all, >>>>>>>>>>=20 >>>>>>>>>> during testing I found that FreeBSD head (on a raspberry pi) = accepts SCTP packet >>>>>>>>>> with bad checksums. After debugging this I figured out that = this is a problem with >>>>>>>>>> the csum_flags defined in mbuf.h. >>>>>>>>>>=20 >>>>>>>>>> The SCTP code on its input path checks for CSUM_SCTP_VALID, = which is defined in mbuf.h: >>>>>>>>>> #define CSUM_SCTP_VALID CSUM_L4_VALID >>>>>>>>>> This makes sense: If CSUM_SCTP_VALID is set in csum_flags, = the packet is considered >>>>>>>>>> to have a correct checksum. >>>>>>>>>>=20 >>>>>>>>>> For UDP and TCP some drivers calculate the UDP/TCP checksum = and set CSUM_DATA_VALID in >>>>>>>>>> csum_flags to indicate that the UDP/TCP should consider = csum_data to figure out if >>>>>>>>>> the packet has a correct checksum. The problem is that = CSUM_DATA_VALID is defined as >>>>>>>>>> #define CSUM_DATA_VALID CSUM_L4_VALID >>>>>>>>>> In this case the semantic is not that the packet has a valid = checksum, but the csum_data >>>>>>>>>> field contains information. >>>>>>>>>>=20 >>>>>>>>>> Now the following happens (on the raspberry pi the driver = used is >>>>>>>>>> dev/usb/net/if_smsc.c >>>>>>>>>>=20 >>>>>>>>>> 1. A packet is received and if it is not too short, the = checksum computed >>>>>>>>>> is stored in csum_data and the flag CSUM_DATA_VALID is set. = This happens >>>>>>>>>> for all IP packets, not only for UDP and TCP packets. >>>>>>>>>> 2. In case of SCTP packets, the SCTP interprets = CSUM_DATA_VALID as CSUM_SCTP_VALID >>>>>>>>>> and accepts the packet. So no SCTP checksum check ever = happened. >>>>>>>>>>=20 >>>>>>>>>> Alternatives to fix this: >>>>>>>>>>=20 >>>>>>>>>> 1. Change all drivers to set CSUM_DATA_VALID only in case of = UDP or TCP packets, since >>>>>>>>>> it only makes sense in these cases. >>>>>>>>>=20 >>>>>>>>> Wait, or for SCTP in cad the crc32 (I think it was) was = actually checked but not otherwise. This is how it should be imho. It = seems like a driver bug. >>>>>>>> I went through the list of drivers and you are right, it seems = to be a bug >>>>>>>> in if_smsc.c. Most of the other drivers check for UDP/TCP, a = small set I can't tell. >>>>>>>>=20 >>>>>>>=20 >>>>>>> I'm not sure how the controller computes TCP/UDP checksum = values. >>>>>>> It seems the publicly available data sheet was highly sanitized = so >>>>>>> it was useless to me. The comment in the driver says that the >>>>>> Same for me... >>>>>>> controller computes RX checksum after the IPv4 header to the end = of >>>>>>> ethernet frame. After seeing that comment, three questions = popped >>>>>>> up: >>>>>>>=20 >>>> OK, I did some testing. It looks like the card is just computing = the >>>> checksum over the IP payload taking the correct IP header length = into account. >>>>>>> 1. Is the controller smart enough to skip IP options header in >>>>>>> TCP/UDP checksum offloading? >>>> Yes, I can send fragmented and un-fragmented UDP packets with IP = options >>>> and they are handled correctly. Even if the last fragment is too = short. >>>=20 >>> I'm assuming you're taking about receiving fragmented UDP packets >>> with RX checksum offloading, right? >> Correct. >>>=20 >>>>>>> 2. How controller handles UDP checksum value 0x0000(i.e. sender >>>>>>> didn't compute UDP checksum)? >>>> This case isn't handled. However, udp_input() looks first for zero = checksums >>>> and only after that in the csum_flags. So it doesn't result in any = problems. >>>> Would you prefer not to set CSUM_DATA_VALID in this case? >>>=20 >>> At least, it correctly updates UDP stats of netstat(1). >> Let me double check that... >>>=20 >>>>>>> 3. How the controller can compute TCP checksum of fragmented >>>>>>> packets? >>>> At least it does it right for UDP... >>>>=20 >>>=20 >>> Hmm, CSUM_DATA_VALID indicates driver computed RX TCP/UDP checksum >>> without pseudo header. As you know, controller can't compute >>> TCP/UDP checksum until all its fragmented payload are read from >>> wire. Packets may arrive out of order and may be mixed with other >> I'm not sure I understand this... Please note that the pseudo header >> is not taken into account. So the card can compute the checksum over >> the payload of IP for each fragment. This is stored in the csum_data >> field. During reassembly the csum_data fields of the fragments are >> combined in >> = http://svnweb.freebsd.org/base/head/sys/netinet/ip_input.c?view=3Dmarkup#l= 1075 >> This looks OK to me. I'm not sure why you think the cards needs >> to keep state for this. I understand it needs state if it wants >> to take the pseudoheader into account, but this is not done here. >=20 > Oops, sorry. You're right. Probably I was confused with old memory > when I worked on that area. I've quickly read IP reassembly code > again and as you said, it should work. However it seems there is a > checksumming bug here. >=20 > /* > * In order to do checksumming faster we do 'end-around carry' = here > * (and not in for{} loop), though it implies we are not going = to > * reassemble more than 64k fragments. > */ > m->m_pkthdr.csum_data =3D > (m->m_pkthdr.csum_data & 0xffff) + (m->m_pkthdr.csum_data >> = 16); >=20 > I guess the line above didn't account possible carry happened after > the computation. Probably it could be rewritten as the following. >=20 > while (m->m_pkthdr.csum_data & 0xffff0000) > m->m_pkthdr.csum_data =3D (m->m_pkthdr.csum_data & = 0xffff) + > (m->m_pkthdr.csum_data >> 16); I think you are right here. Good catch. Will you fix it? Best regards Michael >=20 >>> packets. Advanced controllers with enough memory may be able to >>> compute TCP/UDP checksums by tracking each connections(e.g LRO) but >>> low-end controllers may be not. It seems the controller does not >>> even support RX TCP/UDP pseudo header checksum offloading so I >>> wonder how this controller can support RX TCP/UDP checksum >>> offloading for fragmented TCP/UDP packets. >> I'm not sure what I'm missing here... You compute the sum over >> the parts and add the sums of the parts. That should work for >> the UDP/TCP checksum. >>>=20 >>> Some controllers maintain two bits for TCP/UDP checksum offloading >>> result in status word. One bit is used to indicate whether >>> controller performed TCP/UDP checksum offloading and the other bit >>> is used to indicate whether the computed checksum is correct or >> For SCTP we only get a bit that the checksum was computed and is = correct... >>> not. For UDP checksum value 0x0000 and fragmented TCP/UDP packets, >>> these controllers do not attempt to compute TCP/UDP checksum. >> I think it "just" computes it and leaves it up to the upper layer >> to use the result or not... >>=20 >=20 > Yes, I stand corrected. Thanks. :-) >=20 >> Best regards >> Michael >>>=20 >>>> Best regards >>>> Michael >>>>>>>=20 >>>>>>> Since you have the controller I guess it's easy to verify all >>>>>>> cases. For case 3, I believe the controller can't handle >>>>>>> fragmented frames so driver should have to explicitly check = ip_off >>>>>>> field of IPv4 header. See how gem(4)/sk(4)/hme(4) and fxp(4) >>>>>>> handle it. >>>>>> Let me check this. Is there a tool to send UDP/TCP with IP level = options >>>>>> or do I need to write a small test program myself? >>>>>>=20 >>>>>=20 >>>>> I recall I used buggy ipsend of ipfilter package in the past but = it >>>>> would be more easy to write a simple test program or patch driver >>>>> to generate those frames. >>>>>=20 >>>>=20 >>>>=20 >>>=20 >>=20 >>=20 >=20