Date: Mon, 12 May 2014 10:36:12 +0900 From: Yonghyeon PYUN <pyunyh@gmail.com> To: Michael Tuexen <Michael.Tuexen@lurchi.franken.de> Cc: FreeBSD Net <freebsd-net@freebsd.org>, "Bjoern A. Zeeb" <bz@FreeBSD.org> Subject: Re: RX checksum offloading problem Message-ID: <20140512013612.GA4085@michelle.cdnetworks.com> In-Reply-To: <B4F16B70-825A-441A-82AA-F7ED28C9D254@lurchi.franken.de> References: <0EB8F4F6-65C2-4B90-8101-FCC53A15C6F9@lurchi.franken.de> <A345E6A0-D6FF-4E69-AFBD-9BB67B82F02E@FreeBSD.org> <B149FC4B-4F15-4619-A04F-F1A08DDC1741@lurchi.franken.de> <20140507075612.GA1376@michelle.cdnetworks.com> <36469814-FAC8-4172-A792-487E2AB8ECB9@lurchi.franken.de> <20140507083751.GB1376@michelle.cdnetworks.com> <415C1CB5-3AF9-44E4-943A-74116037980E@lurchi.franken.de> <20140509013556.GA3014@michelle.cdnetworks.com> <B4F16B70-825A-441A-82AA-F7ED28C9D254@lurchi.franken.de>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, May 09, 2014 at 12:46:48PM +0200, Michael Tuexen wrote: > On 09 May 2014, at 03:35, Yonghyeon PYUN <pyunyh@gmail.com> wrote: > > > On Thu, May 08, 2014 at 08:40:22PM +0200, Michael Tuexen wrote: > >> On 07 May 2014, at 10:37, Yonghyeon PYUN <pyunyh@gmail.com> wrote: > >> > >>> On Wed, May 07, 2014 at 10:07:09AM +0200, Michael Tuexen wrote: > >>>> On 07 May 2014, at 09:56, Yonghyeon PYUN <pyunyh@gmail.com> wrote: > >>>> > >>>>> On Sat, May 03, 2014 at 11:52:47AM +0200, Michael Tuexen wrote: > >>>>>> On 02 May 2014, at 16:02, Bjoern A. Zeeb <bz@FreeBSD.org> wrote: > >>>>>> > >>>>>>> > >>>>>>> On 02 May 2014, at 10:22 , Michael Tuexen <Michael.Tuexen@lurchi.franken.de> wrote: > >>>>>>> > >>>>>>>> Dear all, > >>>>>>>> > >>>>>>>> during testing I found that FreeBSD head (on a raspberry pi) accepts SCTP packet > >>>>>>>> with bad checksums. After debugging this I figured out that this is a problem with > >>>>>>>> the csum_flags defined in mbuf.h. > >>>>>>>> > >>>>>>>> The SCTP code on its input path checks for CSUM_SCTP_VALID, which is defined in mbuf.h: > >>>>>>>> #define CSUM_SCTP_VALID CSUM_L4_VALID > >>>>>>>> This makes sense: If CSUM_SCTP_VALID is set in csum_flags, the packet is considered > >>>>>>>> to have a correct checksum. > >>>>>>>> > >>>>>>>> For UDP and TCP some drivers calculate the UDP/TCP checksum and set CSUM_DATA_VALID in > >>>>>>>> csum_flags to indicate that the UDP/TCP should consider csum_data to figure out if > >>>>>>>> the packet has a correct checksum. The problem is that CSUM_DATA_VALID is defined as > >>>>>>>> #define CSUM_DATA_VALID CSUM_L4_VALID > >>>>>>>> In this case the semantic is not that the packet has a valid checksum, but the csum_data > >>>>>>>> field contains information. > >>>>>>>> > >>>>>>>> Now the following happens (on the raspberry pi the driver used is > >>>>>>>> dev/usb/net/if_smsc.c > >>>>>>>> > >>>>>>>> 1. A packet is received and if it is not too short, the checksum computed > >>>>>>>> is stored in csum_data and the flag CSUM_DATA_VALID is set. This happens > >>>>>>>> for all IP packets, not only for UDP and TCP packets. > >>>>>>>> 2. In case of SCTP packets, the SCTP interprets CSUM_DATA_VALID as CSUM_SCTP_VALID > >>>>>>>> and accepts the packet. So no SCTP checksum check ever happened. > >>>>>>>> > >>>>>>>> Alternatives to fix this: > >>>>>>>> > >>>>>>>> 1. Change all drivers to set CSUM_DATA_VALID only in case of UDP or TCP packets, since > >>>>>>>> it only makes sense in these cases. > >>>>>>> > >>>>>>> Wait, or for SCTP in cad the crc32 (I think it was) was actually checked but not otherwise. This is how it should be imho. It seems like a driver bug. > >>>>>> I went through the list of drivers and you are right, it seems to be a bug > >>>>>> in if_smsc.c. Most of the other drivers check for UDP/TCP, a small set I can't tell. > >>>>>> > >>>>> > >>>>> I'm not sure how the controller computes TCP/UDP checksum values. > >>>>> It seems the publicly available data sheet was highly sanitized so > >>>>> it was useless to me. The comment in the driver says that the > >>>> Same for me... > >>>>> controller computes RX checksum after the IPv4 header to the end of > >>>>> ethernet frame. After seeing that comment, three questions popped > >>>>> up: > >>>>> > >> OK, I did some testing. It looks like the card is just computing the > >> checksum over the IP payload taking the correct IP header length into account. > >>>>> 1. Is the controller smart enough to skip IP options header in > >>>>> TCP/UDP checksum offloading? > >> Yes, I can send fragmented and un-fragmented UDP packets with IP options > >> and they are handled correctly. Even if the last fragment is too short. > > > > I'm assuming you're taking about receiving fragmented UDP packets > > with RX checksum offloading, right? > Correct. > > > >>>>> 2. How controller handles UDP checksum value 0x0000(i.e. sender > >>>>> didn't compute UDP checksum)? > >> This case isn't handled. However, udp_input() looks first for zero checksums > >> and only after that in the csum_flags. So it doesn't result in any problems. > >> Would you prefer not to set CSUM_DATA_VALID in this case? > > > > At least, it correctly updates UDP stats of netstat(1). > Let me double check that... > > > >>>>> 3. How the controller can compute TCP checksum of fragmented > >>>>> packets? > >> At least it does it right for UDP... > >> > > > > Hmm, CSUM_DATA_VALID indicates driver computed RX TCP/UDP checksum > > without pseudo header. As you know, controller can't compute > > TCP/UDP checksum until all its fragmented payload are read from > > wire. Packets may arrive out of order and may be mixed with other > I'm not sure I understand this... Please note that the pseudo header > is not taken into account. So the card can compute the checksum over > the payload of IP for each fragment. This is stored in the csum_data > field. During reassembly the csum_data fields of the fragments are > combined in > http://svnweb.freebsd.org/base/head/sys/netinet/ip_input.c?view=markup#l1075 > This looks OK to me. I'm not sure why you think the cards needs > to keep state for this. I understand it needs state if it wants > to take the pseudoheader into account, but this is not done here. Oops, sorry. You're right. Probably I was confused with old memory when I worked on that area. I've quickly read IP reassembly code again and as you said, it should work. However it seems there is a checksumming bug here. /* * In order to do checksumming faster we do 'end-around carry' here * (and not in for{} loop), though it implies we are not going to * reassemble more than 64k fragments. */ m->m_pkthdr.csum_data = (m->m_pkthdr.csum_data & 0xffff) + (m->m_pkthdr.csum_data >> 16); I guess the line above didn't account possible carry happened after the computation. Probably it could be rewritten as the following. while (m->m_pkthdr.csum_data & 0xffff0000) m->m_pkthdr.csum_data = (m->m_pkthdr.csum_data & 0xffff) + (m->m_pkthdr.csum_data >> 16); > > packets. Advanced controllers with enough memory may be able to > > compute TCP/UDP checksums by tracking each connections(e.g LRO) but > > low-end controllers may be not. It seems the controller does not > > even support RX TCP/UDP pseudo header checksum offloading so I > > wonder how this controller can support RX TCP/UDP checksum > > offloading for fragmented TCP/UDP packets. > I'm not sure what I'm missing here... You compute the sum over > the parts and add the sums of the parts. That should work for > the UDP/TCP checksum. > > > > Some controllers maintain two bits for TCP/UDP checksum offloading > > result in status word. One bit is used to indicate whether > > controller performed TCP/UDP checksum offloading and the other bit > > is used to indicate whether the computed checksum is correct or > For SCTP we only get a bit that the checksum was computed and is correct... > > not. For UDP checksum value 0x0000 and fragmented TCP/UDP packets, > > these controllers do not attempt to compute TCP/UDP checksum. > I think it "just" computes it and leaves it up to the upper layer > to use the result or not... > Yes, I stand corrected. Thanks. :-) > Best regards > Michael > > > >> Best regards > >> Michael > >>>>> > >>>>> Since you have the controller I guess it's easy to verify all > >>>>> cases. For case 3, I believe the controller can't handle > >>>>> fragmented frames so driver should have to explicitly check ip_off > >>>>> field of IPv4 header. See how gem(4)/sk(4)/hme(4) and fxp(4) > >>>>> handle it. > >>>> Let me check this. Is there a tool to send UDP/TCP with IP level options > >>>> or do I need to write a small test program myself? > >>>> > >>> > >>> I recall I used buggy ipsend of ipfilter package in the past but it > >>> would be more easy to write a simple test program or patch driver > >>> to generate those frames. > >>> > >> > >> > > > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140512013612.GA4085>