Date: Thu, 27 Sep 2001 09:21:29 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
To: Sandeep Joshi <sandeepj@research.bell-labs.com>
Cc: <freebsd-hackers@FreeBSD.ORG>
Subject: Re: TCP&IP cksum offload on FreeBSD 4.2
Message-ID: <20010927092116.B50870-100000@wonky.feral.com>
In-Reply-To: <3BB34DD2.FC65196A@research.bell-labs.com>

Oh, yeah- I forgot about this. Jonathan is a pretty good NetBSD hacker..

On Thu, 27 Sep 2001, Sandeep Joshi wrote:

>
> Ron,
>
> This may be of interest...
>
> http://citeseer.nj.nec.com/stone00when.html
>
> When The CRC and TCP Checksum Disagree
> Jonathan Stone, Craig Partridge SIGCOMM
>
> -Sandeep
>
> On Thu, 27 Sep 2001, Ronald G Minnich wrote:
> >
> > I have a question on the checksum offloading. Has anyone measured any
> > incidence of data corruption between the PCI card and memory? In other
> > words, when you offload checksums the end-to-end checking becomes
> > card-to-card checking, and the possibility exists that what goes in memory
> > at the destination end is not what was sent at the source. Very remote
> > possibility, of course, but ...
> >
> > It's not that the data gets corrupted (usually). It's that
> > once-in-a-100-trillion errors could result in the occasional dropped
> > half-packet or missed word (i.e. overflow). The missed word problem is
> > usually a miscommunication between card and PCI chipset about how a PCI
> > ABORT is supposed to work ... which we've seen on some very recent
> > just-released chipset/network card combinations.
> >
> > Does this happen? Yes. We've seen it on, to name just two, HIPPI800 and
> > Myrinet cards. In each case it was not actual data corruption, it was
> > "can't happen" DMA scenarios that once in a very long while (1 in 10^14 or
> > so) resulted in bits of packets getting corrupted. Each of these cards
> > has a very high-quality end-end CRC for the data, and Myrinet has flow
> > control. We're not the only place that has seen this problem, and I've
> > been told that many commercial Myrinet clients run IP over Myrinet because
> > of these types of problems (of course FreeBSD has the fastest IP over
> > Myrinet anyway, so it's not like that's a huge problem).
> >
> > Is it likely? Well, on one cluster here, with 48 machines and 12
> > interfaces per machine, it's not only likely, it's a given. Without
> > software checksums you are going to get data corruption.
> >
> > What I don't know is whether offloaded checksums on commodity ethernet
> > cards have seen anything similar.
> >
> > I assume that checksums across all the frags are done by the kernel (i.e.
> > NFS would checksum the full UDP packet)? Has anyone measured to see if
> > there is corruption occurring on the frags, ever? Of course it would
> > probably take a while ...
> >
> > Thanks in advance for any information you might have.
> >
> > ron
> >
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
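
Editor's illustration of the point raised in the quoted question: when the card verifies the checksum, a fault between the card and host memory goes unnoticed unless the host checks again in software. The sketch below is only an assumption-laden model, not FreeBSD code. It computes a 16-bit one's-complement checksum in the style of RFC 1071 over a bare payload (real TCP/UDP checksums also cover a pseudo-header), and the names internet_checksum, host_verify, and the simulated "lost word" are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement 16-bit checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def host_verify(payload_in_memory: bytes, sender_checksum: int) -> bool:
    """Software check on what actually landed in host memory."""
    return internet_checksum(payload_in_memory) == sender_checksum


# Sender side: 64-byte payload and its checksum.
sent = bytes(range(1, 65))
cksum = internet_checksum(sent)

# Assumed fault for illustration: the card saw the full packet (so an
# offloaded check would pass), but one 16-bit word is lost on the way
# to memory.
damaged = sent[:10] + sent[12:]

card_says_ok = host_verify(sent, cksum)      # what the card checked
host_says_ok = host_verify(damaged, cksum)   # what is really in memory
print(card_says_ok, host_says_ok)
```

With offload enabled only the first check exists; the second is the software checksum the quoted message argues for. Note that this checksum is weak by design and cannot catch every error pattern (for example, two 16-bit words swapped in place sum to the same value).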