Date:      Thu, 27 Sep 2001 08:42:25 -0700 (PDT)
From:      Matthew Jacob <mjacob@feral.com>
To:        Ronald G Minnich <rminnich@lanl.gov>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: TCP&IP cksum offload on FreeBSD 4.2
Message-ID:  <Pine.BSF.4.21.0109270840590.64352-100000@beppo>
In-Reply-To: <Pine.LNX.4.33.0109270728220.26552-100000@snaresland.acl.lanl.gov>


It certainly occurs at a rate to worry one. Alan Poston found definite cases
of corruption when doing heavy IDE testing. It varies, motherboard to
motherboard.


On Thu, 27 Sep 2001, Ronald G Minnich wrote:

> On Thu, 27 Sep 2001, Andrew Gallatin wrote:
> 
> > I just wanted to say that you did a hell of a job with the csum
> > offload stuff in FreeBSD.  FreeBSD is the only OS that I'm aware of
> > which allows a driver to choose not to handle csum'ing IP frags on
> > transmit.  Having the option to not handle frags is very, very handy.
> > I wish other platforms had it.
> 
> I have a question on the checksum offloading. Has anyone measured any
> incidence of data corruption between the PCI card and memory? In other
> words, when you offload checksums the end-to-end checking becomes
> card-to-card checking, and the possibility exists that what goes in memory
> at the destination end is not what was sent at the source. Very remote
> possibility, of course, but ...
> 
> It's not that the data gets corrupted (usually). It's that
> once-in-a-100-trillion errors could result in the occasional dropped
> half-packet or missed word (i.e. overflow). The missed word problem is
> usually a miscommunication between card and PCI chipset about how a PCI
> ABORT is supposed to work ... which we've seen on some very recent
> just-released chipset/network card combinations.
> 
> Does this happen? Yes. We've seen it on, to name just two, HIPPI800 and
> Myrinet cards. In each case it was not actual data corruption; it was
> "can't happen" DMA scenarios that once in a very long while (1 in 10^14 or
> so) resulted in bits of packets getting corrupted. Each of these cards
> has a very high-quality end-to-end CRC for the data, and Myrinet has flow
> control. We're not the only place that has seen this problem, and I've
> been told that many commercial Myrinet clients run IP over Myrinet because
> of these types of problems (of course FreeBSD has the fastest IP over
> Myrinet anyway, so it's not like that's a huge problem).
> 
> Is it likely? Well, on one cluster here, with 48 machines and 12
> interfaces per machine, it's not only likely, it's a given. Without
> software checksums you are going to get data corruption.
> 
> What I don't know is whether offloaded checksums on commodity ethernet
> cards have seen anything similar.
> 
> I assume that checksums across all the frags are done by the kernel (i.e.
> NFS would checksum the full UDP packet)? Has anyone measured to see if
> there is corruption occurring on the frags, ever? Of course it would
> probably take a while ...
> 
> Thanks in advance for any information you might have.
> 
> ron
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
> 
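For readers following Ron's point about end-to-end checking: when checksums are not offloaded, the host recomputes the TCP/UDP Internet checksum (RFC 1071) over the bytes as they actually landed in memory, so a DMA error that happens after the card has accepted the frame can still be caught. The C below is a minimal sketch under that assumption, not the kernel's own mbuf-based routine; `internet_cksum` and `missed_word_detected` are hypothetical names, and the 8-byte buffer is made-up data, not anything measured on the HIPPI or Myrinet cards mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: RFC 1071 Internet checksum, computed in software
 * over the bytes as they sit in host memory (i.e. after DMA).
 * A 32-bit accumulator is adequate for packet-sized buffers. */
uint16_t internet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    /* Sum the data as 16-bit big-endian (network order) words. */
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    /* An odd trailing byte is padded with a zero low byte. */
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;

    /* Fold carries back into the low 16 bits (ones'-complement addition). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Hypothetical demo: the card has already accepted the frame, then a
 * "missed word" DMA error drops one 16-bit word on its way into memory.
 * Later bytes shift down and the tail keeps stale contents. Returns true
 * if a host-side recomputation no longer matches the sender's checksum. */
bool missed_word_detected(void)
{
    const uint8_t sent[8] = { 0x45, 0x00, 0x00, 0x1c, 0x12, 0x34, 0x40, 0x00 };
    uint8_t landed[8];

    memcpy(landed, sent, sizeof landed);
    memmove(landed + 4, landed + 6, 2);   /* word at offset 4 is lost */

    return internet_cksum(sent, sizeof sent) !=
           internet_cksum(landed, sizeof landed);
}
```

One caveat on this design: ones'-complement addition is commutative, so this checksum cannot catch two 16-bit words that arrive swapped. It catches the dropped word here only because the stale tail word changes the sum.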

