From owner-freebsd-hackers  Thu Sep 27  9:22:14 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from beppo.feral.com (beppo.feral.com [192.67.166.79])
	by hub.freebsd.org (Postfix) with ESMTP id 7B63937B61D
	for <freebsd-hackers@FreeBSD.ORG>; Thu, 27 Sep 2001 09:22:09 -0700 (PDT)
Received: from wonky.feral.com (wonky.feral.com [192.67.166.7])
	by beppo.feral.com (8.11.3/8.11.3) with ESMTP id f8RGLxH64966;
	Thu, 27 Sep 2001 09:21:59 -0700 (PDT)
	(envelope-from mjacob@feral.com)
Date: Thu, 27 Sep 2001 09:21:29 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
Reply-To: <mjacob@feral.com>
To: Sandeep Joshi <sandeepj@research.bell-labs.com>
Cc: <freebsd-hackers@FreeBSD.ORG>
Subject: Re: TCP&IP cksum offload on FreeBSD 4.2 
In-Reply-To: <3BB34DD2.FC65196A@research.bell-labs.com>
Message-ID: <20010927092116.B50870-100000@wonky.feral.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG


Oh, yeah, I forgot about this. Jonathan is a pretty good NetBSD hacker.


On Thu, 27 Sep 2001, Sandeep Joshi wrote:

>
> Ron,
>
> This may be of interest...
>
> http://citeseer.nj.nec.com/stone00when.html
>
>   When The CRC and TCP Checksum Disagree
>   Jonathan Stone, Craig Partridge, SIGCOMM 2000
>
> -Sandeep
>
> On Thu, 27 Sep 2001, Ronald G Minnich wrote:
> >
> > I have a question on checksum offloading. Has anyone measured any
> > incidence of data corruption between the PCI card and memory? In other
> > words, when you offload checksums, end-to-end checking becomes
> > card-to-card checking, and the possibility exists that what lands in memory
> > at the destination is not what was sent at the source. Very remote
> > possibility, of course, but ...
> >
> > It's not that the data gets corrupted (usually). It's that
> > once-in-a-100-trillion errors could result in the occasional dropped
> > half-packet or missed word (i.e., overflow). The missed-word problem is
> > usually a miscommunication between card and PCI chipset about how a PCI
> > ABORT is supposed to work, which we've seen on some very recent
> > just-released chipset/network card combinations.
> >
> > Does this happen? Yes. We've seen it on, to name just two, HIPPI800 and
> > Myrinet cards. In each case it was not actual data corruption but
> > "can't happen" DMA scenarios that once in a very long while (1 in 10^14 or
> > so) resulted in bits of packets getting corrupted. Each of these cards
> > has a very high-quality end-to-end CRC for the data, and Myrinet has flow
> > control. We're not the only place that has seen this problem, and I've
> > been told that many commercial Myrinet clients run IP over Myrinet because
> > of these types of problems (of course FreeBSD has the fastest IP over
> > Myrinet anyway, so it's not like that's a huge problem).
> >
> > Is it likely? Well, on one cluster here, with 48 machines and 12
> > interfaces per machine, it's not only likely, it's a given. Without
> > software checksums you are going to get data corruption.
> >
> > What I don't know is whether offloaded checksums on commodity ethernet
> > cards have seen anything similar.
> >
> > I assume that checksums across all the frags are done by the kernel (i.e.,
> > NFS would checksum the full UDP packet)? Has anyone measured whether
> > corruption ever occurs on the frags? Of course it would
> > probably take a while ...
> >
> > Thanks in advance for any information you might have.
> >
> > ron
> >
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
>
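Ron's point that software checksums catch what the card-level CRC misses, and his question about checksumming across fragments, both come down to how the Internet checksum (RFC 1071) is computed. What follows is a minimal user-space sketch, not the FreeBSD kernel's code: it sums big-endian 16-bit words with end-around carry, and because ones'-complement addition is associative and commutative, the sum over a whole datagram can be accumulated piece by piece from its fragments. It assumes every chunk except the last has even length (true of IP fragments, whose offsets are multiples of 8 bytes) and leaves out the UDP/TCP pseudo-header. The names cksum_add, cksum_finish and fragment_demo are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add buf to a running ones'-complement accumulator. Bytes are taken as
 * big-endian (network order) 16-bit words; a 64-bit accumulator defers
 * carry folding. Assumes each chunk but the last has even length. */
static uint64_t
cksum_add(uint64_t sum, const uint8_t *buf, size_t len)
{
	for (; len > 1; buf += 2, len -= 2)
		sum += (uint32_t)buf[0] << 8 | buf[1];
	if (len == 1)		/* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[0] << 8;
	return sum;
}

/* Fold carries into the low 16 bits (end-around carry) and complement. */
static uint16_t
cksum_finish(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical demo: checksum a 64-byte payload in one pass and as three
 * 8-byte-aligned "fragments", then flip one bit. Returns 1 when the
 * fragment-wise result matches and the flip changes the checksum. */
static int
fragment_demo(void)
{
	uint8_t payload[64];
	uint64_t sum = 0;
	uint16_t whole, frags, corrupted;

	for (size_t i = 0; i < sizeof(payload); i++)
		payload[i] = (uint8_t)(i * 37 + 11);

	whole = cksum_finish(cksum_add(0, payload, sizeof(payload)));

	sum = cksum_add(sum, payload, 24);
	sum = cksum_add(sum, payload + 24, 16);
	sum = cksum_add(sum, payload + 40, 24);
	frags = cksum_finish(sum);

	payload[17] ^= 0x04;	/* single-bit error, e.g. a bad DMA transfer */
	corrupted = cksum_finish(cksum_add(0, payload, sizeof(payload)));

	return whole == frags && corrupted != whole;
}
```

Deferring the carry fold to cksum_finish is what lets partial sums from separate fragments be combined in any order; a single flipped bit always changes the result, because it shifts the sum by a power of two that is never a multiple of 0xffff.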

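The Stone and Partridge paper Sandeep cites looks at corruption that gets past the link-level CRC, and the TCP checksum has its own blind spots. One is easy to demonstrate: ones'-complement addition is commutative, so swapping two aligned 16-bit words (the kind of reordering a confused DMA engine might produce) leaves the checksum unchanged. A CRC-32 is guaranteed to flag the same swap of two adjacent, differing words, since the error is confined to a 32-bit window and a 32-bit CRC detects every burst error of that length. A minimal sketch, assuming the common reflected CRC-32 (polynomial 0xEDB88320); inet_cksum, crc32_ieee and swap_words are illustrative names, not FreeBSD APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over one buffer (big-endian 16-bit words,
 * end-around carry, odd trailing byte padded with zero). */
static uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	for (; len > 1; buf += 2, len -= 2)
		sum += (uint32_t)buf[0] << 8 | buf[1];
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Bitwise CRC-32 (IEEE 802.3 polynomial, reflected form 0xEDB88320,
 * initial value and final XOR 0xFFFFFFFF). */
static uint32_t
crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical fault model: exchange the 16-bit words at byte offsets
 * off and off + 2, as a misordered DMA transfer might. */
static void
swap_words(uint8_t *buf, size_t off)
{
	uint8_t t0 = buf[off], t1 = buf[off + 1];

	buf[off] = buf[off + 2];
	buf[off + 1] = buf[off + 3];
	buf[off + 2] = t0;
	buf[off + 3] = t1;
}
```

With the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7, swapping the words at offset 4 leaves the Internet checksum at 0x220d, while the CRC-32 of the buffer changes.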
