From: Sean McNeil
To: John-Mark Gurney
Cc: freebsd-current@freebsd.org
Date: Thu, 23 Sep 2004 16:32:40 -0700
Subject: Re: re0 device txcsum issue

On Thu, 2004-09-23 at 16:21, John-Mark Gurney wrote:
> Sean McNeil wrote this message on Thu, Sep 23, 2004 at 16:08 -0700:
> > On Thu, 2004-09-23 at 15:52, John-Mark Gurney wrote:
> > > Sean
McNeil wrote this message on Thu, Sep 23, 2004 at 15:20 -0700:
> > > > Is anyone willing to work with me to help trace down this problem? It
> > > > has been outstanding for a long time and I would dearly like it fixed.
> > > > I'm willing to help in any capacity to trace down the culprit.
> > > >
> > > > To recap, on the re0 device (possibly others) running -current on an
> > > > amd64 processor there are times when udp packets get improper checksum
> > > > calculations with txcsum set for the interface. This causes a deadlock
> > > > in nfs as the client just continuously requests this packet and it is
> > > > rejected because of the checksum being bad.
> > > >
> > > I have recently been working with the re driver, so I'm somewhat familiar
> > > with the driver. I haven't seen any issues, but I also don't have an
> > > AMD64 system to test with.
> > >
> > > Have you tried to find out if it is packet size related? Are you
> > > trying to use jumbo frames? rwatson committed netsend to the src/tools
> > > tree that could help this, and I have attached udpcheck.py which is
> > > a client/server script to test/verify packets of different
> > > sizes.
> >
> > My method of testing has been to just do an "ls -lR" from a large
> > directory structure. With txcsum set, it consistently locks up. This
> > is on clients ranging from x86 linux, BSD, a sparc solaris2, and an hppa
> > HPUX machine. If I turn off txcsum (i.e. ifconfig re0 -txcsum) I have
> > never had any problems.
> >
> > I tried your program with txcsum and it just hangs. Without txcsum,
>
> You can provide a -v to get more detailed information on what is going
> on...
>
> Yes, the naming (in the python script) is a bit confusing since it is
> from the server's point of view.. I'm assuming you run the server side
> (-s 1234) on another box and the client (server 1234) on the box w/ the
> re driver? So, the errors below mean that the client sent a udp packet
> and it didn't match...
>
> > I'm not sure what the output here means, but this is what I got:
>
> w/o txcsum you got the following errors? This is very worrisome as
> it means that you're getting udp packet corruption even w/o checksumming..
>
> Ok, what you can do is add the line:
>     open('r.%d' % i, 'w').write(rdata)
> just before the line:
>     print 'packet send mismatch at:', i, 'got:', rlen
>
> and then run the client as:
>     python udpcheck.py server 1234 -s 1792 -e 1795
>
> You will then have a set of r.179[2-5] files... They are the contents
> of the packet that was received by the server... if you could email
> them to me in private mail, it might shed light on the problem..
>
> Thanks.

I initially ran the server (-s 1234) on the re0 side. I've now added
the extra line and run it in both directions. Sending you the output in
a private email....

Sean
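
The debugging step suggested above (writing each mismatched packet to an
r.<i> file so it can be inspected later) can be sketched in isolation as
follows. This is only a minimal sketch and not the actual udpcheck.py,
whose contents are not shown in this thread. The function names
first_mismatch and dump_if_mismatch and the 1792-byte test payload are
assumptions made for illustration. It targets Python 3 and opens the file
in binary mode so the received bytes are stored unmodified.

```python
import os
import tempfile


def first_mismatch(sent: bytes, received: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return offset
    if len(sent) != len(received):
        # One payload is a prefix of the other, so they diverge
        # at the point where the shorter one ends.
        return min(len(sent), len(received))
    return None


def dump_if_mismatch(i: int, sent: bytes, received: bytes, outdir: str = "."):
    """Write the received payload to r.<i> when it differs from what was sent.

    Returns the path written, or None when the payloads match.
    (Hypothetical helper; the file naming follows the r.%d pattern
    mentioned in the thread.)
    """
    if first_mismatch(sent, received) is None:
        return None
    path = os.path.join(outdir, "r.%d" % i)
    with open(path, "wb") as f:  # binary mode: keep the bytes exactly as received
        f.write(received)
    return path


if __name__ == "__main__":
    sent = bytes(range(256)) * 7        # hypothetical 1792-byte payload
    corrupted = bytearray(sent)
    corrupted[1000] ^= 0xFF             # flip one byte to simulate corruption
    with tempfile.TemporaryDirectory() as d:
        print("dumped to:", dump_if_mismatch(1792, sent, bytes(corrupted), d))
        print("first bad offset:", first_mismatch(sent, bytes(corrupted)))
```

Knowing the offset of the first bad byte, and not just that a mismatch
occurred, may help show whether the corruption is tied to a particular
position in the packet.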