Date: Thu, 23 Sep 2004 16:32:40 -0700
From: Sean McNeil <sean@mcneil.com>
To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
Cc: freebsd-current@freebsd.org
Subject: Re: re0 device txcsum issue
Message-ID: <1095982360.59840.25.camel@server.mcneil.com>
In-Reply-To: <20040923232124.GM72089@funkthat.com>
References: <1095978035.59583.4.camel@server.mcneil.com> <20040923225234.GL72089@funkthat.com> <1095980931.59840.5.camel@server.mcneil.com> <20040923232124.GM72089@funkthat.com>
On Thu, 2004-09-23 at 16:21, John-Mark Gurney wrote:
> Sean McNeil wrote this message on Thu, Sep 23, 2004 at 16:08 -0700:
> > On Thu, 2004-09-23 at 15:52, John-Mark Gurney wrote:
> > > Sean McNeil wrote this message on Thu, Sep 23, 2004 at 15:20 -0700:
> > > > Is anyone willing to work with me to help trace down this problem?  It
> > > > has been outstanding for a long time and I would dearly like it fixed.
> > > > I'm willing to help in any capacity to trace down the culprit.
> > > >
> > > > To recap, on the re0 device (possibly others) running -current on an
> > > > amd64 processor there are times when udp packets get improper checksum
> > > > calculations with txcsum set for the interface.  This causes a deadlock
> > > > in nfs as the client just continuously requests this packet and it is
> > > > rejected because of the checksum being bad.
> > >
> > > I have recently been working with the re driver, so I'm somewhat familiar
> > > with the driver.  I haven't seen any issues, but I also don't have an
> > > AMD64 system to test with.
> > >
> > > Have you tried to find out if it is packet size related?  are you
> > > trying to use jumbo frames?  rwatson committed netsend to the src/tools
> > > tree that could help this, and I have attached udpcheck.py which is
> > > a client/server script to test/verify packet sizes of different
> > > sizes.
> >
> > My method of testing has been to just do an "ls -lR" from a large
> > directory structure.  With txcsum set, it consistently locks up.  This
> > is on clients ranging from x86 linux, BSD, a sparc solaris2, and an hppa
> > HPUX machine.  If I turn off txcsum (i.e. ifconfig re0 -txcsum) I have
> > never had any problems.
> >
> > I tried your program with txcsum and it just hangs.  Without txcsum,
>
> You can provide a -v to get more detailed information on what is going
> on...
>
> Yes, the naming (in the python script) is a bit confusing since it is
> from the server's point of view..  I'm assuming you run the server side
> (-s 1234) on another box and the client (server 1234) on the box w/ the
> re driver?  so, the errors below mean that the client sent a udp packet
> and it didn't match...
>
> > I'm not sure what the output here means, but this is what I got:
>
> w/o txcsum you got the following errors?  This is very worrisome as
> it means that you're getting udp packet corruption even w/o checksumming..
>
> Ok, what you can do is add the line:
>     open('r.%d' % i, 'w').write(rdata)
> just before the line:
>     print 'packet send mismatch at:', i, 'got:', rlen
>
> and then run the client as:
>     python udpcheck.py server 1234 -s 1792 -e 1795
>
> You will then have a set of r.179[2-5] files...  They are the contents
> of the packet that was received by the server...  if you could email
> them to me in private mail, it might shed light on the problem..
>
> Thanks.

I initially ran the server (-s 1234) on the re0 side.  I've now added
the extra line and run it in both directions.  Sending you the output
in a private email....

Sean
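
For anyone following the thread who wants to check a captured datagram by hand: below is a minimal sketch of how the UDP checksum over the IPv4 pseudo-header is computed and verified in software (RFC 768 / RFC 1071). It is not taken from udpcheck.py or from the re driver; the function names are made up for illustration, and it only covers IPv4 with no IP options considered.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (RFC 1071), zero-padded to even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Hypothetical helper: return UDP header + payload with the checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    csum = ~total & 0xFFFF
    if csum == 0:
        csum = 0xFFFF  # a computed zero is transmitted as all ones
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload


def udp_checksum_ok(src_ip: str, dst_ip: str, datagram: bytes) -> bool:
    """True if the datagram (UDP header + payload) carries a valid checksum.

    A checksum field of zero ("no checksum sent") is not special-cased here
    and will normally be reported as invalid.
    """
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram)
    return total == 0xFFFF
```

Feeding the UDP header and payload of a captured frame, together with the addresses from its IP header, to `udp_checksum_ok` would show whether the checksum the card put on the wire matches what software would have computed.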