Date:      Thu, 23 Sep 2004 16:32:40 -0700
From:      Sean McNeil <sean@mcneil.com>
To:        John-Mark Gurney <gurney_j@resnet.uoregon.edu>
Cc:        freebsd-current@freebsd.org
Subject:   Re: re0 device txcsum issue
Message-ID:  <1095982360.59840.25.camel@server.mcneil.com>
In-Reply-To: <20040923232124.GM72089@funkthat.com>
References:  <1095978035.59583.4.camel@server.mcneil.com> <20040923225234.GL72089@funkthat.com> <1095980931.59840.5.camel@server.mcneil.com> <20040923232124.GM72089@funkthat.com>

On Thu, 2004-09-23 at 16:21, John-Mark Gurney wrote:
> Sean McNeil wrote this message on Thu, Sep 23, 2004 at 16:08 -0700:
> > On Thu, 2004-09-23 at 15:52, John-Mark Gurney wrote:
> > > Sean McNeil wrote this message on Thu, Sep 23, 2004 at 15:20 -0700:
> > > > Is anyone willing to work with me to help trace down this problem?  It
> > > > has been outstanding for a long time and I would dearly like it fixed. 
> > > > I'm willing to help in any capacity to trace down the culprit.
> > > > 
> > > > To recap, on the re0 device (possibly others) running -current on an
> > > > amd64 processor there are times when udp packets get improper checksum
> > > > calculations with txcsum set for the interface.  This causes a deadlock
> > > > in nfs as the client just continuously requests this packet and it is
> > > > rejected because of the checksum being bad.
> > > 
> > > I have recently been working with the re driver, so I'm somewhat familiar
> > > with the driver.  I haven't seen any issues, but I also don't have an
> > > AMD64 system to test with.
> > > 
> > > Have you tried to find out if it is packet size related?  Are you
> > > trying to use jumbo frames?  rwatson committed netsend to the src/tools
> > > tree that could help with this, and I have attached udpcheck.py, which is
> > > a client/server script to test/verify packets of different
> > > sizes.
> > 
> > My method of testing has been to just do an "ls -lR" from a large
> > directory structure.  With txcsum set, it consistently locks up.  This
> > is on clients ranging from x86 linux, BSD, a sparc solaris2, and an hppa
> > HPUX machine.  If I turn off txcsum (i.e. ifconfig re0 -txcsum) I have
> > never had any problems.
> > 
> > I tried your program with txcsum and it just hangs.  Without txcsum, 
> 
> You can provide a -v to get more detailed information on what is going
> on...
> 
> Yes, the naming (in the python script) is a bit confusing since it is
> from the server's point of view..  I'm assuming you run the server side
> (-s 1234) on another box and the client (server 1234) on the box w/ the
> re driver?  So the errors below mean that the client sent a udp packet
> and it didn't match...
> 
> > I'm not sure what the output here means, but this is what I got:
> 
> w/o txcsum you got the following errors?  This is very worrisome as
> it means that you're getting udp packet corruption even w/o checksumming..
> 
> Ok, what you can do is add the line:
>                                 open('r.%d' % i, 'w').write(rdata)
> just before the line:
>                                 print 'packet send mismatch at:', i, 'got:', rlen
> 
> and then run the client as:
> python udpcheck.py server 1234 -s 1792 -e 1795
> 
> You will then have a set of r.179[2-5] files... They are the contents
> of the packet that was received by the server...  if you could email
> them to me in private mail, it might shed light on the problem..
> 
> Thanks.

I initially ran the server (-s 1234) on the re0 side.  I've now added
the extra line and run it in both directions.  Sending you the output in
a private email...

Sean
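
For readers without the attached udpcheck.py (it is not reproduced in
this message), here is a rough Python 3 sketch of the same kind of
size-sweep check discussed above.  It is not John-Mark's script: the
names make_payload, echo_server and sweep, the payload pattern, and the
timeout are all assumptions made for illustration.  The r.<size> dump
mirrors the open('r.%d' % i, 'w').write(rdata) line suggested above.

```python
import socket
import threading


def make_payload(size):
    """Deterministic payload (hypothetical pattern) of exactly `size` bytes."""
    return bytes(i % 251 for i in range(size))


def echo_server(sock):
    """Echo every datagram received on an already-bound UDP socket."""
    while True:
        data, addr = sock.recvfrom(65535)
        sock.sendto(data, addr)


def sweep(host, port, start, end, timeout=2.0, dump=False):
    """Send one datagram per size in [start, end] to an echo server.

    Returns the list of sizes whose reply was missing or did not match.
    With dump=True, a mismatching reply is written to the file r.<size>.
    """
    bad = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for size in range(start, end + 1):
            sent = make_payload(size)
            s.sendto(sent, (host, port))
            try:
                rdata, _ = s.recvfrom(65535)
            except socket.timeout:
                # No reply at all: the datagram (or its echo) was dropped.
                bad.append(size)
                continue
            if rdata != sent:
                bad.append(size)
                if dump:
                    # Binary mode so the received bytes are kept verbatim.
                    with open('r.%d' % size, 'wb') as f:
                        f.write(rdata)
    return bad
```

Assuming the echo side is already running on one box, something like
sweep('server', 1234, 1792, 1795, dump=True) on the other would cover
the same size range as the command above.  A datagram that arrives with
a bad UDP checksum is normally discarded by the receiving stack before
the application sees it, so a checksum problem would be expected to show
up here as a timeout rather than as a mismatch.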



