Date:      Mon, 17 Apr 2006 18:06:27 -0700 (PDT)
From:      Cheng Jin <chengjin@cs.caltech.edu>
To:        freebsd-net@FreeBSD.org
Subject:   em in 6.0 missing rx interrupts?
Message-ID:  <Pine.LNX.4.60.0604171240190.31972@orchestra.cs.caltech.edu>


Hi,

I am running into a problem with the em driver possibly missing rx
interrupts.

I have the following setup:

Linux NFS server -- FreeBSD 6.0 ethernet bridge -- Linux NFS client

and I run a test on the NFS client that repeatedly mounts an NFS
directory, retrieves a file, and unmounts the directory.  FreeBSD 6.0
runs on a machine with a SuperMicro 5015M-MF motherboard and a
dual-port onboard Intel 82573v controller.  The ethernet bridging was
set up using bridge.ko, with em0 connected to the NFS server and em1
connected to the client.
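
For anyone who wants to reproduce this, the client-side loop is
roughly the sketch below.  The server name, export path, mount point,
and file name are placeholders rather than the real ones, and the RUN
variable is only a dry-run convenience added for illustration.

```shell
#!/bin/sh
# Sketch of the mount/fetch/unmount loop. All names below are placeholders.
SERVER="${SERVER:-nfs-server}"      # hypothetical server host name
EXPORT="${EXPORT:-/export/test}"    # hypothetical exported directory
MNT="${MNT:-/mnt/nfstest}"          # hypothetical mount point on the client
FILE="${FILE:-testfile}"            # hypothetical file to retrieve
RUN="${RUN:-}"                      # set RUN=echo to print commands only

# nfs_cycle N: repeat mount, fetch, unmount N times; stop at first failure.
nfs_cycle() {
    n="$1"
    i=1
    while [ "$i" -le "$n" ]; do
        # NFS over TCP: each mount opens new TCP connections to the
        # server, so a lost SYNACK shows up as this mount hanging.
        $RUN mount -t nfs -o tcp "$SERVER:$EXPORT" "$MNT" || return 1
        $RUN cp "$MNT/$FILE" /dev/null || return 1
        $RUN umount "$MNT" || return 1
        i=$((i + 1))
    done
}
```

Running "nfs_cycle 1000" as root on the Linux client starts the loop;
with RUN=echo it only prints the commands it would run.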

The problem is that after a while the test freezes: the NFS client
sends a SYN (NFS over TCP) and never receives the SYNACK for it, nor
any of the subsequent retransmitted SYNACKs (some of them triggered
by retransmitted SYNs from the client).
I have verified that the NFS server indeed puts the SYNACKs on the
wire, by putting an additional bridge with Broadcom ethernet ports
(bge) in between the FreeBSD 6.0 bridge and the NFS server.  This new 
bridge can see the SYNACK packets fine and forwards them on, but the 
bridge with em ports just can't see the SYNACKs.  If I replace the
bridge with em ports by the one with bge ports, the problem
goes away.
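
One way to check which interfaces see the SYNACKs is a tcpdump filter
on the TCP flags, along the lines of the sketch below.  This is only
an illustration, not necessarily how the captures above were taken;
the function name and the RUN dry-run variable are made up for the
example.

```shell
#!/bin/sh
RUN="${RUN:-}"    # set RUN=echo to print the command instead of running it

# watch_synacks IFACE: show only TCP segments with both SYN and ACK set
# on the given interface (for example em0 or bge0).
watch_synacks() {
    $RUN tcpdump -n -i "$1" \
        'tcp[tcpflags] & (tcp-syn|tcp-ack) == (tcp-syn|tcp-ack)'
}
```

Running "watch_synacks em0" on the em bridge and the same command with
a bge port on the other bridge lets the two captures be compared side
by side.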

The test can eventually recover when the client uses a different port
to contact the server.  Also, this problem occurs for different
src/dst port pairs, and once the first SYNACK goes missing, none of
the subsequent SYNACKs will go through either.

I have gotten the latest em driver code for RELENG_6 off the web
cvs and instrumented the em_process_receive_interrupts function.
I don't see this function called when this missing SYNACK situation 
occurs.  I am not sure whether the interrupts themselves never
occurred, or whether the processing never reached this particular
function for some other reason.

I could continue to add printfs to the em functions to see where
things go wrong, but I would like to hear some suggestions from
people on what might be going wrong here.

Many thanks!

Cheng

P.S. I had an IPMI card in the machine that was generating gratuitous
ARPs last week, and I suspected that it was swallowing the packets,
but I still have this problem after removing it.


