Date:      Fri, 26 Sep 1997 11:31:45 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        mike@smith.net.au (Mike Smith)
Cc:        tarkhil@mgt.msk.ru, hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject:   Re: 'fxp' driver/hardware lossage (was  Re: Alexander B. Povol's mail)
Message-ID:  <199709261131.EAA19446@usr09.primenet.com>
In-Reply-To: <199709260913.SAA00775@word.smith.net.au> from "Mike Smith" at Sep 26, 97 06:43:16 pm

> > The 127.0.0.1 is not normally something that has anything at all to
> > do with the card driver.  Instead, it is internally looped back; it
> > is a simulated interface.  I don't see how shoving the interface
> > into promiscuous mode would help.
> 
> Putting the fxp driver into promiscuous mode involves poking at bits 
> of the interface hardware.  This appears to fix whatever it is that's 
> going wrong.  If this problem hasn't already been dealt with in the 
> -current/-stable drivers, I hope DG is looking at it.

So how the heck can pinging the loopback *ever* fail because of an
unrelated driver?

The only thing I can think of is a screwed routing table or a screwed
ARP table.  Yet he says neither of these changes when the problem is
manifest.

Or... his ethernet card is denial-of-service attacking him?!?  Maybe
the mbufs are all allocated, so he can't get any for the loopback
ping?  This could point to a bad driver assumption, such as assuming
you can transmit while a receive is pending when the hardware has only
one queue for both sets of operations, or something equally idiotic.
Or it may be bad code.
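A toy model of the mbuf exhaustion theory may make it concrete.  This is
a minimal sketch, not kernel code: a hypothetical fixed pool (POOL_SIZE,
pool_get(), pool_put()) stands in for the shared mbuf pool, and
driver_rx() is a hypothetical receive path that forgets to free its
buffer on a bad-status frame.  Once enough bad frames leak, an unrelated
consumer such as a loopback ping can no longer allocate anything.

```c
#include <stdbool.h>

/* Hypothetical model: a fixed pool of buffers shared by every
 * interface, standing in for the kernel's mbuf pool.  This is NOT
 * the real mbuf allocator. */
#define POOL_SIZE 8

static int pool_free = POOL_SIZE;

static bool pool_get(void)
{
    if (pool_free == 0)
        return false;          /* allocation fails: pool exhausted */
    pool_free--;
    return true;
}

static void pool_put(void)
{
    pool_free++;
}

/* A "driver" receive path that leaks its buffer on a bad-status
 * frame instead of returning it to the pool (the suspected bug). */
static void driver_rx(bool bad_status)
{
    if (!pool_get())
        return;                /* no buffer: frame is dropped */
    if (bad_status)
        return;                /* BUG: buffer is never released */
    pool_put();                /* good frame: consumed and freed */
}

/* The loopback "ping": needs one buffer briefly, then frees it. */
static bool loopback_ping(void)
{
    if (!pool_get())
        return false;
    pool_put();
    return true;
}
```

After POOL_SIZE bad frames, pool_free is 0 and loopback_ping() fails,
even though nothing is wrong with the loopback path itself.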

I still don't see how putting the card into a mode where it will
see yet more packets could fix it.  There is some suspicious code
about preemptively freeing mbufs in some bad cases, surrounded by
#ifdefs for the BPF.  But to trigger, the BPF would have to actually
be in the process of being used (ifp->if_bpf != NULL).

Hmmm.  If this theory is true, then it's likely that what's holding
the mbufs is fixed by the fxp_stop() at the top of fxp_init() when
the mode is changed.

Technically, the SIOCSIFFLAGS case in fxp_ioctl() should fix this
with a standard ifconfig down then up.

Let's see...

This is an interesting problem... I'm going to provide some possible
hacks.  Normally I don't hack drivers for cards that I don't have
available for testing and have no documentation for, but with the wedges
you are seeing, these are what I'd try myself.  8-).

*DON'T* mix these patches!!!

First, try this patch:

----------------------------------------------------------------------------
Index: if_fxp.c
===================================================================
RCS file: /b/cvstree/ncvs/src/sys/pci/if_fxp.c,v
retrieving revision 1.39
diff -c -r1.39 if_fxp.c
*** 1.39	1997/09/05 10:23:54
--- if_fxp.c	1997/09/26 10:24:47
***************
*** 1241,1247 ****
  	cbp->tno_int =		0;	/* (disable) tx not okay interrupt */
  	cbp->ci_int =		0;	/* interrupt on CU not active */
  	cbp->save_bf =		prm;	/* save bad frames */
! 	cbp->disc_short_rx =	!prm;	/* discard short packets */
  	cbp->underrun_retry =	1;	/* retry mode (1) on DMA underrun */
  	cbp->mediatype =	!sc->phy_10Mbps_only; /* interface mode */
  	cbp->nsai =		1;	/* (don't) disable source addr insert */
--- 1241,1247 ----
  	cbp->tno_int =		0;	/* (disable) tx not okay interrupt */
  	cbp->ci_int =		0;	/* interrupt on CU not active */
  	cbp->save_bf =		prm;	/* save bad frames */
! 	cbp->disc_short_rx =	0;	/* discard short packets */
  	cbp->underrun_retry =	1;	/* retry mode (1) on DMA underrun */
  	cbp->mediatype =	!sc->phy_10Mbps_only; /* interface mode */
  	cbp->nsai =		1;	/* (don't) disable source addr insert */
***************
*** 1253,1259 ****
  	cbp->promiscuous =	prm;	/* promiscuous mode */
  	cbp->bcast_disable =	0;	/* (don't) disable broadcasts */
  	cbp->crscdt =		0;	/* (CRS only) */
! 	cbp->stripping =	!prm;	/* truncate rx packet to byte count */
  	cbp->padding =		1;	/* (do) pad short tx packets */
  	cbp->rcv_crc_xfer =	0;	/* (don't) xfer CRC to host */
  	cbp->force_fdx =	0;	/* (don't) force full duplex */
--- 1253,1259 ----
  	cbp->promiscuous =	prm;	/* promiscuous mode */
  	cbp->bcast_disable =	0;	/* (don't) disable broadcasts */
  	cbp->crscdt =		0;	/* (CRS only) */
! 	cbp->stripping =	0;	/* truncate rx packet to byte count */
  	cbp->padding =		1;	/* (do) pad short tx packets */
  	cbp->rcv_crc_xfer =	0;	/* (don't) xfer CRC to host */
  	cbp->force_fdx =	0;	/* (don't) force full duplex */
----------------------------------------------------------------------------

This patch will work if the hardware has a bug that depends on one of
these two bits.
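Reduced to a truth table, the first patch only changes what the two bits
are set to when the interface is *not* promiscuous; in promiscuous mode
both versions already write 0.  A minimal sketch: struct rx_cfg,
cfg_rev139() and cfg_patched() are hypothetical names that model just
these two fields of the configure block, not the driver's real
structure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two configure-block bits touched by
 * the first patch. */
struct rx_cfg {
    unsigned disc_short_rx;    /* discard short frames */
    unsigned stripping;        /* truncate rx frame to byte count */
};

/* Revision 1.39: both bits track !prm, so leaving promiscuous
 * mode turns them on. */
static struct rx_cfg cfg_rev139(bool prm)
{
    struct rx_cfg c = { !prm, !prm };
    return c;
}

/* Patched: both bits pinned to their promiscuous-mode value (0),
 * whatever prm is. */
static struct rx_cfg cfg_patched(bool prm)
{
    (void)prm;
    struct rx_cfg c = { 0, 0 };
    return c;
}
```

So if the hardware bug is tied to either bit being 1, the patched driver
never sets it, while promiscuous and save_bf still follow prm as before.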

If this doesn't fix the problem, and you can't reset it by ifconfiging it
down and up, but promiscuous mode still works, then there is a bug in
fxp_intr() that causes leaks.  Most likely, it's an error condition
that results in bad packets that the driver isn't catching.
This could be a hardware bug where non-local packets are "leaking",
or it could be an otherwise unsignalled bad status.  I believe this
patch *may* catch these cases (I don't have a hardware
manual to check its correctness, or whether the fields hold true out
of promiscuous mode):

----------------------------------------------------------------------------
Index: if_fxp.c
===================================================================
RCS file: /b/cvstree/ncvs/src/sys/pci/if_fxp.c,v
retrieving revision 1.39
diff -c -r1.39 if_fxp.c
*** 1.39	1997/09/05 10:23:54
--- if_fxp.c	1997/09/26 11:22:40
***************
*** 989,1009 ****
  						bpf_tap(FXP_BPFTAP_ARG(ifp),
  						    mtod(m, caddr_t),
  						    total_len); 
- 						/*
- 						 * Only pass this packet up
- 						 * if it is for us.
- 						 */
- 						if ((ifp->if_flags &
- 						    IFF_PROMISC) &&
- 						    (rfa->rfa_status &
- 						    FXP_RFA_STATUS_IAMATCH) &&
- 						    (eh->ether_dhost[0] & 1)
- 						    == 0) {
- 							m_freem(m);
- 							goto rcvloop;
- 						}
  					}
  #endif /* NBPFILTER > 0 */
  					m->m_data +=
  					    sizeof(struct ether_header);
  					ether_input(ifp, eh, m);
--- 989,1013 ----
  						bpf_tap(FXP_BPFTAP_ARG(ifp),
  						    mtod(m, caddr_t),
  						    total_len); 
  					}
  #endif /* NBPFILTER > 0 */
+ 					/*
+ 					 * Only pass this packet up
+ 					 * if it is for us.
+ 					 */
+ 					if (
+ 					    /*
+ 					     * XXX remove the next two lines
+ 					     * XXX and this comment if your
+ 					     * XXX interface goes away entirely
+ 					     * XXX with this patch
+ 					     */
+ 					    (rfa->rfa_status &
+ 					     FXP_RFA_STATUS_IAMATCH) &&
+ 					    (eh->ether_dhost[0] & 1) == 0) {
+ 						m_freem(m);
+ 						goto rcvloop;
+ 					}
  					m->m_data +=
  					    sizeof(struct ether_header);
  					ether_input(ifp, eh, m);
----------------------------------------------------------------------------

Note the funny thing about this: the old code assumes PROMISC mode,
so it *may* be the culprit, *IF* the BPF is present *AND* active,
and thus the reason setting PROMISC fixes it.  The FXP_RFA_STATUS_IAMATCH
may not happen out of PROMISC mode (see comment).
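The difference between the two versions of the drop test can be written
as two predicates.  A sketch only: struct rx_frame, drop_rev139() and
drop_patched() are hypothetical names, the booleans are named after the
expressions in the diff, and nothing here claims to know what
FXP_RFA_STATUS_IAMATCH actually means on the chip.

```c
#include <stdbool.h>

/* Inputs to the receive-path drop decision, named after the diff. */
struct rx_frame {
    bool promisc;              /* ifp->if_flags & IFF_PROMISC */
    bool bpf_active;           /* ifp->if_bpf != NULL */
    bool iamatch_bit;          /* rfa->rfa_status & FXP_RFA_STATUS_IAMATCH */
    bool group_addr;           /* eh->ether_dhost[0] & 1 (multicast/broadcast) */
};

/* Revision 1.39: the test sits inside the BPF block, so it only runs
 * when a BPF listener is attached, and only in promiscuous mode. */
static bool drop_rev139(struct rx_frame f)
{
    return f.bpf_active && f.promisc && f.iamatch_bit && !f.group_addr;
}

/* Second patch: the test is moved out of the BPF block and no longer
 * checks IFF_PROMISC, so it runs on every received frame. */
static bool drop_patched(struct rx_frame f)
{
    return f.iamatch_bit && !f.group_addr;
}
```

With promiscuous mode on and no BPF listener attached, revision 1.39
passes a unicast frame with the status bit set up the stack, while the
patched test frees it; that is the case the patch is probing.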

If all this fails, then you have an mbuf leak (this should cover the soft
reset case, the transmit buffer release, and the realloc/reinit of the
receive buffers, since fxp_stop() does all this and is called on a down
then up).

If you have an mbuf leak, then it's a driver logic bug, and I don't know
the hardware well enough to find one of those very easily.  On the other
hand, if a down-then-up fixes it, then one of the three steps in
fxp_stop() is the unwedger.

If it's the card reset, then the hardware is screwed.  If it's one of
the others, then the code is at fault.  It may be that the thing has a
single queue, and can't transmit while a receive is outstanding, or vice
versa.  Maybe they studied under the "multidrop" serial board vendors,
or at the VMS academy of terminal driver writing (both have the one-queue
problem).
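The single-queue suspicion can be modelled in a few lines.  This is a
hypothetical device, not a claim about how the fxp hardware works: one
command slot is shared by transmit and receive, so a driver that posts a
receive and then assumes it can transmit gets every transmit refused
until the receive completes or something resets the slot, which is
roughly what fxp_stop() would amount to on a down then up.  All names
are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical one-queue device: a single slot shared by rx and tx. */
enum slot_state { SLOT_IDLE, SLOT_RX_PENDING, SLOT_TX_PENDING };

struct one_queue_nic {
    enum slot_state slot;
};

/* Post a receive; fails if the slot is already busy. */
static bool post_rx(struct one_queue_nic *n)
{
    if (n->slot != SLOT_IDLE)
        return false;
    n->slot = SLOT_RX_PENDING;
    return true;
}

/* Post a transmit; refused while a receive is outstanding. */
static bool post_tx(struct one_queue_nic *n)
{
    if (n->slot != SLOT_IDLE)
        return false;          /* blocked behind the pending rx */
    n->slot = SLOT_TX_PENDING;
    return true;
}

/* Called when the pending operation finishes, or by a reset. */
static void slot_release(struct one_queue_nic *n)
{
    n->slot = SLOT_IDLE;
}
```

If no frame ever arrives to complete the receive, post_tx() keeps
failing and only a reset gets the transmit path moving again.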

If the hardware is screwed, and it is truly pounding on the control
bits that gets you back, then *maybe* this patch will fix you; it's
a hell of a kludge, but... it pounds the control bits the same way
you do manually:

----------------------------------------------------------------------------
Index: if_fxp.c
===================================================================
RCS file: /b/cvstree/ncvs/src/sys/pci/if_fxp.c,v
retrieving revision 1.39
diff -c -r1.39 if_fxp.c
*** 1.39	1997/09/05 10:23:54
--- if_fxp.c	1997/09/26 11:02:09
***************
*** 1167,1172 ****
--- 1167,1176 ----
  	log(LOG_ERR, FXP_FORMAT ": device timeout\n", FXP_ARGS(sc));
  	ifp->if_oerrors++;
  
+ 	/* bletcherous hack to pound bits to unwedge card*/
+ 	ifp->if_flags ^= IFF_PROMISC;	/* toggle it*/
+ 	fxp_init(sc);
+ 	ifp->if_flags ^= IFF_PROMISC;	/* toggle it back*/
  	fxp_init(sc);
  }
  
----------------------------------------------------------------------------

The real pain with this is that (1) it assumes the watchdog is being
called (i.e.: transmits are actually failing), (2) it takes a "long"
time... maybe long enough for you to feel the hiccup (but there's nothing
you can do about that if the chip is broken), and (3) you're going to get
some packets which aren't meant for you in a tiny (but annoying) window.

A better fix would call the watchdog when you've been waiting a "long time"
for an ACK, so it wouldn't depend on the transmit not functioning; for
all I know, it is.  8-(
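A minimal sketch of that idea, assuming a tick counter and a
hypothetical ack_timer structure (none of these names exist in the
driver): note when an acknowledgement becomes outstanding, and have a
periodic check report "overdue" once it has waited longer than a limit,
at which point the caller would run the same recovery the watchdog does.

```c
#include <stdbool.h>

/* Hypothetical timer for one outstanding acknowledgement. */
struct ack_timer {
    bool waiting;              /* an acknowledgement is outstanding */
    unsigned long since;       /* tick at which we started waiting */
    unsigned long limit;       /* ticks considered "a long time" */
};

static void ack_expect(struct ack_timer *t, unsigned long now)
{
    t->waiting = true;
    t->since = now;
}

static void ack_seen(struct ack_timer *t)
{
    t->waiting = false;
}

/* True when the caller should run its recovery routine.  Unsigned
 * subtraction keeps the comparison correct across tick wraparound. */
static bool ack_overdue(const struct ack_timer *t, unsigned long now)
{
    return t->waiting && now - t->since > t->limit;
}
```

Because the check keys off the missing acknowledgement rather than a
failed transmit, it would fire even if transmits appear to work.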

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


