Date: Fri, 26 Sep 1997 11:31:45 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: mike@smith.net.au (Mike Smith)
Cc: tarkhil@mgt.msk.ru, hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject: Re: 'fxp' driver/hardware lossage (was Re: Alexander B. Povol's mail)
Message-ID: <199709261131.EAA19446@usr09.primenet.com>
In-Reply-To: <199709260913.SAA00775@word.smith.net.au> from "Mike Smith" at Sep 26, 97 06:43:16 pm

> > The 127.0.0.1 is not normally something that has anything at all to
> > do with the card driver.  Instead, it is internally looped back; it
> > is a simulated interface.  I don't see how shoving the interface
> > into promiscuous mode would help.
>
> Putting the fxp driver into promiscuous mode involves poking at bits
> of the interface hardware.  This appears to fix whatever it is that's
> going wrong.  If this problem hasn't already been dealt with in the
> -current/-stable drivers, I hope DG is looking at it.

So how the heck can pinging the loopback *ever* fail because of an
unrelated driver?

The only thing I can think of is a screwed routing table or a screwed
arp table.  Yet he says neither of these changes when the problem is
manifest.

Or.... his ethernet card is denial-of-service attacking him?!?

Maybe the mbufs are all allocated, so he can't get any for the
loopback ping?  This could point to a bad driver assumption, like
you can transmit when a receive is pending, but the hardware has one
queue for both sets of operations, or something equally idiotic.  Or
it may be bad code.

I still don't see how putting the card into a mode where it will see
yet more packets could fix it.

There is some suspicious code about preemptively freeing mbufs in
some bad cases surrounded by #ifdef's for the BPF.  But to trigger,
the BPF would have to actually be in the process of being used
(ifp->if_bpf != NULL).

Hmmm.  If this theory is true, then it's likely that what's holding
the mbufs is fixed by the fxp_stop() at the top of fxp_init() when
the mode is changed.  Technically, the SIOCSIFFLAGS case in
fxp_ioctl() should fix this with a standard ifconfig down then up.

Let's see...  This is an interesting problem...

I'm going to provide some possible hacks.  Normally I don't hack
drivers for cards that I don't have available for test and have no
documentation for, but with the wedges you are seeing, these are
what I'd try for myself.  8-).

*DON'T* mix these patches!!!

First, try this patch:

----------------------------------------------------------------------------
Index: if_fxp.c
===================================================================
RCS file: /b/cvstree/ncvs/src/sys/pci/if_fxp.c,v
retrieving revision 1.39
diff -c -r1.39 if_fxp.c
*** 1.39	1997/09/05 10:23:54
--- if_fxp.c	1997/09/26 10:24:47
***************
*** 1241,1247 ****
  	cbp->tno_int =		0;	/* (disable) tx not okay interrupt */
  	cbp->ci_int =		0;	/* interrupt on CU not active */
  	cbp->save_bf =		prm;	/* save bad frames */
! 	cbp->disc_short_rx =	!prm;	/* discard short packets */
  	cbp->underrun_retry =	1;	/* retry mode (1) on DMA underrun */
  	cbp->mediatype =	!sc->phy_10Mbps_only; /* interface mode */
  	cbp->nsai =		1;	/* (don't) disable source addr insert */
--- 1241,1247 ----
  	cbp->tno_int =		0;	/* (disable) tx not okay interrupt */
  	cbp->ci_int =		0;	/* interrupt on CU not active */
  	cbp->save_bf =		prm;	/* save bad frames */
! 	cbp->disc_short_rx =	0;	/* discard short packets */
  	cbp->underrun_retry =	1;	/* retry mode (1) on DMA underrun */
  	cbp->mediatype =	!sc->phy_10Mbps_only; /* interface mode */
  	cbp->nsai =		1;	/* (don't) disable source addr insert */
***************
*** 1253,1259 ****
  	cbp->promiscuous =	prm;	/* promiscuous mode */
  	cbp->bcast_disable =	0;	/* (don't) disable broadcasts */
  	cbp->crscdt =		0;	/* (CRS only) */
! 	cbp->stripping =	!prm;	/* truncate rx packet to byte count */
  	cbp->padding =		1;	/* (do) pad short tx packets */
  	cbp->rcv_crc_xfer =	0;	/* (don't) xfer CRC to host */
  	cbp->force_fdx =	0;	/* (don't) force full duplex */
--- 1253,1259 ----
  	cbp->promiscuous =	prm;	/* promiscuous mode */
  	cbp->bcast_disable =	0;	/* (don't) disable broadcasts */
  	cbp->crscdt =		0;	/* (CRS only) */
! 	cbp->stripping =	0;	/* truncate rx packet to byte count */
  	cbp->padding =		1;	/* (do) pad short tx packets */
  	cbp->rcv_crc_xfer =	0;	/* (don't) xfer CRC to host */
  	cbp->force_fdx =	0;	/* (don't) force full duplex */
----------------------------------------------------------------------------

This patch will work if the hardware has a bug that depends on one
of these two bits.

If this doesn't fix the problem, and you can't reset it by ifconfiging
it down and up, but promiscuous mode still works, then there is a bug
in fxp_intr() that causes leaks.  Most likely, it's an error condition
that results in bad packets that's not being caught by the driver.
This could be a hardware bug where non-local packets are "leaking",
or it could be an otherwise unsignalled bad status.

I believe this patch *may* catch these cases (I don't have a hardware
manual to check its correctness, or if the fields hold true out of
promiscuous mode):

----------------------------------------------------------------------------
Index: if_fxp.c
===================================================================
RCS file: /b/cvstree/ncvs/src/sys/pci/if_fxp.c,v
retrieving revision 1.39
diff -c -r1.39 if_fxp.c
*** 1.39	1997/09/05 10:23:54
--- if_fxp.c	1997/09/26 11:22:40
***************
*** 989,1009 ****
  				bpf_tap(FXP_BPFTAP_ARG(ifp),
  					mtod(m, caddr_t),
  					total_len);
- 				/*
- 				 * Only pass this packet up
- 				 * if it is for us.
- 				 */
- 				if ((ifp->if_flags &
- 				    IFF_PROMISC) &&
- 				    (rfa->rfa_status &
- 				    FXP_RFA_STATUS_IAMATCH) &&
- 				    (eh->ether_dhost[0] & 1)
- 				    == 0) {
- 					m_freem(m);
- 					goto rcvloop;
- 				}
  			}
  #endif /* NBPFILTER > 0 */
  			m->m_data +=
  			    sizeof(struct ether_header);
  			ether_input(ifp, eh, m);
--- 989,1013 ----
  				bpf_tap(FXP_BPFTAP_ARG(ifp),
  					mtod(m, caddr_t),
  					total_len);
  			}
  #endif /* NBPFILTER > 0 */
+ 			/*
+ 			 * Only pass this packet up
+ 			 * if it is for us.
+ 			 */
+ 			if (
+ 			    /*
+ 			     * XXX remove the next two lines
+ 			     * XXX and this comment if your
+ 			     * XXX interface goes away entirely
+ 			     * XXX with this patch
+ 			     */
+ 			    (rfa->rfa_status &
+ 			    FXP_RFA_STATUS_IAMATCH) &&
+ 			    (eh->ether_dhost[0] & 1) == 0) {
+ 				m_freem(m);
+ 				goto rcvloop;
+ 			}
  			m->m_data +=
  			    sizeof(struct ether_header);
  			ether_input(ifp, eh, m);
----------------------------------------------------------------------------

Note the funny thing about this: the old code assumes PROMISC mode,
so it *may* be the culprit, *IF* the BPF is present *AND* active,
and thus the reason setting PROMISC fixes it.  The
FXP_RFA_STATUS_IAMATCH may not happen out of PROMISC mode (see
comment).

If all this fails, then you have an mbuf leak (this should cover the
soft reset case, the transmit buffer release, and the realloc/reinit
receive buffers, since fxp_stop() does all this and is called on a
down then up).

If you have an mbuf leak, then it's a driver logic bug, and I don't
know the hardware well enough to find one of those very easily.

On the other hand, if a down-then-up fixes it, then one of the three
stops is the unwedger.  If it's the card reset, then the hardware is
screwed.  If it's one of the others, then the code is at fault.  It
may be that the thing has a single queue, and can't transmit while a
receive is outstanding, or vice versa.  Maybe they studied under the
"multidrop" serial board vendors, or at the VMS academy of terminal
driver writing (both have the one queue problem).

If the hardware is screwed, and it is truly pounding on the control
bits that gets you back, then *maybe* this patch will fix you; it's
a hell of a kludge, but...
it pounds the control bits the same way you do manually:

----------------------------------------------------------------------------
Index: if_fxp.c
===================================================================
RCS file: /b/cvstree/ncvs/src/sys/pci/if_fxp.c,v
retrieving revision 1.39
diff -c -r1.39 if_fxp.c
*** 1.39	1997/09/05 10:23:54
--- if_fxp.c	1997/09/26 11:02:09
***************
*** 1167,1172 ****
--- 1167,1176 ----
  	log(LOG_ERR, FXP_FORMAT ": device timeout\n", FXP_ARGS(sc));
  	ifp->if_oerrors++;
+ 	/* bletcherous hack to pound bits to unwedge card*/
+ 	ifp->if_flags ^= IFF_PROMISC;	/* toggle it*/
+ 	fxp_init(sc);
+ 	ifp->if_flags ^= IFF_PROMISC;	/* toggle it back*/
  	fxp_init(sc);
  }
----------------------------------------------------------------------------

The real pain with this is that (1) it assumes the watchdog is being
called (ie: transmits are actually failing), (2) it'll take a "long"
time... maybe enough for you to feel the hiccup (but nothing you can
do about that if the chip is broken), and (3) you're going to get
some packets which aren't meant for you in a tiny (but annoying)
window.

A better fix would call the watchdog when you've been waiting a
"long time" for an ACK, so it wouldn't depend on the transmit not
functioning; for all I know, it is.  8-(

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
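
For anyone reading the watchdog hack without the driver source handy,
here is a minimal userland sketch of the toggle-and-reinit sequence.
It is not driver code.  The names demo_softc, demo_init() and
demo_watchdog_unwedge(), and the DEMO_IFF_* values, are hypothetical
stand-ins invented for illustration; the real flags live in
<net/if.h> and the real reinit is fxp_init().  All it shows is that
XOR-ing the PROMISC bit twice restores the original flags while
forcing two reprogramming passes, exactly one of them with PROMISC
set.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; a real driver uses <net/if.h>. */
#define DEMO_IFF_UP      0x001
#define DEMO_IFF_PROMISC 0x100

/* Hypothetical stand-in for the driver softc (not struct fxp_softc). */
struct demo_softc {
	int if_flags;      /* interface flags, as kept in struct ifnet */
	int init_calls;    /* number of times the card was reprogrammed */
	int promisc_inits; /* how many of those passes had PROMISC set */
};

/* Stand-in for fxp_init(): records which mode would be programmed. */
static void
demo_init(struct demo_softc *sc)
{
	sc->init_calls++;
	if (sc->if_flags & DEMO_IFF_PROMISC)
		sc->promisc_inits++;
}

/* Same shape as the hack: flip PROMISC, reinit, flip back, reinit. */
static void
demo_watchdog_unwedge(struct demo_softc *sc)
{
	assert(sc != NULL);
	sc->if_flags ^= DEMO_IFF_PROMISC;	/* toggle it */
	demo_init(sc);
	sc->if_flags ^= DEMO_IFF_PROMISC;	/* toggle it back */
	demo_init(sc);
}
```

Calling demo_watchdog_unwedge() on a softc whose if_flags is
DEMO_IFF_UP leaves if_flags unchanged afterwards, with init_calls
at 2 and promisc_inits at 1.  The same holds if the interface was
already promiscuous; only the order of the two passes swaps.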