From owner-freebsd-net@FreeBSD.ORG Tue Feb 5 10:49:02 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 65EE156A; Tue, 5 Feb 2013 10:49:02 +0000 (UTC) (envelope-from randall@lakerest.net) Received: from lakerest.net (lakerest.net [70.155.160.98]) by mx1.freebsd.org (Postfix) with ESMTP id D066CD62; Tue, 5 Feb 2013 10:49:01 +0000 (UTC) Received: from [10.1.1.101] (bsd4.lakerest.net [70.155.160.102]) (authenticated bits=0) by lakerest.net (8.14.4/8.14.3) with ESMTP id r15AnGZb038037 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 5 Feb 2013 05:49:16 -0500 (EST) (envelope-from randall@lakerest.net) Subject: Re: Driver patch to look at... Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Randy Stewart In-Reply-To: <201302041524.58699.jhb@freebsd.org> Date: Tue, 5 Feb 2013 05:49:00 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <45AD1046-A630-4C96-B4D2-B8A7D6A6DC45@lakerest.net> References: <201302041524.58699.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1283) Cc: freebsd-net@freebsd.org, Robert Watson , Kip Macy , jv@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Feb 2013 10:49:02 -0000 On Feb 4, 2013, at 3:24 PM, John Baldwin wrote: > On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote: >> All: >>=20 >> I have been working with TCP in gigabit networks (igb driver = actually) and have >> found a very nasty problem with the way the driver is doing its put = back when >> it fills the out-bound transmit queue. >>=20 >> Basically it has taken a packet from the head of the ring buffer, and = then=20 >> realizes it can't fit it into the transmit queue. So it just = re-enqueue's it >> into the ring buffer. Whats wrong with that? Well most of the time = there >> are anywhere from 10-50 packets (maybe more) in that ring buffer when = you are >> operating at full speed (or trying to). This means you will see 10 = duplicate >> ACKs, do a fast retransmit and cut your cwnd in half.. not very nice = actually. >>=20 >> The patch I have attached makes it so that >>=20 >> 1) There are ways to swap back. >> 2) Use the peek in the ring buffer and only >> dequeue the packet if we put it into the transmit ring >> 3) If something goes wrong and the transmit frees the packet we = dequeue it. >> 4) If the transmit changed it (defrag etc) then swap out the new mbuf = that >> has taken its place. >>=20 >> I have fixed the four intel drivers that had this systemic issue, but = there >> are still more to fix. >>=20 >> Comments/review .. rotten egg's etc.. would be most welcome before >> I commit this.. >=20 > Does this only happen in drivers that use buffering? Yep, there are a lot of drivers that *do not* use the drbr_xxxx() = functions and for those they do the IFQ_DRV_PREPEND().. its only the newer drivers = like the intel 1Gig and 10Gig ones that seem effected Also effected are : bxe cxgb oce en I have not fixed those yet. > I seem to recall that > drivers using IFQ would just stuff the packet at the head of the IFQ = via > IFQ_DRV_PREPEND() in this case so it is still the next packet to = transmit. > See, for example, this bit in dc_start_locked(): >=20 > for (queued =3D 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) { > /* > * If there's no way we can send any packets, return = now. > */ > if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - = DC_TX_LIST_RSVD) { > ifp->if_drv_flags |=3D IFF_DRV_OACTIVE; > break; > } > IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head); > if (m_head =3D=3D NULL) > break; >=20 > if (dc_encap(sc, &m_head)) { > if (m_head =3D=3D NULL) > break; > IFQ_DRV_PREPEND(&ifp->if_snd, m_head); > ifp->if_drv_flags |=3D IFF_DRV_OACTIVE; > break; > } >=20 > It sounds like drbr/buf_ring just don't handle this case correctly? = It > seems a shame to have to duplicate so much code in the various drivers = to > fix this, but that seems to be par for the course when using buf_ring. = :( > (buggy in edge cases and lots of duplicated code that is). > Also, doing the drbr_swap() just so that drbr_dequeue() returns what = you > just swapped in seems... odd. It seems that it would be nicer instead > to have some sort of drbr_peek() / drbr_advance() where the latter = just > skips over whatever the current head is? Then you could have = something > like: >=20 > while ((next =3D drbr_peek()) !=3D NULL) { > if (!foo_encap(&next)) { > if (next =3D=3D NULL) > drbr_advance(); > break; > } > drbr_advance(); > } >=20 That was what I originally did (without the rename), but there is a sure = crash waiting in that. The only difference from what I originally had was just drbr_dequeue().. = but I was being a bit lazy and not wanting to add yet another function to = the=20 drbr_xxxx code since essential it would be a clone of drbr_dequeue() = without returning the mbuf. The crash potential here is in that foo_encap(&next) often times will = return a different mbuf (at least in the igb driver it does). I believe its due to either the m_pullup() which could change the lead mbuf you want in the drbr_ring, or the m_defrag all within igb_xmit. Thus you have to track what comes back from the !foo_encap() call and compare it to=20 see if you must swap it out.=20 > I guess the sticky widget is the case of ATLQ as you need to dequeue = the > packet always in the ALTQ case and put it back if the encap fails. Yeah ALTQ is ugly and IMO we need to re-write it anyway.. maybe re-think this whole layer ;-0 > For > your patch it's not clear to me how that works. It seems that if the > encap routine frees the mbuf you try to dereference a freed pointer = when > you call drbr_dequeue(). Hmm you are right.. I forgot how we keep those using the mbuf itself... > I really think you will instead need some sort > of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case = and > use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case. We could do that but drbr_putback() would probably need both the old and new pointers and then we could make it do the ring_swap() to put the right mbuf in place.. Let me go explore that and come up with a better patch ;-) R >=20 > --=20 > John Baldwin > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >=20 ----- Randall Stewart randall@lakerest.net