From owner-freebsd-net@FreeBSD.ORG Fri Apr 19 19:40:01 2013 Return-Path: Delivered-To: freebsd-net@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 63389639 for ; Fri, 19 Apr 2013 19:40:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 531FD1A7F for ; Fri, 19 Apr 2013 19:40:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r3JJe1dI005502 for ; Fri, 19 Apr 2013 19:40:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r3JJe1uH005501; Fri, 19 Apr 2013 19:40:01 GMT (envelope-from gnats) Date: Fri, 19 Apr 2013 19:40:01 GMT Message-Id: <201304191940.r3JJe1uH005501@freefall.freebsd.org> To: freebsd-net@FreeBSD.org Cc: From: Jack Vogel Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Jack Vogel List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Apr 2013 19:40:01 -0000 The following reply was made to PR kern/176446; it has been noted by GNATS. From: Jack Vogel To: John Baldwin Cc: FreeBSD Net , bug-followup@freebsd.org, Mike Karels Subject: Re: kern/176446: [netinet] [patch] Concurrency in ixgbe driving out-of-order packet process and spurious RST Date: Fri, 19 Apr 2013 12:32:59 -0700 --14dae9cdc48767db0904dabbc98d Content-Type: text/plain; charset=ISO-8859-1 Thanks John, I'm incorporating your changes into my source tree. I also plan on changing the "glue" between mq_start and mq_start_locked on igb after some UDP testing that was done, and believe ixgbe should follow suit. Results there have shown the latency is just too high if I only use the task_enqueue... What works best is to always queue to the buf ring, but then also always to do the TRY_LOCK. I will update HEAD as soon as I handle an internal firedrill I have today :) Jack On Fri, Apr 19, 2013 at 9:27 AM, John Baldwin wrote: > A second patch. This is not something I mentioned before, but I had this > in > my checkout. In the legacy IRQ case this could also result in out-of-order > processing. It also fixes a potential OACTIVE-stuck type bug that we used > to > have in igb. I have no way to test this, so it would be good if some other > folks could test this. > > The patch changes ixgbe_txeof() return void and changes the few places that > checked its return value to ignore it. While it is true that ixgbe has a > tx > processing limit (which I think is dubious.. TX completion processing is > very > cheap unlike RX processing, so it seems to me like it should always run to > completion as in igb), in the common case I think the result will be to do > what igb used to do: poll the ring at 100% CPU (either in the interrupt > handler or in the task it keeps rescheduling) waiting for pending TX > packets > to be completed (which is pointless: the host CPU can't make the NIC > transmit > packets any faster by polling). > > It also changes the interrupt handlers to restart packet transmission > synchronously rather than always deferring that to a task (the former is > what > (nearly) all other drivers do). It also fixes the interrupt handlers to be > consistent (one looped on txeof but not the others). In the case of the > legacy interrupt handler it is possible it could fail to restart packet > transmission if there were no pending RX packets after rxeof returned and > txeof fully cleaned its ring without this change. > > It also fixes the legacy interrupt handler to not re-enable the interrupt > if > it schedules the task but to wait until the task completes (this could > result > in concurrent, out-of-order RX processing). > > Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c > =================================================================== > --- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c (revision > 249553) > +++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c (working > copy) > @@ -149,7 +149,7 @@ > static void ixgbe_enable_intr(struct adapter *); > static void ixgbe_disable_intr(struct adapter *); > static void ixgbe_update_stats_counters(struct adapter *); > -static bool ixgbe_txeof(struct tx_ring *); > +static void ixgbe_txeof(struct tx_ring *); > static bool ixgbe_rxeof(struct ix_queue *); > static void ixgbe_rx_checksum(u32, struct mbuf *, u32); > static void ixgbe_set_promisc(struct adapter *); > @@ -1431,7 +1414,10 @@ > } > > /* Reenable this interrupt */ > - ixgbe_enable_queue(adapter, que->msix); > + if (que->res != NULL) > + ixgbe_enable_queue(adapter, que->msix); > + else > + ixgbe_enable_intr(adapter); > return; > } > > @@ -1449,8 +1435,9 @@ > struct adapter *adapter = que->adapter; > struct ixgbe_hw *hw = &adapter->hw; > struct tx_ring *txr = adapter->tx_rings; > - bool more_tx, more_rx; > - u32 reg_eicr, loop = MAX_LOOP; > + struct ifnet *ifp = adapter->ifp; > + bool more; > + u32 reg_eicr; > > > reg_eicr = IXGBE_READ_REG(hw, IXGBE_EICR); > @@ -1461,17 +1448,19 @@ > return; > } > > - more_rx = ixgbe_rxeof(que); > + more = ixgbe_rxeof(que); > > IXGBE_TX_LOCK(txr); > - do { > - more_tx = ixgbe_txeof(txr); > - } while (loop-- && more_tx); > + ixgbe_txeof(txr); > +#if __FreeBSD_version >= 800000 > + if (!drbr_empty(ifp, txr->br)) > + ixgbe_mq_start_locked(ifp, txr, NULL); > +#else > + if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) > + ixgbe_start_locked(txr, ifp); > +#endif > IXGBE_TX_UNLOCK(txr); > > - if (more_rx || more_tx) > - taskqueue_enqueue(que->tq, &que->que_task); > - > /* Check for fan failure */ > if ((hw->phy.media_type == ixgbe_media_type_copper) && > (reg_eicr & IXGBE_EICR_GPI_SDP1)) { > @@ -1484,7 +1473,10 @@ > if (reg_eicr & IXGBE_EICR_LSC) > taskqueue_enqueue(adapter->tq, &adapter->link_task); > > - ixgbe_enable_intr(adapter); > + if (more) > + taskqueue_enqueue(que->tq, &que->que_task); > + else > + ixgbe_enable_intr(adapter); > return; > } > > @@ -1501,27 +1493,24 @@ > struct adapter *adapter = que->adapter; > struct tx_ring *txr = que->txr; > struct rx_ring *rxr = que->rxr; > - bool more_tx, more_rx; > + struct ifnet *ifp = adapter->ifp; > + bool more; > u32 newitr = 0; > > ixgbe_disable_queue(adapter, que->msix); > ++que->irqs; > > - more_rx = ixgbe_rxeof(que); > + more = ixgbe_rxeof(que); > > IXGBE_TX_LOCK(txr); > - more_tx = ixgbe_txeof(txr); > - /* > - ** Make certain that if the stack > - ** has anything queued the task gets > - ** scheduled to handle it. > - */ > + ixgbe_txeof(txr); > #ifdef IXGBE_LEGACY_TX > if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd)) > + ixgbe_start_locked(txr, ifp); > #else > - if (!drbr_empty(adapter->ifp, txr->br)) > + if (!drbr_empty(ifp, txr->br)) > + ixgbe_mq_start_locked(ifp, txr, NULL); > #endif > - more_tx = 1; > IXGBE_TX_UNLOCK(txr); > > /* Do AIM now? */ > @@ -1575,7 +1564,7 @@ > rxr->packets = 0; > > no_calc: > - if (more_tx || more_rx) > + if (more) > taskqueue_enqueue(que->tq, &que->que_task); > else /* Reenable this interrupt */ > ixgbe_enable_queue(adapter, que->msix); > @@ -3557,7 +3545,7 @@ > * tx_buffer is put back on the free queue. > * > **********************************************************************/ > -static bool > +static void > ixgbe_txeof(struct tx_ring *txr) > { > struct adapter *adapter = txr->adapter; > @@ -3605,13 +3593,13 @@ > IXGBE_CORE_UNLOCK(adapter); > IXGBE_TX_LOCK(txr); > } > - return FALSE; > + return; > } > #endif /* DEV_NETMAP */ > > if (txr->tx_avail == txr->num_desc) { > txr->queue_status = IXGBE_QUEUE_IDLE; > - return FALSE; > + return; > } > > /* Get work starting point */ > @@ -3705,12 +3693,8 @@ > if ((!processed) && ((ticks - txr->watchdog_time) > > IXGBE_WATCHDOG)) > txr->queue_status = IXGBE_QUEUE_HUNG; > > - if (txr->tx_avail == txr->num_desc) { > + if (txr->tx_avail == txr->num_desc) > txr->queue_status = IXGBE_QUEUE_IDLE; > - return (FALSE); > - } > - > - return TRUE; > } > > /********************************************************************* > > > -- > John Baldwin > --14dae9cdc48767db0904dabbc98d Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Thanks John, I'm incorporating you= r changes into my source tree. I also plan on changing the
"g= lue" between mq_start and mq_start_locked on igb after some UDP testin= g that was done, and
believe ixgbe should follow suit. Results there have shown the latenc= y is just too high if I only use
the task_enqueue... What works best is = to always queue to the buf ring, but then also always to
do the TR= Y_LOCK. I will update HEAD as soon as I handle an internal firedrill I have= today :)

Jack



On Fri, Apr 19, 2013 at 9:27 AM, John Baldwin <jhb@freebsd.= org> wrote:
A second patch. =A0This is not something I m= entioned before, but I had this in
my checkout. =A0In the legacy IRQ case this could also result in out-of-ord= er
processing. =A0It also fixes a potential OACTIVE-stuck type bug that we use= d to
have in igb. =A0I have no way to test this, so it would be good if some oth= er
folks could test this.

The patch changes ixgbe_txeof() return void and changes the few places that=
checked its return value to ignore it. =A0While it is true that ixgbe has a= tx
processing limit (which I think is dubious.. TX completion processing is ve= ry
cheap unlike RX processing, so it seems to me like it should always run to<= br> completion as in igb), in the common case I think the result will be to do<= br> what igb used to do: poll the ring at 100% CPU (either in the interrupt
handler or in the task it keeps rescheduling) waiting for pending TX packet= s
to be completed (which is pointless: the host CPU can't make the NIC tr= ansmit
packets any faster by polling).

It also changes the interrupt handlers to restart packet transmission
synchronously rather than always deferring that to a task (the former is wh= at
(nearly) all other drivers do). =A0It also fixes the interrupt handlers to = be
consistent (one looped on txeof but not the others). =A0In the case of the<= br> legacy interrupt handler it is possible it could fail to restart packet
transmission if there were no pending RX packets after rxeof returned and txeof fully cleaned its ring without this change.

It also fixes the legacy interrupt handler to not re-enable the interrupt i= f
it schedules the task but to wait until the task completes (this could resu= lt
in concurrent, out-of-order RX processing).

Index: /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c =A0 =A0 =A0 (revi= sion 249553)
+++ /home/jhb/work/freebsd/svn/head/sys/dev/ixgbe/ixgbe.c =A0 =A0 =A0 (work= ing copy)
@@ -149,7 +149,7 @@
=A0static void =A0 =A0 ixgbe_enable_intr(struct adapter *);
=A0static void =A0 =A0 ixgbe_disable_intr(struct adapter *);
=A0static void =A0 =A0 ixgbe_update_stats_counters(struct adapter *);
-static bool =A0 =A0ixgbe_txeof(struct tx_ring *);
+static void =A0 =A0ixgbe_txeof(struct tx_ring *);
=A0static bool =A0 =A0ixgbe_rxeof(struct ix_queue *);
=A0static void =A0 =A0ixgbe_rx_checksum(u32, struct mbuf *, u32);
=A0static void =A0 =A0 ixgbe_set_promisc(struct adapter *);
@@ -1431,7 +1414,10 @@
=A0 =A0 =A0 =A0 }

=A0 =A0 =A0 =A0 /* Reenable this interrupt */
- =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);
+ =A0 =A0 =A0 if (que->res !=3D NULL)
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix); + =A0 =A0 =A0 else
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_intr(adapter);
=A0 =A0 =A0 =A0 return;
=A0}

@@ -1449,8 +1435,9 @@
=A0 =A0 =A0 =A0 struct adapter =A0*adapter =3D que->adapter;
=A0 =A0 =A0 =A0 struct ixgbe_hw *hw =3D &adapter->hw;
=A0 =A0 =A0 =A0 struct =A0 =A0 =A0 =A0 =A0tx_ring *txr =3D adapter->tx_r= ings;
- =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more_tx, more_rx;
- =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 reg_eicr, loop =3D MAX_LOOP;
+ =A0 =A0 =A0 struct ifnet =A0 =A0*ifp =3D adapter->ifp;
+ =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more;
+ =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 reg_eicr;


=A0 =A0 =A0 =A0 reg_eicr =3D IXGBE_READ_REG(hw, IXGBE_EICR);
@@ -1461,17 +1448,19 @@
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;
=A0 =A0 =A0 =A0 }

- =A0 =A0 =A0 more_rx =3D ixgbe_rxeof(que);
+ =A0 =A0 =A0 more =3D ixgbe_rxeof(que);

=A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);
- =A0 =A0 =A0 do {
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 more_tx =3D ixgbe_txeof(txr);
- =A0 =A0 =A0 } while (loop-- && more_tx);
+ =A0 =A0 =A0 ixgbe_txeof(txr);
+#if __FreeBSD_version >=3D 800000
+ =A0 =A0 =A0 if (!drbr_empty(ifp, txr->br))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_mq_start_locked(ifp, txr, NULL);
+#else
+ =A0 =A0 =A0 if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_start_locked(txr, ifp);
+#endif
=A0 =A0 =A0 =A0 IXGBE_TX_UNLOCK(txr);

- =A0 =A0 =A0 if (more_rx || more_tx)
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->qu= e_task);
-
=A0 =A0 =A0 =A0 /* Check for fan failure */
=A0 =A0 =A0 =A0 if ((hw->phy.media_type =3D=3D ixgbe_media_type_copper) = &&
=A0 =A0 =A0 =A0 =A0 =A0 (reg_eicr & IXGBE_EICR_GPI_SDP1)) {
@@ -1484,7 +1473,10 @@
=A0 =A0 =A0 =A0 if (reg_eicr & IXGBE_EICR_LSC)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(adapter->tq, &adap= ter->link_task);

- =A0 =A0 =A0 ixgbe_enable_intr(adapter);
+ =A0 =A0 =A0 if (more)
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->qu= e_task);
+ =A0 =A0 =A0 else
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_intr(adapter);
=A0 =A0 =A0 =A0 return;
=A0}

@@ -1501,27 +1493,24 @@
=A0 =A0 =A0 =A0 struct adapter =A0*adapter =3D que->adapter;
=A0 =A0 =A0 =A0 struct tx_ring =A0*txr =3D que->txr;
=A0 =A0 =A0 =A0 struct rx_ring =A0*rxr =3D que->rxr;
- =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more_tx, more_rx;
+ =A0 =A0 =A0 struct ifnet =A0 =A0*ifp =3D adapter->ifp;
+ =A0 =A0 =A0 bool =A0 =A0 =A0 =A0 =A0 =A0more;
=A0 =A0 =A0 =A0 u32 =A0 =A0 =A0 =A0 =A0 =A0 newitr =3D 0;

=A0 =A0 =A0 =A0 ixgbe_disable_queue(adapter, que->msix);
=A0 =A0 =A0 =A0 ++que->irqs;

- =A0 =A0 =A0 more_rx =3D ixgbe_rxeof(que);
+ =A0 =A0 =A0 more =3D ixgbe_rxeof(que);

=A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);
- =A0 =A0 =A0 more_tx =3D ixgbe_txeof(txr);
- =A0 =A0 =A0 /*
- =A0 =A0 =A0 ** Make certain that if the stack
- =A0 =A0 =A0 ** has anything queued the task gets
- =A0 =A0 =A0 ** scheduled to handle it.
- =A0 =A0 =A0 */
+ =A0 =A0 =A0 ixgbe_txeof(txr);
=A0#ifdef IXGBE_LEGACY_TX
=A0 =A0 =A0 =A0 if (!IFQ_DRV_IS_EMPTY(&adapter->ifp->if_snd))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_start_locked(txr, ifp);
=A0#else
- =A0 =A0 =A0 if (!drbr_empty(adapter->ifp, txr->br))
+ =A0 =A0 =A0 if (!drbr_empty(ifp, txr->br))
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_mq_start_locked(ifp, txr, NULL);
=A0#endif
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 more_tx =3D 1;
=A0 =A0 =A0 =A0 IXGBE_TX_UNLOCK(txr);

=A0 =A0 =A0 =A0 /* Do AIM now? */
@@ -1575,7 +1564,7 @@
=A0 =A0 =A0 =A0 =A0rxr->packets =3D 0;

=A0no_calc:
- =A0 =A0 =A0 if (more_tx || more_rx)
+ =A0 =A0 =A0 if (more)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 taskqueue_enqueue(que->tq, &que->= que_task);
=A0 =A0 =A0 =A0 else /* Reenable this interrupt */
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ixgbe_enable_queue(adapter, que->msix);<= br> @@ -3557,7 +3545,7 @@
=A0 * =A0tx_buffer is put back on the free queue.
=A0 *
=A0 **********************************************************************/=
-static bool
+static void
=A0ixgbe_txeof(struct tx_ring *txr)
=A0{
=A0 =A0 =A0 =A0 struct adapter =A0 =A0 =A0 =A0 =A0*adapter =3D txr->adap= ter;
@@ -3605,13 +3593,13 @@
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 IXGBE_CORE_UNLOCK(adapter);=
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 IXGBE_TX_LOCK(txr);
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
- =A0 =A0 =A0 =A0 =A0 =A0 =A0 return FALSE;
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;
=A0 =A0 =A0 =A0 }
=A0#endif /* DEV_NETMAP */

=A0 =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc) {
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_IDLE;<= br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return FALSE;
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;
=A0 =A0 =A0 =A0 }

=A0 =A0 =A0 =A0 /* Get work starting point */
@@ -3705,12 +3693,8 @@
=A0 =A0 =A0 =A0 if ((!processed) && ((ticks - txr->watchdog_time= ) > IXGBE_WATCHDOG))
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_HUNG;<= br>
- =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc) {
+ =A0 =A0 =A0 if (txr->tx_avail =3D=3D txr->num_desc)
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 txr->queue_status =3D IXGBE_QUEUE_IDLE;<= br> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (FALSE);
- =A0 =A0 =A0 }
-
- =A0 =A0 =A0 return TRUE;
=A0}

=A0/*********************************************************************

--
John Baldwin

--14dae9cdc48767db0904dabbc98d--