Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 19 Jan 2012 10:05:10 -0800
From:      Jack Vogel <jfvogel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Ed Maste <emaste@freebsd.org>, Navdeep Parhar <np@freebsd.org>, net@freebsd.org
Subject:   Re: Latency issues with buf_ring
Message-ID:  <CAFOYbc=-xzw9DDbUKgDCwT7qmMmC9x=2F5qt0pCjCMiT5XDUcQ@mail.gmail.com>
In-Reply-To: <201201191141.25998.jhb@freebsd.org>
References:  <201201191141.25998.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Interesting that you bring this up, I have just recently found that UDP TX
stressing
in my igb driver suffers when using the mq interface, using the old
interface its
much better, I've not been real happy about just reverting, my interim
solution has
been to make a compile option to the driver, but a REAL fix would be much
appreciated!!

Cheers,

Jack


On Thu, Jan 19, 2012 at 8:41 AM, John Baldwin <jhb@freebsd.org> wrote:

> The current buf_ring usage in various NIC drivers has a race that can
> result in really high packet latencies in some cases.  Specifically,
> the common pattern in an if_transmit routine is to use a try-lock on
> the queue and if that fails enqueue the packet in the buf_ring and
> return.  The race, of course, is that the thread holding the lock
> might have just finished checking the buf_ring and found it empty and
> be in the process of releasing the lock when the original thread fails
> the try lock.  If this happens, then the packet queued by the first
> thread will be stalled until another thread tries to transmit packets
> for that queue.  Some drivers attempt to handle this race (igb(4)
> schedules a task to kick the transmit queue if the try lock fails) and
> others don't (cxgb(4) doesn't handle it at all).  At work this race
> was triggered very often after upgrading from 7 to 8 with bursty
> traffic and caused numerous problems, so it is not a rare occurrence
> and needs to be addressed.
>
> (Note, all patches mentioned are against 8)
>
> The first hack I tried to use was to simply always lock the queue after
> the drbr enqueue if the try lock failed and then drain the queue if
> needed (www.freebsd.org/~jhb/patches/try_fail.patch<http://www.freebsd.org/%7Ejhb/patches/try_fail.patch>).
>  While this fixed
> my latency problems, it would seem that this breaks other workloads
> that the drbr design is trying to optimize.
>
> After further hacking what I came up with was a variant of drbr_enqueue()
> that would atomically set a 'pending' flag.  During the enqueue operation.
> The first thread to fail the try lock sets this flag (it is told that it
> set the flag by a new return value (EINPROGRESS) from the enqueue call).
> The pending thread then explicitly clears the flag once it acquires the
> queue lock.  This should prevent multiple threads from stacking up on the
> queue lock so that if multiple threads are dumping packets into the ring
> concurrently all but two (the one draining the queue currently and the
> one waiting for the lock) can continue to drain the queue.  One downside
> of this approach though is that each driver has to be changed to make
> an explicit call to clear the pending flag after grabbing the queue lock
> if the try lock fails.  This is what I am currently running in production
> (www.freebsd.org/~jhb/patches/try_fail3.patch<http://www.freebsd.org/%7Ejhb/patches/try_fail3.patch>;
> ).
>
> However, this still results in a lot of duplicated code in each driver
> that wants to support multiq.  Several folks have expressed a desire
> to move in a direction where the stack has explicit knowledge of
> transmit queues allowing us to hoist some of this duplicated code out
> of the drivers and up into the calling layer.  After discussing this a
> bit with Navdeep (np@), the approach I am looking at is to alter the
> buf_ring code flow a bit to more closely model the older code-flow
> with IFQ and if_start methods.  That is, have the if_transmit methods
> always enqueue each packet that arrives to the buf_ring and then to
> call an if_start-like method that drains a specific transmit queue.
> This approach simplifies a fair bit of driver code and means we can
> potentially move the enqueue, etc. bits up into the calling layer and
> instead have drivers provide the per-transmit queue start routine as
> the direct function pointer to the upper layers ala if_start.
>
> However, we would still need a way to close the latency race.  I've
> attempted to do that by inverting my previous 'thread pending' flag.
> Instead, I make the buf_ring store a 'busy' flag.  This flag is
> managed by the single-consumer buf_ring dequeue method (that
> drbr_dequeue() uses).  It is set to true when a packet is removed from
> the queue while there are more packets pending.  Conversely, if there
> are no other pending packets then it is set to false.  The assumption
> is that once a thread starts draining the queue, it will not stop
> until the queue is empty (or if it has to stop for some other reason
> such as the transmit ring being full, the driver will restart draining
> of the queue until it is empty, e.g. after it receives a transmit
> completion interrupt).  Now when the if_transmit routine enqueues the
> packet, it will get either a real error, 0 if the packet was enqueued
> and the queue was not idle, or EINPROGRESS if the packet was enqueued
> and the queue was busy.  For the EINPROGRESS case the if_transmit
> routine just returns success.  For the 0 case it does a blocking lock
> on the queue lock and calls the queue's start routine (note that this
> means that the busy flag is similar to the old OACTIVE interface
> flag).  This does mean that in some cases you may have one thread that
> is sending what was the last packet in the buf_ring holding the lock
> when another thread blocks, and that the first thread will see the new
> packet when it loops back around so that the second thread is wasting
> it's time spinning, but in the common case I believe it will give the
> same parallelism as the current code.  OTOH, there is nothing to
> prevent multiple threads from "stacking up" in the new approach.  At
> least the try_fail3 patch ensured only one thread at a time would ever
> potentially block on the queue lock.
>
> Another approach might be to replace the 'busy' flag with the 'thread
> pending' flag from try_fail3.patch, but to clear the 'thread pending'
> flag anytime the dequeue method is called rather than using an
> explicit 'clear pending' method.  (Hadn't thought of that until
> writing this e-mail.)  That would prevent multiple threads from
> waiting on the queue lock perhaps.
>
> Note that the 'busy' approach (or the modification I mentioned above)
> does rely on the assumption I stated above, i.e. once a driver starts
> draining a queue, it will drain it until empty unless it hits an
> "error" condition (link went down, transmit ring full, etc.).  If it
> hits an "error" condition, the driver is responsible for restarting
> transmit when the condition clears.  I believe our drivers already
> work this way now.
>
> The 'busy' patch is at http://www.freebsd.org/~jhb/patches/drbr.patch
>
> --
> John Baldwin
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFOYbc=-xzw9DDbUKgDCwT7qmMmC9x=2F5qt0pCjCMiT5XDUcQ>