Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 14 Jan 2015 09:33:04 -0500
From:      John Baldwin <jhb@FreeBSD.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>
Cc:        svn-src-projects@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r277122 - projects/ifnet/sys/dev/msk
Message-ID:  <54B67E20.3090701@FreeBSD.org>
In-Reply-To: <20150113231735.GZ15484@FreeBSD.org>
References:  <201501130902.t0D927NE077024@svn.freebsd.org> <5330876.Sb1U9Iz8Cz@ralph.baldwin.cx> <20150113231735.GZ15484@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 1/13/15 6:17 PM, Gleb Smirnoff wrote:
>   John,
> 
> On Tue, Jan 13, 2015 at 01:22:09PM -0500, John Baldwin wrote:
> J> >   o Convert from xxx_start(ifp) to xxx_transmit(ifp, m).
> J> >     A simple conversion itself is quite straight:
> J> >     * In ifdriver declaration define .ifdrv_maxqlen, take the value
> J> >       from IFQ_SET_MAXLEN() macro.
> J> >     * In ifdriver ifops declaration define if_transmit function.
> J> >     * Rename xxx_start() to xxx_transmit() and change its prototype.
> J> >     * The new named xxx_transmit() should:
> J> >       - Try to if_snd_enqueue() the mbuf or return.
> J> >       - Try to mtx_lock() the driver softc or return.
> J> 
> J> Please no.  This is a major source of pain with the current drivers that use 
> J> buf_ring that is worked around in various (often broken) ways.  The problem
> J> is that the other thread holding the lock might drop it right after your try 
> J> lock fails and then the packet you just queued doesn't go out until you queue 
> J> another packet.  That's all fine and good if you are blasting packets out an 
> J> interface.  It is not fine and good if you have sparse traffic.  That packet 
> J> is now potentially delayed indefinitely.  (In a real world scenario with a 
> J> heartbeat protocol that sent out timing packets once a second or so to measure 
> J> RTT between hosts this manifested as odd timings because we would see the 
> J> timings jump by a second every so often when the packet was delayed until the 
> J> next heartbeat).
> 
> Yes, I also see this problem when coding that.
> 
> J> This kind of design decision is why I do not want to just blindly convert 
> J> everything to suboptimal if_transmit routines.  The current ifq based if_start 
> J> model is better for 10/100 single-queue drivers than this.
> 
> If I convert mtx_trylock() to mtx_lock(), then we will get 1:1 conversion
> to the current model. But, I still want to do an improvement while converting
> drivers. :)

The trylock isn't an improvement.  I also worry that this approach is
going to result in a lot of copy and pasted code in various drivers as
they all reimplement the current if_transmit() (effectively).

> May be the xxx_transmit() should check the queue again after softc unlock?

I posted some ideas about how to handle this in a thread several years
ago on net@ with various alternatives.  In that case I was focused on
buf_ring and I settled on an approach where a draining thread marked the
queue as "busy" while it was draining it and cleared that flag before
checking the head of the queue.  The enqueue process returned a
different errno value (EINPROGRESS or some such) if it queued a packet
into a "busy" queue and the transmit routines were changed to 1) always
enqueue the packet, and 2) if EINPROGRESS wasn't returned, use a
blocking mtx_lock and start transmitting.

However, even this model has some downsides in that one thread might be
stuck transmitting packets queued by other threads and never pop back
out to userland to service its associated application (a kind of
starvation of the user side of the thread).  Note that the mtx_trylock
approach has the same problem.  It might be nice to have a sort of limit
on the number of packets a thread is willing to enqueue, but then you
have the problem of ensuring any packets still on the queue when it hits
its limit aren't also delayed indefinitely.

I don't recall the exact mechanics of how Navdeep's mp_ring addresses
this (though I believe it does from when I looked at it).

Regardless, I think this was my point I attempted to make on IRC
earlier: you need to figure out what you are going to do here first
before you go through and convert all the drivers.  Otherwise you will
be stuck making multiple passes.  Converting a "real" driver up front is
useful so you can prototype different solutions, but I think you need to
resolve this now before continuing your pass as the current approach is
not suitable to be merged into HEAD.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54B67E20.3090701>