Date:      Sat, 04 Nov 2006 08:24:55 -0800
From:      Sam Leffler <sam@errno.com>
To:        pyunyh@gmail.com
Cc:        hackers@freebsd.org, "Devon H. O'Dell" <devon.odell@gmail.com>
Subject:   Re: vr(4) performance
Message-ID:  <454CBED7.2000103@errno.com>
In-Reply-To: <20061103003311.GD69214@cdnetworks.co.kr>
References:  <9ab217670611021511l3120d58bhd0b61bf44f8ecc87@mail.gmail.com> <454A7EF2.5090201@errno.com> <20061103003311.GD69214@cdnetworks.co.kr>

Pyun YongHyeon wrote:
> On Thu, Nov 02, 2006 at 03:27:46PM -0800, Sam Leffler wrote:
>  > Devon H. O'Dell wrote:
>  > > Hey all,
>  > > 
>  > > So, vr(4) kind of sucks, and it seems like this is mostly due to the
>  > > fact that we call m_defrag() on every mbuf that we send through it.
>  > > This seems to really screw performance on outgoing packets (something
>  > > like 33% the output efficiency of fxp(4), if I'm understanding this
>  > > all correctly).
>  > > 
>  > > I'm sort of wondering if anybody has attempted to address this before
>  > > and if there's a way to possibly mitigate this behavior. I know Bill
>  > > Paul's comments say ``Unfortunately, FreeBSD doesn't guarantee
>  > > that mbufs will be filled in starting at longword boundaries, so we
>  > > have to do a buffer copy before transmission.'' -- since it's been a
>  > > long day, and I'm about to go home to grab a pizza and stop thinking
>  > > about code, would anybody mind offering suggestions as to either:
>  > > 
>  > > a) Pros and cons of guaranteeing that they're filled in aligned (and
>  > > possibly hints on doing it), or
>  > > b) Possible workarounds / hacks to do this faster for vr(4)
>  > > 
>  > > Any input is appreciated! (Except ``vr(4) is lol'')
>  > 
>  > m_defrag is ~10x slower than it needs to be.  I proposed changes to
>  > address this a while back but eventually gave up and put driver-specific
>  > code in ath.  You can look there or I can send you some patches to
>  > m_defrag to try out in vr.
>  > 
> 
> Because the purpose of m_defrag(9) in vr(4) is to guarantee longword
> aligned mbufs I'm not sure ath_defrag can be used here. If memory
> serves me right ath_defrag would not change the first mbuf address
> in a chain. If the first mbuf is not aligned on longword boundary
> it wouldn't work I guess. Of course we can check the first mbuf in
> the chain before calling super-fast ath_defrag, I guess.
> 

m_defrag is used for two main purposes in the system: reducing the
mbuf count in a chain so that an outbound packet fits in a limited
number of h/w tx descriptors, and aligning packet data for cards with
constrained dma engines.  Both operations belong in bus_dma.
Combining them in a single routine results in overly pessimistic code
for the common case.  Separately, the algorithm in m_defrag is
suboptimal (e.g. it makes a complete copy even when a packet needs no
changes).
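
To illustrate the two purposes as separate checks, here is a minimal
sketch.  It is not the m_defrag or ath_defrag code: struct seg is a
hypothetical stand-in for an mbuf, and chain_fits/chain_aligned are
made-up names.  It assumes the device wants every buffer to start on
an align-byte boundary.  A chain that passes both checks could be
handed to the hardware with no copy at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct mbuf; not the real layout. */
struct seg {
	struct seg *next;
	const unsigned char *data;
	size_t len;
};

/* Purpose 1: does the chain fit in at most maxsegs tx descriptors? */
static int
chain_fits(const struct seg *m, int maxsegs)
{
	int n = 0;

	for (; m != NULL; m = m->next)
		if (++n > maxsegs)
			return (0);	/* stop early, chain is too long */
	return (1);
}

/* Purpose 2: does every segment start on an align-byte boundary? */
static int
chain_aligned(const struct seg *m, size_t align)
{
	/* align must be a nonzero power of two for the mask test below. */
	assert(align != 0 && (align & (align - 1)) == 0);

	for (; m != NULL; m = m->next)
		if (((uintptr_t)m->data & (align - 1)) != 0)
			return (0);
	return (1);
}
```

A driver with only a descriptor limit would call just chain_fits; one
with only an alignment constraint would call just chain_aligned.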

ath_defrag is example code tailored to the ath driver that handles only
the case where the mbuf chain is too long.  I have other code that can
do packet alignment alone, or alignment plus mbuf coalescing, far
better than the current logic in m_defrag.

The right solution to this problem, as suggested by John Baldwin and
Scott Long, is to improve the bus_dma code so these things happen
automatically for the driver according to the dma tag config.  This
would eliminate the need for m_defrag in all cases I'm aware of.  Since
bus_dma has info like the max number of segments a device can accept
and any alignment constraints, it can do a much more efficient job.
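
As a rough sketch of what "according to the dma tag config" could
mean, the code below decides in one pass which fixups a chain needs,
given only the constraints a tag would carry.  Everything here is
hypothetical: tag_sketch is not the real (opaque) bus_dma tag,
buf_sketch is not struct mbuf, and plan_fixups is an invented name,
not an existing bus_dma interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the constraints a dma tag records. */
struct tag_sketch {
	int	nsegments;	/* max segments the device accepts */
	size_t	alignment;	/* required start alignment, power of two */
};

/* Hypothetical stand-in for an mbuf in a chain. */
struct buf_sketch {
	struct buf_sketch *next;
	const unsigned char *data;
	size_t len;
};

enum { FIX_NONE = 0, FIX_COALESCE = 1 << 0, FIX_REALIGN = 1 << 1 };

/* Walk the chain once and report only the work the tag requires. */
static unsigned
plan_fixups(const struct tag_sketch *t, const struct buf_sketch *m)
{
	unsigned plan = FIX_NONE;
	int nsegs = 0;

	/* The mask test assumes a nonzero power-of-two alignment. */
	assert(t->alignment != 0 &&
	    (t->alignment & (t->alignment - 1)) == 0);

	for (; m != NULL; m = m->next) {
		nsegs++;
		if (((uintptr_t)m->data & (t->alignment - 1)) != 0)
			plan |= FIX_REALIGN;
	}
	if (nsegs > t->nsegments)
		plan |= FIX_COALESCE;
	return (plan);
}
```

A result of FIX_NONE means the chain can be loaded untouched, which is
the common case a combined copy-always routine penalizes.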

	Sam


