Date:      Thu, 5 Mar 2009 00:09:47 +0100
From:      Hans Petter Selasky <hselasky@c2i.net>
To:        Steve Calfee <stevecalfee@gmail.com>
Cc:        freebsd-usb@freebsd.org
Subject:   Re: Low perfomance when read from usb flash drive
Message-ID:  <200903050009.48188.hselasky@c2i.net>
In-Reply-To: <4a5ff6bc0903041437l52a58387v1735e34ebc383847@mail.gmail.com>
References:  <200903010045.44904.man@email.com.ua> <200903042311.00403.hselasky@c2i.net> <4a5ff6bc0903041437l52a58387v1735e34ebc383847@mail.gmail.com>

On Wednesday 04 March 2009, Steve Calfee wrote:
> On Wed, Mar 4, 2009 at 2:10 PM, Hans Petter Selasky <hselasky@c2i.net> wrote:
> > On Wednesday 04 March 2009, Artyom Mirgorodsky wrote:
> >> I forgot to write, that a similar problem was observed in FreeBSD 7 with
> >> usb4bsd patches.
> >
> > Here is a patch which I think will address your problem. It is EHCI
> > hardware related. Different models behave differently. Try this:
> >
> > http://perforce.freebsd.org/chv.cgi?CH=158692
>
> Wow, that is bizarre. The "doorbell" is usually used so the software
> knows that the cpu vs dma race is complete when moving qtds off the qh
> - ie dequeueing requests. What you have done is asked the guy doing
> the photo finish at the end of a horse race to press the shutter a
> little before the horse crosses the line - so the shutter latency will
> allow a picture right as the horse crosses the line. Unfortunately,
> this (like all races) is affected by the object doing the racing -
> horse speed and camera speed are variable!
>
> I believe something else must be wrong.

The new USB stack _is_ doing things faster than the old one. I went
through a large range of tests before I landed on the doorbell trick.
Actually, the new USB stack doesn't use the doorbell; we use software timers
instead. The hardware certainly behaves differently from vendor to vendor.
Probably someone making the chips will have to explain what is wrong.

In my (Nvidia+AMD64+EHCI) test case I tried:

No doorbell: 12 Mbyte/sec
Doorbell after transfer (after QH removal): 14 Mbyte/sec
Doorbell after next transfer (after QH insertion): 21 Mbyte/sec

On my other reference Intel chip I did not get any noticeable performance
increase by doing this trick ...

I think this trick is the best I can do. I've seen peculiarities before 
regarding the doorbell not behaving like it should ....

--HPS


