Date: Wed, 29 Nov 2006 13:47:47 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
Cc: Ricardo Nabinger Sanchez <rnsanchez@wait4.org>, freebsd-arch@freebsd.org
Subject: Re: a proposed callout API
Message-ID: <200611292147.kATLll4m048223@apollo.backplane.com>
References: <11392.1164835409@critter.freebsd.dk>
:
:In message <200611292115.kATLFlxd047970@apollo.backplane.com>, Matthew Dillon writes:
:
:>:Your input has been noted to the extent it is relevant.
:>
:> Now now Poul, if you don't have anything nice to say.... try not to act
:> like a stuck up pig. Oops! Did I say something bad?
:
:My qualification was only a reflection on the fact that you obviously
:had not read the first part of the thread and therefore did not seem to
:take into account the changes proposed initially.
:
:--
:Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
:
The difference between you and me, Poul, is that you always try to play
cute tricks with words when you intend to insult someone. Me? I just
go ahead and insult them explicitly.
In any case, I think the relevance of my comments is clear to anyone who
has followed the project. Are you guys so hung up on performance
that you are willing to seriously pollute your APIs just to get rid of
a few multiplications and divisions?
I mean, come on... the callout code is about as close to optimal as it
is possible to *GET*. If performance is an issue, it isn't the callout
algorithm that's the problem, it's all the pollution that has been added
to it to make it cpu-agnostic.
You don't have to agree with me, but I think the relevance of my remarks
is pretty clear. The FreeBSD source already has very serious mutex
visibility pollution all throughout the codebase, and now you want to
expose your already crazy multi-variable timer ABI to higher levels
as well? Hell, people are still reporting calcru warnings and panics
and problems after years! Maybe you should consider fixing those once
and for all first.
If you insist, I'll address your original points one at a time:
:1. We need better resolution than a periodic "hz" clock can give us.
: Highspeed networking, gaming servers and other real-time apps want
: this.
:
:2. We "pollute" our call-wheel with tons of callouts that we know are
: unlikely to happen.
The callout algorithm was designed to make this 'pollution' optimal.
And it is optimal both from the point of view of the callwheel design and
from the point of view of cache locality of reference. The problem
isn't the callwheel, it's the fact that all this additional mutex junk
has been wrapped around the code to make it cpu-agnostic and MP-safe,
requiring the callout code to dip into its mutex-protected portions
multiple times to execute a single operation (i.e. callout callback, then
callout_reset()).
There are performance problems here, but it's with the wrappers around
the callout code, not with the code itself.
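To make the point concrete, here is a rough sketch of the hashed-wheel
idea. This is NOT the actual kern_timeout.c code; my_callwheel,
my_callout_reset() and the other my_* names are made up for illustration,
and it assumes a BSD-style <sys/queue.h>. Arming a callout is a single
O(1) bucket insert, and the per-tick scan only touches the one bucket
that the current tick hashes to:

    /*
     * Illustrative sketch only, not kern_timeout.c.  All my_* names
     * are invented.  Requires a BSD <sys/queue.h> (TAILQ_FOREACH_SAFE).
     */
    #include <sys/queue.h>

    #define MY_WHEEL_SIZE   256             /* must be a power of two */
    #define MY_WHEEL_MASK   (MY_WHEEL_SIZE - 1)

    struct my_callout {
            TAILQ_ENTRY(my_callout) link;
            int     expire_tick;            /* absolute tick of expiry */
            void    (*func)(void *);
            void    *arg;
    };

    static TAILQ_HEAD(my_bucket, my_callout) my_callwheel[MY_WHEEL_SIZE];
    static int my_ticks;                    /* current tick count */

    static void
    my_callwheel_init(void)
    {
            int i;

            for (i = 0; i < MY_WHEEL_SIZE; i++)
                    TAILQ_INIT(&my_callwheel[i]);
    }

    /* Arm a callout: O(1), one bucket touched, good cache locality. */
    static void
    my_callout_reset(struct my_callout *c, int delta_ticks,
        void (*func)(void *), void *arg)
    {
            c->expire_tick = my_ticks + delta_ticks;
            c->func = func;
            c->arg = arg;
            TAILQ_INSERT_TAIL(&my_callwheel[c->expire_tick & MY_WHEEL_MASK],
                c, link);
    }

    /* Run once per tick: only this tick's bucket is scanned. */
    static void
    my_softclock(void)
    {
            struct my_bucket *b;
            struct my_callout *c, *next;

            my_ticks++;
            b = &my_callwheel[my_ticks & MY_WHEEL_MASK];
            TAILQ_FOREACH_SAFE(c, b, link, next) {
                    if (c->expire_tick == my_ticks) {
                            TAILQ_REMOVE(b, c, link);
                            c->func(c->arg);
                    }
            }
    }

Callouts that hash to the same bucket but expire on a later wheel
revolution simply stay put, which is exactly why "tons of callouts that
are unlikely to happen" cost essentially nothing.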
:3. We have many operations on the callout wheel because certain
: callouts get rearmed for later in the future (TCP keepalives).
:
:4. We execute all callouts on one CPU only.
Well, interesting... that's awful. Maybe, say, a PER-CPU callout
design would solve that little problem? Sounds like it would kill
two birds with one stone, especially if you are still deep-stacking
your TCP protocol stacks from the interface interrupt.
If you are going to associate interrupts with cpu's, then all related
protocol operations could also be associated with those same cpu's,
in PARTICULAR the callout operations. That would automatically give
you a critical-section interlock and you wouldn't have to use mutexes
to interlock the callout and the TCP stack.
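As a sketch only, building on the wheel above: my_pcpu_wheel,
my_curcpu() and the critical-section stubs below are invented names,
not an existing API. The stubs stand in for whatever local
critical-section primitive the kernel provides (critical_enter() on
FreeBSD, crit_enter() on DragonFly). Because each cpu owns its wheel
and only ever arms callouts for work it owns, no mutex is needed:

    /* Provided elsewhere; declared here so the sketch stands alone. */
    int     my_curcpu(void);            /* index of the executing cpu */
    void    my_crit_enter(void);        /* e.g. critical_enter() on FreeBSD */
    void    my_crit_exit(void);

    #define MY_MAXCPU       32          /* arbitrary for illustration */

    struct my_pcpu_wheel {
            struct my_bucket bucket[MY_WHEEL_SIZE];
            int     ticks;
    };

    static struct my_pcpu_wheel my_wheel[MY_MAXCPU];

    /* Arm a callout on the wheel owned by the current cpu. */
    static void
    my_callout_reset_oncpu(struct my_callout *c, int delta_ticks,
        void (*func)(void *), void *arg)
    {
            struct my_pcpu_wheel *w = &my_wheel[my_curcpu()];

            my_crit_enter();            /* block local preemption only */
            c->expire_tick = w->ticks + delta_ticks;
            c->func = func;
            c->arg = arg;
            TAILQ_INSERT_TAIL(&w->bucket[c->expire_tick & MY_WHEEL_MASK],
                c, link);
            my_crit_exit();
    }

If the interrupt, the protocol thread, and the callout all run on the
same cpu, the critical section is the only interlock you need.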
:5. Most of the specified timeouts are bogus, because of the imprecision
: inherent in the current 1/hz method of scheduling them.
If you are talking about TCP, this simply is not the case. In a LAN
environment trying to apply timeouts of less than a few milliseconds
to a TCP protocol stack is just asking for it. Nobody gives a rat's
ass about packet loss in sub-millisecond TCP connections because it is
NOT POSSIBLE to have optimal throughput EVEN IF you use fine-grained
timers in any such environment where packet loss occurs. A LAN
environment that loses packets in such a situation is broken and needs
to be fixed. In WAN environments, where transit times are greater than
a few milliseconds, having a fairly coarse-grained timeout for the
TCP protocol stack is just not an issue. It really isn't.
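If you want numbers, here is a trivial back-of-the-envelope
illustration. The RTT and RTO figures are assumptions picked only to
show the scale, not measurements from any particular stack:

    /*
     * Back-of-the-envelope only: a 1/hz wheel adds at most one tick
     * of slop to a retransmit timeout.  Against a WAN-scale RTO that
     * slop is noise; against a sub-millisecond LAN RTT it dominates,
     * but there you should not be losing packets in the first place.
     */
    #include <stdio.h>

    int
    main(void)
    {
            int hz = 100;                   /* old-style 10 ms tick */
            double slop_ms = 1000.0 / hz;   /* worst-case quantization */
            double wan_rto_ms = 250.0;      /* assumed WAN retransmit timeout */
            double lan_rtt_ms = 0.2;        /* assumed LAN round trip */

            printf("tick slop %.1f ms = %.1f%% of a %.0f ms WAN RTO\n",
                slop_ms, 100.0 * slop_ms / wan_rto_ms, wan_rto_ms);
            printf("tick slop %.1f ms = %.0fx a %.1f ms LAN RTT\n",
                slop_ms, slop_ms / lan_rtt_ms, lan_rtt_ms);
            return (0);
    }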
I'm wondering whether you are trying to fix issues in bogus contrived
protocol tests or whether you are trying to fix issues in the real world
here.
There's a reason why GigE has hardware flow control.
-Matt
Matthew Dillon
<dillon@backplane.com>
