From owner-freebsd-arch@FreeBSD.ORG  Wed Nov 29 21:51:19 2006
Date: Wed, 29 Nov 2006 13:47:47 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200611292147.kATLll4m048223@apollo.backplane.com>
To: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
References: <11392.1164835409@critter.freebsd.dk>
Cc: Ricardo Nabinger Sanchez <rnsanchez@wait4.org>, freebsd-arch@freebsd.org
Subject: Re: a proposed callout API 
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>


:
:In message <200611292115.kATLFlxd047970@apollo.backplane.com>, Matthew Dillon
:writes:
:
:>:Your input has been noted to the extent it is relevant.
:>
:>    Now now Poul, if you don't have anything nice to say.... try not to act
:>    like a stuck up pig.  Oops!  Did I say something bad?
:
:My qualification was only a reflection on the fact that you obviously
:had not read the first part of the thread and therefore did not seem to
:take into account the changes proposed initially.
:
:-- 
:Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
:

    The difference between you and me, Poul, is that you always try to play
    cute tricks with words when you intend to insult someone.  Me? I just
    go ahead and insult them explicitly.

    In any case, I think the relevance of my comments is clear to anyone who
    has followed the project.  Are you guys so stuck up on performance
    that you are willing to seriously pollute your APIs just to get rid of
    a few multiplications and divisions?

    I mean, come on... the callout code is about as close to optimal as it
    is possible to *GET*.  If performance is an issue, it isn't the callout
    algorithm that's the problem, it's all the pollution that has been added
    to it to make it cpu-agnostic.

    You don't have to agree with me, but I think the relevance of my remarks
    is pretty clear.  The FreeBSD source already has very serious mutex
    visibility pollution all throughout the codebase, and now you want to
    expose your already crazy multi-variable timer ABI to higher levels
    as well?  Hell, people are still reporting calcru warnings and panics
    and problems after years!  Maybe you should consider fixing those once
    and for all first.

    If you insist, I'll address your original points one at a time:

:1. We need better resolution than a periodic "hz" clock can give us.
:   Highspeed networking, gaming servers and other real-time apps want
:   this.
:
:2. We "pollute" our call-wheel with tons of callouts that we know are
:   unlikely to happen.

    The callout algorithm was designed to make this 'pollution' optimal.
    And it is optimal both from the point of view of the callwheel design and
    from the point of view of cache locality of reference.  The problem
    isn't the callwheel, it's the fact that all this additional mutex junk
    has been wrapped around the code to make it cpu-agnostic and MP-safe,
    requiring the callout code to dip into its mutex-protected portions
    multiple times to execute a single operation (e.g. a callout callback
    followed by callout_reset()).

    There are performance problems here, but it's with the wrappers around
    the callout code, not with the code itself.
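    For readers who have not looked at the callwheel, here is a minimal
    userspace model of a hashed timing wheel of the kind being defended:
    arming or re-arming a callout is an O(1) push onto the bucket indexed
    by its expiry tick, stopping it is an O(1) unlink, and each tick walks
    only one bucket.  All names (struct co, struct wheel, wheel_reset(),
    wheel_stop(), wheel_tick(), count_fire()) and WHEEL_SIZE are
    hypothetical.  This is a sketch, not FreeBSD's kern_timeout.c, and it
    deliberately has no locking, which is exactly the part in dispute.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SIZE 256                /* must be a power of two */
#define WHEEL_MASK (WHEEL_SIZE - 1)

struct co {                           /* hypothetical callout entry */
	struct co  *next, *prev;          /* links within one bucket */
	long        expire;               /* absolute tick at which to fire */
	void      (*fn)(void *);
	void       *arg;
	int         pending;
};

struct wheel {                        /* hypothetical callwheel */
	struct co  *bucket[WHEEL_SIZE];
	long        ticks;                /* current tick */
};

/* Unlink c from its bucket in O(1); no list scan. */
static void
wheel_stop(struct wheel *w, struct co *c)
{
	if (!c->pending)
		return;
	if (c->prev != NULL)
		c->prev->next = c->next;
	else
		w->bucket[c->expire & WHEEL_MASK] = c->next;
	if (c->next != NULL)
		c->next->prev = c->prev;
	c->next = c->prev = NULL;
	c->pending = 0;
}

/* (Re)arm c to fire 'delta' ticks from now: unlink, then O(1) push. */
static void
wheel_reset(struct wheel *w, struct co *c, long delta,
    void (*fn)(void *), void *arg)
{
	struct co **head;

	assert(delta > 0);
	wheel_stop(w, c);
	c->expire = w->ticks + delta;
	c->fn = fn;
	c->arg = arg;
	head = &w->bucket[c->expire & WHEEL_MASK];
	c->prev = NULL;
	c->next = *head;
	if (*head != NULL)
		(*head)->prev = c;
	*head = c;
	c->pending = 1;
}

/* Advance one tick and run every callout due in the current bucket. */
static void
wheel_tick(struct wheel *w)
{
	struct co *c, *next;

	w->ticks++;
	for (c = w->bucket[w->ticks & WHEEL_MASK]; c != NULL; c = next) {
		next = c->next;
		if (c->expire == w->ticks) {  /* entries for later laps stay put */
			wheel_stop(w, c);
			c->fn(c->arg);
		}
	}
}

/* Example callback: count how many times it fired. */
static void
count_fire(void *arg)
{
	(*(int *)arg)++;
}
```

    A callback may safely re-arm its own callout, because the entry is
    pushed at the bucket head, behind the walk.  A callback that stops or
    re-arms some other callout in the same bucket is not handled here;
    that is the kind of case a real implementation has to guard against.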

:3. We have many operations on the callout wheel because certain
:   callouts get rearmed for later in the future.  (TCP keepalives).
:
:4. We execute all callouts on one CPU only.

    Well, interesting... that's awful.  Maybe, say, a PER-CPU callout
    design would solve that little problem?  Sounds like it would kill
    two birds with one stone, especially if you are still deep-stacking
    your TCP protocol stacks from the interface interrupt.  

    If you are going to associate interrupts with cpu's, then all related
    protocol operations could also be associated with those same cpu's,
    in PARTICULAR the callout operations.  That would automatically give
    you a critical-section interlock and you wouldn't have to use mutexes
    to interlock the callout and the TCP stack.
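    A rough sketch of that idea, under stated assumptions: each connection
    is bound at setup time to the CPU its interrupts are steered to, and
    its timer state lives in that CPU's private structure, so the
    "interlock" is an ownership check rather than a mutex.  flow_cpu() is
    a hypothetical hash, not any NIC's real receive-side steering
    function, and struct cpu_timers, struct conn, conn_init(),
    conn_rearm() and conn_cancel() are illustrative names only.  The
    sketch tracks arming and does not implement the wheel itself.

```c
#include <assert.h>
#include <stdint.h>

#define MAXCPU 8

/* Hypothetical per-CPU timer bookkeeping; only its owning CPU writes it. */
struct cpu_timers {
	long ticks;                  /* this CPU's tick count */
	int  armed;                  /* timers currently armed on this CPU */
};

/* Hypothetical connection; 'cpu' is fixed when the connection is set up. */
struct conn {
	unsigned cpu;
	long     rexmt_expire;       /* absolute tick, 0 = not armed */
};

static struct cpu_timers pcpu[MAXCPU];

/*
 * Map a flow to a CPU.  Assumption: interrupts for this flow are steered
 * with the same function, so input processing, output, and timers for
 * the connection all run on the same CPU.
 */
static unsigned
flow_cpu(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport,
    unsigned ncpus)
{
	uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);

	h ^= h >> 16;                /* cheap integer mix */
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h % ncpus;
}

static void
conn_init(struct conn *c, uint32_t saddr, uint32_t daddr, uint16_t sport,
    uint16_t dport, unsigned ncpus)
{
	assert(ncpus > 0 && ncpus <= MAXCPU);
	c->cpu = flow_cpu(saddr, daddr, sport, dport, ncpus);
	c->rexmt_expire = 0;
}

/*
 * (Re)arm the retransmit timer.  No mutex: the caller must already be
 * running on c->cpu (in a kernel, inside a critical section there), so
 * nothing else touches pcpu[c->cpu] or *c concurrently.
 */
static void
conn_rearm(struct conn *c, unsigned curcpu, long delta)
{
	struct cpu_timers *t = &pcpu[c->cpu];

	assert(curcpu == c->cpu);    /* ownership stands in for a lock */
	assert(delta > 0);
	if (c->rexmt_expire == 0)
		t->armed++;
	c->rexmt_expire = t->ticks + delta;
}

/* Disarm the retransmit timer; same ownership rule as conn_rearm(). */
static void
conn_cancel(struct conn *c, unsigned curcpu)
{
	assert(curcpu == c->cpu);
	if (c->rexmt_expire != 0) {
		pcpu[c->cpu].armed--;
		c->rexmt_expire = 0;
	}
}
```

    The design choice is that the CPU assignment is a pure function of the
    flow, so every code path that can touch a connection's timer agrees on
    where it lives without consulting shared state.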

:5. Most of the specified timeouts are bogus, because of the imprecision
:   inherent in the current 1/hz method of scheduling them.

    If you are talking about TCP, this simply is not the case.  In a LAN
    environment, trying to apply timeouts of less than a few milliseconds
    to a TCP protocol stack is just asking for it.  Nobody gives a rat's
    ass about packet loss in sub-millisecond TCP connections because it is
    NOT POSSIBLE to have optimal throughput EVEN IF you use fine-grained
    timers in any such environment where packet loss occurs.  A LAN
    environment that loses packets in such a situation is broken and needs
    to be fixed.  In WAN environments, where transit times are greater than
    a few milliseconds, having a fairly coarse-grained timeout for the
    TCP protocol stack is just not an issue.  It really isn't.
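    To put numbers on the granularity question, here is a small sketch
    that rounds a timeout up to whole ticks of a periodic hz clock and
    reports how late it fires.  It ignores the phase of the current tick,
    which adds up to one further tick period of uncertainty in a real
    periodic-tick system.  ms_to_ticks() and overshoot_us() are
    hypothetical helpers, not FreeBSD routines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Quantize a timeout in milliseconds to ticks of a periodic hz clock,
 * rounding up so the timer never fires early: ceil(ms * hz / 1000).
 */
static int64_t
ms_to_ticks(int64_t ms, int hz)
{
	assert(hz > 0 && ms >= 0);
	return (ms * hz + 999) / 1000;
}

/* How much later than requested (in microseconds) the timer fires. */
static int64_t
overshoot_us(int64_t ms, int hz)
{
	int64_t ticks = ms_to_ticks(ms, hz);

	return ticks * 1000000 / hz - ms * 1000;
}
```

    At hz = 100 a 1 ms request becomes a 10 ms timer (9 ms late), while a
    205 ms WAN-scale timeout fires 5 ms late, under 3%.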

    I'm wondering whether you are trying to fix issues in bogus contrived
    protocol tests or whether you are trying to fix issues in the real world
    here. 

    There's a reason why GigE has hardware flow control.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>