From owner-freebsd-arch@FreeBSD.ORG Wed Nov 29 21:51:19 2006
Date: Wed, 29 Nov 2006 13:47:47 -0800 (PST)
From: Matthew Dillon
Message-Id: <200611292147.kATLll4m048223@apollo.backplane.com>
To: "Poul-Henning Kamp"
References: <11392.1164835409@critter.freebsd.dk>
Cc: Ricardo Nabinger Sanchez, freebsd-arch@freebsd.org
Subject: Re: a proposed callout API
List-Id: Discussion related to FreeBSD architecture

:
:In message <200611292115.kATLFlxd047970@apollo.backplane.com>, Matthew Dillon writes:
:
:>:Your input has been noted to the extent it is relevant.
:>
:> Now now Poul, if you don't have anything nice to say.... try not to act
:> like a stuck up pig. Oops! Did I say something bad?
:
:My qualification was only a reflection on the fact that you obviously
:had not read the first part of the thread and therefore did not seem to
:take into account the changes proposed initially.
:
:--
:Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
:

The difference between you and me, Poul, is that you always try to play
cute tricks with words when you intend to insult someone. Me? I just go
ahead and insult them explicitly.

In any case, I think the relevance of my comments is clear to anyone who
has followed the project. Are you guys so stuck up on performance that
you are willing to seriously pollute your APIs just to get rid of a few
multiplications and divisions? I mean, come on... the callout code is
about as close to optimal as it is possible to *GET*. If performance is
an issue, it isn't the callout algorithm that's the problem, it's all the
pollution that has been added to it to make it cpu-agnostic.

You don't have to agree with me, but I think the relevance of my remarks
is pretty clear. The FreeBSD source already has very serious mutex
visibility pollution all throughout the codebase, and now you want to
expose your already crazy multi-variable timer ABI to higher levels as
well? Hell, people are still reporting calcru warnings and panics and
problems after years! Maybe you should consider fixing those once and
for all first.

If you insist, I'll address your original points one at a time:

:1. We need better resolution than a periodic "hz" clock can give us.
:   Highspeed networking, gaming servers and other real-time apps want
:   this.
:
:2. We "pollute" our call-wheel with tons of callouts that we know are
:   unlikely to happen.

The callout algorithm was designed to make this 'pollution' optimal.
And it is optimal both from the point of view of the callwheel design
and from the point of view of cache locality of reference. The problem
isn't the callwheel, it's the fact that all this additional mutex junk
has been wrapped around the code to make it cpu-agnostic and MP-safe,
requiring the callout code to dip into its mutex-protected portions
multiple times to execute a single operation (e.g. the callout callback,
then callout_reset()).
There are performance problems here, but they are with the wrappers
around the callout code, not with the code itself.

:3. We have many operations on the callout wheel because certain
:   callouts get rearmed for later in the future. (TCP keepalives).
:
:4. We execute all callouts on one CPU only.

Well, interesting... that's awful. Maybe, say, a PER-CPU callout design
would solve that little problem? Sounds like it would kill two birds
with one stone, especially if you are still deep-stacking your TCP
protocol stacks from the interface interrupt.

If you are going to associate interrupts with cpu's, then all related
protocol operations could also be associated with those same cpu's, in
PARTICULAR the callout operations. That would automatically give you a
critical-section interlock and you wouldn't have to use mutexes to
interlock the callout and the TCP stack.

:5. Most of the specified timeouts are bogus, because of the imprecision
:   inherent in the current 1/hz method of scheduling them.

If you are talking about TCP, this simply is not the case. In a LAN
environment, trying to apply timeouts of less than a few milliseconds to
a TCP protocol stack is just asking for it. Nobody gives a rat's ass
about packet loss in sub-millisecond TCP connections because it is NOT
POSSIBLE to have optimal throughput EVEN IF you use fine-grained timers
in any such environment where packet loss occurs. A LAN environment
that loses packets in such a situation is broken and needs to be fixed.

In WAN environments, where transit times are greater than a few
milliseconds, having a fairly coarse-grained timeout for the TCP protocol
stack is just not an issue. It really isn't.

I'm wondering whether you are trying to fix issues in bogus contrived
protocol tests or whether you are trying to fix issues in the real world
here. There's a reason why GigE has hardware flow control.

-Matt
Matthew Dillon
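
For readers unfamiliar with the callwheel argued about above, here is a
minimal sketch of a hashed timing wheel in C. It is not the FreeBSD
kern_timeout.c code: the names (struct callwheel, wheel_reset,
wheel_stop, wheel_tick, count_cb) and the 256-slot size are assumptions
made for illustration. It only shows why arming and cancelling a callout
can be O(1), so that a wheel full of timeouts that never fire costs
little.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SIZE 256               /* hypothetical size; power of two */
#define WHEEL_MASK (WHEEL_SIZE - 1)

struct callout {                     /* must be zero-initialized before use */
    struct callout *next, **prevp;   /* back-pointer gives O(1) unlink */
    unsigned long expire;            /* absolute tick at which to fire */
    void (*func)(void *);
    void *arg;
};

struct callwheel {                   /* one instance could exist per CPU */
    struct callout *bucket[WHEEL_SIZE];
    unsigned long now;               /* current tick */
};

/* Cancel a callout: O(1), a no-op if it is not armed. */
static void wheel_stop(struct callout *c)
{
    if (c->prevp != NULL) {
        *c->prevp = c->next;
        if (c->next != NULL)
            c->next->prevp = c->prevp;
        c->next = NULL;
        c->prevp = NULL;
    }
}

/* Arm (or re-arm) a callout to fire `ticks` ticks from now: O(1). */
static void wheel_reset(struct callwheel *w, struct callout *c,
                        unsigned long ticks, void (*func)(void *), void *arg)
{
    struct callout **head;

    assert(ticks > 0);
    wheel_stop(c);
    c->expire = w->now + ticks;
    c->func = func;
    c->arg = arg;
    head = &w->bucket[c->expire & WHEEL_MASK];
    c->next = *head;
    if (c->next != NULL)
        c->next->prevp = &c->next;
    c->prevp = head;
    *head = c;
}

/*
 * Advance one tick and run whatever is due in the current bucket.
 * Entries hashed to this bucket for a later lap are skipped.
 * Limitation of this sketch: a callback may re-arm its own callout but
 * must not stop other callouts in the same bucket.
 */
static int wheel_tick(struct callwheel *w)
{
    struct callout *c, *next;
    int fired = 0;

    w->now++;
    for (c = w->bucket[w->now & WHEEL_MASK]; c != NULL; c = next) {
        next = c->next;
        if (c->expire == w->now) {
            wheel_stop(c);
            c->func(c->arg);
            fired++;
        }
    }
    return fired;
}

/* Example callback: bump the counter passed as the argument. */
static void count_cb(void *arg)
{
    ++*(int *)arg;
}
```

A caller zero-initializes a struct callwheel, arms callouts with
wheel_reset(), and calls wheel_tick() once per clock tick. The sketch has
no locking at all; under the per-CPU design suggested above, each CPU
would own its own wheel, which is what would make that safe.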