From owner-freebsd-arch@FreeBSD.ORG Tue May 29 21:50:17 2007
Date: Tue, 29 May 2007 14:50:05 -0700 (PDT)
From: Jeff Roberson <jroberson@chesapeake.net>
To: John Baldwin
Cc: freebsd-arch@freebsd.org
Subject: Re: rusage breakdown and cpu limits.
Message-ID: <20070529144657.T661@10.0.0.1>
In-Reply-To: <200705291737.25355.jhb@freebsd.org>
References: <20070529105856.L661@10.0.0.1> <20070530065423.H93410@delplex.bde.org> <20070529141342.D661@10.0.0.1> <200705291737.25355.jhb@freebsd.org>

On Tue, 29 May 2007, John Baldwin wrote:

> On Tuesday 29 May 2007 05:18:32 pm Jeff Roberson wrote:
>> On Wed, 30 May 2007, Bruce Evans wrote:
>>> I see how rusage accumulation can help for everything _except_ the
>>> runtime and tick counts (i.e., for stuff updated by statclock()).  For
>>> the runtime and tick counts, the possible savings seem to be small and
>>> negative.  calcru() would have to run the accumulation code, and the
>>> accumulation code would have to acquire something like sched_lock to
>>> transfer the per-thread data (since the lock for updating that data
>>> is something like sched_lock).  This has the same locking overheads
>>> as, and larger non-locking overheads than, accumulating the runtime
>>> directly into the rusage at context switch time -- calcru() needs to
>>> acquire something like sched_lock either way.
>>
>> Yes, it will make calcru() more expensive.  However, calcru() runs
>> infrequently relative to context switches.  It's only needed for
>> getrusage(), fill_kinfo_proc(), and certain clock_gettime() calls.
>>
>> The thing that will protect mi_switch() is not process-global.  I want
>> to keep process-global locks out of mi_switch(), or we reduce
>> concurrency for multi-threaded applications.
>
> I still think it would be wise to try the simple approach first and only
> engage in further complexity if it is warranted.

By decreasing the scope of the sched lock in other ways, I have already
indirectly shown that the simple approach will not yield sufficient
results: it would gate context switches in the same way a global
scheduler lock would, just not for as long a period.

Moving the stats to be per-thread really is not very complicated, and it
very likely optimizes the common case even in the absence of increased
concurrency.  We require fewer indirections for all stats increments this
way and also touch fewer cache lines in mi_switch().
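As a rough illustration of the pattern (a minimal userland sketch, not
the actual kernel code -- the names, the C11 atomics, and the cache-line
alignment below are illustrative assumptions; in the kernel each
thread's fields would be protected by its own thread lock):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NTHREADS        4

/*
 * Align each thread's counters to their own cache line so the hot-path
 * writer never shares a line with another thread -- the "fewer cache
 * lines in mi_switch()" point above.
 */
struct thread_stats {
        _Alignas(64) _Atomic uint64_t runtime_ticks;
        _Atomic uint64_t ctx_switches;
};

static struct thread_stats ts[NTHREADS];

/*
 * Hot path, analogous to charging runtime at context switch time in
 * mi_switch(): the owning thread increments only its own counters and
 * takes no process-global lock.
 */
static void
charge_runtime(int tid, uint64_t ticks)
{
        atomic_fetch_add_explicit(&ts[tid].runtime_ticks, ticks,
            memory_order_relaxed);
        atomic_fetch_add_explicit(&ts[tid].ctx_switches, 1,
            memory_order_relaxed);
}

/*
 * Cold path, analogous to calcru()/getrusage(): walk the per-thread
 * copies and sum them.  The aggregation cost lands here, where calls
 * are infrequent relative to context switches.  Relaxed loads may see
 * a slightly stale total, which is acceptable for accounting.
 */
static uint64_t
aggregate_runtime(void)
{
        uint64_t total = 0;

        for (int i = 0; i < NTHREADS; i++)
                total += atomic_load_explicit(&ts[i].runtime_ticks,
                    memory_order_relaxed);
        return (total);
}

int
main(void)
{
        /* Pretend each "thread" got charged once at switch time. */
        for (int tid = 0; tid < NTHREADS; tid++)
                charge_runtime(tid, 100 + tid);
        printf("total runtime ticks: %llu\n",
            (unsigned long long)aggregate_runtime());
        return (0);
}

The trade is deliberate: getrusage() pays an O(threads) walk, but every
stats increment on the hot path stays per-thread and lock-free.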
Jeff

>
> --
> John Baldwin
>