Date: Wed, 30 May 2007 06:11:21 +1000 (EST)
From: Bruce Evans
To: Jeff Roberson
Cc: arch@FreeBSD.org
Subject: Re: rusage breakdown and cpu limits.
In-Reply-To: <20070529105856.L661@10.0.0.1>
Message-ID: <20070530044158.H93160@delplex.bde.org>
References: <20070529105856.L661@10.0.0.1>

On Tue, 29 May 2007, Jeff Roberson wrote:

> I'm working with Attilio to break down rusage further to be per-thread
> in places where it is protected by the global scheduler lock.  To
> support this, I am interested in moving the rlimit cpulimit check into
> userret(), or perhaps ast().  Is there any reason why we need to check
> this on every context switch?  Any objections to moving it?  Eventually
> it will require a different lock from the one we obtain to call
> mi_switch().

Checking it on every userret() would be a good pessimization, and
scheduling ast()'s to check it occasionally wouldn't be far behind.
Syscalls occur many times more often than context switches.

Locking is needed for updating the runtime in mi_switch(), so it seems
best to check for the runtime becoming too large there while holding the
same lock.  This lock is currently sched_lock, and it covers the
following other fields:

Fields in the proc struct (or attached to it, not counting the thread
struct):
	p_stats->p_ru.ru_nvcsw
	p_stats->p_ru.ru_nivcsw
	p_stats->p_ru.rux_runtime (this)
	p_stats->p_ru.rux_uticks
	p_stats->p_ru.rux_iticks
	p_stats->p_ru.rux_sticks
	p_cpulimit (less volatile than the others)
	p_sflag (needs sched_lock more than the others?)
	p_flag (KSE only, invalid? -- needs PROC lock)

Related fields in the pcpu:
	switchtime
	switchticks

Related fields in the thread struct:
	td_uticks
	td_sticks
	td_iticks
	td_flags

Unrelated fields in the thread struct:
	td_critnest (in a KASSERT() only)
	td_owepreempt (in a KASSERT() only)
	td_generation
	td_sched (not yet properly locked)
	td_priority (this and following only in CTR*())
	td_inhibitors
	td_lockname
	td_wmesg

I don't see why you emphasized the cpulimit check, since it is the least
delicate part of the rusage maintenance done by mi_switch().  I don't see
how the locking requirements can be reduced significantly by keeping the
rusage in the thread struct more of the time (than the tick counts
already are) and only accumulating it into the proc struct when needed.
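For concreteness, here is a minimal sketch of that per-thread
accumulation pattern.  The field names follow the lists above; the helper
itself is hypothetical (it is not code from the tree), and it assumes the
caller holds whatever lock protects the proc-level counters (sched_lock
today).

	#include <sys/param.h>
	#include <sys/proc.h>
	#include <sys/resourcevar.h>

	/*
	 * Hypothetical sketch: drain one thread's tick counters into the
	 * per-process totals.  The caller is assumed to hold the lock that
	 * protects the proc-level counters (currently sched_lock).
	 */
	static void
	thread_drain_ticks(struct thread *td)
	{
		struct proc *p = td->td_proc;

		p->p_stats->p_ru.rux_uticks += td->td_uticks;
		p->p_stats->p_ru.rux_sticks += td->td_sticks;
		p->p_stats->p_ru.rux_iticks += td->td_iticks;
		td->td_uticks = 0;
		td->td_sticks = 0;
		td->td_iticks = 0;
	}

Wherever this drain happens, it still needs the same lock as the rest of
the accounting, which is the point of the preceding paragraph.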
There must be a context switch of `switchtime' here, and this seems to
require sched locking.  It doesn't help to accumulate the runtime into
the thread struct instead of into the rusage -- sched locking still seems
to be required to switch the runtime.  The core of this switch is:

	new_switchtime = cpu_ticks();
	delta_runtime = new_switchtime - PCPU_GET(switchtime);
	PCPU_SET(switchtime, new_switchtime);

and this must be done without letting any other thread run on the CPU.
This seems to require sched_lock.  After determining delta_runtime, we
can add it either to thread data for accumulation later or to more global
data (currently rusage).  This part doesn't need such strong locking.  We
currently add it to rusage since we already hold the lock for this, and
then checking the limit here also has low cost.

Note that in the above code, keeping `switchtime' in the pcpu is an
optimization (for space and perhaps for clarity).  It could be kept in
the thread that owns the CPU.  Then its locking requirements might be
clearer -- it must be switched at the same time as td_oncpu.  The latter
clearly requires sched_lock.  The code for the latter has moved from
mi_switch() to sched_switch().  Maybe you emphasized the cpulimit check
because this point wasn't clear -- things that strictly required
sched_lock were supposed to have all been moved to sched_switch()?

p_sflag and td_flags are only needed if we detect that the cpulimit has
been exceeded (to schedule an AST).  Moving the limit check to an AST
would still require sched locking for setting td_flags.  The cpulimit has
a low resolution, so it doesn't need to be checked very often.  The 4BSD
scheduler could easily check it every second or so in the main loop of
the scheduler (see the sketch at the end of this message).

Accesses in the CTR*()s and KASSERT()s complicate the analysis and the
removal of locks -- you don't want to have to acquire a lock just to
debug, and the debugging code also gets in the way of understanding the
locking, not to mention the function.

Bruce
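A rough sketch of the kind of low-frequency limit check mentioned above,
e.g. run every second or so from a scheduler loop rather than on every
mi_switch().  Field names follow the lists earlier in the message;
runtime_to_seconds() is a placeholder for whatever converts rux_runtime
(cpu ticks) to seconds, and the flag/AST handling is illustrative rather
than the actual kernel code.

	#include <sys/param.h>
	#include <sys/lock.h>
	#include <sys/mutex.h>
	#include <sys/proc.h>
	#include <sys/resource.h>
	#include <sys/resourcevar.h>

	/* Placeholder: convert rux_runtime (cpu ticks) to whole seconds. */
	static rlim_t	runtime_to_seconds(uint64_t runtime);

	/*
	 * Hypothetical sketch: check the CPU limit at low frequency and, if
	 * it has been exceeded, ask the running thread to handle it in an
	 * AST.  sched_lock is held only around the counter read and the
	 * flag updates.
	 */
	static void
	check_cpulimit(struct proc *p, struct thread *td)
	{

		mtx_lock_spin(&sched_lock);
		if (p->p_cpulimit != RLIM_INFINITY &&
		    runtime_to_seconds(p->p_stats->p_ru.rux_runtime) >=
		    p->p_cpulimit) {
			p->p_sflag |= PS_XCPU;		/* record the overrun */
			td->td_flags |= TDF_ASTPENDING;	/* request an AST */
		}
		mtx_unlock_spin(&sched_lock);
	}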