From: Bruce Evans
Date: Wed, 14 Jun 2006 16:22:46 +1000 (EST)
To: kmacy@fsmware.com
Cc: Scott Long, kmacy@freebsd.org, Paul Saab, Robert Watson, David Xu, Kris Kennaway, freebsd-performance@freebsd.org, danial_thom@yahoo.com
Subject: Re: Initial 6.1 questions

On Tue, 13 Jun 2006, Kip Macy wrote:

> ...
> Why do I say "non-interrupt blocking?".
> Currently we have roughly a
> half dozen locking primitives. The two that I am familiar with are
> blocking and spinning mutexes. The general policy is to use blocking
> locks except where a lock is used in interrupts or the scheduler. It
> seems to me that in the scheduler interrupts only actually need to be
> blocked across cpu_switch. Spin locks obviously have to be used
> because a thread cannot very well context switch while its in the
> middle of context switching - however, provided td_critnest > 0, there
> is no reason that interrupts need to be blocked. Currently sched_lock
> is acquired in cpu_hardclock and statclock - so it does need to block
> interrupts. There is no reason that these two functions couldn't be
> run in ast().

These functions are called from "fast" interrupt handlers, so they cannot use sleep locks. They also cannot be run in ast(), since ast() is only run on return to user mode and uses sleep locks a lot. Gathering of some user-mode statistics could be deferred until return to user mode, but this wouldn't work for kernel-mode statistics (which would never be gathered for threads that never leave the kernel), and large changes would be required even for the user-mode statistics:
- algorithmic changes: various, mainly to keep the kernel-mode statistics separate.
- locking: ast() uses sched_lock, so without large changes you would just move the problem (there would be up to hz + stathz extra calls to ast() per second). The statistics fields are all locked by sched_lock, and although this would not be needed for access in ast(), some locking would still be needed for the many fields that are accessed from elsewhere.

What they (and all fast interrupt handlers, or even "fast" interrupt handlers) can do better is use spin locks != sched_lock (and, for fast interrupt handlers, != mtx_lock_spin(any)).
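As a rough illustration of that alternative, here is a userspace C11 sketch of a handler whose data is protected by its own small spin lock rather than sched_lock, with every access, inside or outside the handler, going through that one lock. Everything in it is hypothetical: fast_lock, cpu_intr_disable() and rx_fast_intr() are not FreeBSD APIs, and per-CPU interrupt masking is only simulated by a thread-local counter, since a userspace program cannot mask interrupts.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Simulated per-CPU interrupt masking (hypothetical): each thread stands
 * in for a CPU, and this depth counter stands in for its interrupt-enable
 * state. Nothing is actually masked in userspace.
 */
static _Thread_local int intr_disable_depth;

static void cpu_intr_disable(void) { intr_disable_depth++; }

static void cpu_intr_enable(void)
{
    assert(intr_disable_depth > 0);     /* catch unbalanced enables */
    intr_disable_depth--;
}

/* A lock private to one driver's fast-handler data (not sched_lock). */
struct fast_lock {
    atomic_flag held;
};

static void fast_lock_acquire(struct fast_lock *l)
{
    /* Mask local "interrupts" first, so a handler cannot interrupt a
     * holder on the same CPU and then spin on the lock forever. */
    cpu_intr_disable();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* holders are short by design, so just spin */
}

static void fast_lock_release(struct fast_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    cpu_intr_enable();
}

/* Data shared by the fast handler and ordinary driver code. */
struct ring {
    unsigned head, tail;                /* free-running indices */
    unsigned char buf[256];
};

static struct fast_lock ring_lock = { ATOMIC_FLAG_INIT };
static struct ring rx;

/* "Fast interrupt handler": touches only rx, and only under ring_lock. */
static void rx_fast_intr(unsigned char c)
{
    fast_lock_acquire(&ring_lock);
    if (rx.head - rx.tail < sizeof(rx.buf))     /* drop the byte if full */
        rx.buf[rx.head++ % sizeof(rx.buf)] = c;
    fast_lock_release(&ring_lock);
}

/* Non-handler code uses the same lock for every access to rx.
 * Returns the next byte, or -1 if the ring is empty. */
static int rx_read(void)
{
    int c = -1;

    fast_lock_acquire(&ring_lock);
    if (rx.tail != rx.head)
        c = rx.buf[rx.tail++ % sizeof(rx.buf)];
    fast_lock_release(&ring_lock);
    return c;
}
```

The lock only has to cover the handler's own small data set, which is what keeps it cheap; the hard part, as described next, is that all non-handler accesses to that data must use it too.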
This is not easy to do in general, and is especially difficult for clock interrupt handlers, because all accesses to data accessed by a fast interrupt handler must be locked by a common lock (especially accesses outside of the handlers), and clock interrupt handlers access a lot of data. Currently, clock interrupt handlers use sched_lock and depend on sched_lock being used so much that most of the data they access is locked automatically. Even then, there are large gaps in the locking. E.g., hardclock() starts by calling tc_ticktock(), which mostly uses very delicate time-domain locking but sometimes races with syscalls that use sleep locking, most frequently by calling ntp_update_second(). Most of kern_ntptime.c is documented (in comments) as being required to run at splclock() or higher, but it is actually all locked only by Giant, so sched_lock'ing and other spinlocking for it is neither necessary nor sufficient, and calling it correctly from a "fast" interrupt handler is impossible.

In my kernel, fast interrupt handlers (and associated non-handler code that shares data) are actually fast (== low-latency && !(very-large-footprint || takes-very-long)). This requires:
- mtx_lock_spin() to not mask interrupts, since masking interrupts gives !low-latency, at least in the UP case.
- fast interrupt handlers to not use sched_lock, since sched_lock gives very-large-footprint.
- fast interrupt handlers to not use only mtx_lock_spin(), since that no longer masks them. My implementation actually uses simple_locks plus explicit per-cpu interrupt disabling (as in RELENG_4). This also avoids having to turn off features like WITNESS and KTR, which don't honor the rules for fast interrupt handlers.
- fast interrupt handlers to not use normal scheduling (things like swi_sched()), since that uses sched_lock and is generally very inefficient. My implementation uses a combination of timeouts and a hack to metamorphose into a SWI handler. The latter is a very expensive operation and should be avoided. swi_sched() encourages this inefficiency except in the SWI_DELAY case. The SWI_DELAY case only takes 50-100 times as many instructions as the corresponding scheduling in RELENG_4. SWI_DELAY seems to be unused except in my drivers. My implementation enforces non-use of normal scheduling, and catches some other invalid data accesses (e.g., to curthread), by unmapping PCPU data in fast interrupt handlers.
- clock interrupt handlers to not be fast interrupt handlers. They have far too large a footprint to be fast interrupt handlers. Locking them is hard enough when they are only "fast" interrupt handlers. I made them normal interrupt handlers and don't support "fast" interrupt handlers.

I get very few benefits from this. Normal interrupt handlers for clocks are inefficient: they don't take very long, but switching to them is inefficient. I get lower interrupt latency, but this is not very important now that CPUs are very fast compared with i/o for all devices that I have. I get the possibility of simpler locking in clock interrupt handlers, but haven't simplified or fixed their locking. I get enforced smallness and simplicity for fast interrupt handlers, since large ones would be too complicated and normal scheduling and locking cannot be used.

Bruce
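The scheduling restriction in the list above can be illustrated with a pattern that needs no swi_sched(): the fast handler only records work and sets a pending flag, and a periodic timeout does the rest. The following userspace C11 sketch assumes hypothetical names (dev_fast_intr(), dev_poll_timeout()) and models the timeout as a function the caller invokes once per period; it does not reproduce the SWI-handler hack described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A fast handler must not fall back to lock-based atomic emulation. */
static_assert(ATOMIC_BOOL_LOCK_FREE == 2 && ATOMIC_INT_LOCK_FREE == 2,
              "atomics used by the fast handler must be lock-free");

static atomic_bool work_pending = false;  /* set by handler, cleared by poller */
static atomic_uint events;                /* events not yet processed */
static unsigned processed;                /* touched only by the poller */

/* Fast handler (hypothetical): constant time, takes no locks, wakes no
 * threads, and never calls into the scheduler. */
static void dev_fast_intr(void)
{
    atomic_fetch_add(&events, 1);
    atomic_store(&work_pending, true);
}

/*
 * Deferred side (hypothetical), modeled as a function called once per
 * timeout period. Returns the number of events handled on this call.
 * Clearing the flag before draining the counter means an event that
 * arrives in between is either drained now or leaves the flag set for
 * the next call, so none are lost.
 */
static unsigned dev_poll_timeout(void)
{
    if (!atomic_exchange(&work_pending, false))
        return 0;
    unsigned n = atomic_exchange(&events, 0);
    processed += n;
    return n;
}
```

A sketch like this trades up to one timeout period of latency for keeping the handler completely off sched_lock.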