From owner-freebsd-current@FreeBSD.ORG Sun Jan 4 23:22:42 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8837416A4CE for ; Sun, 4 Jan 2004 23:22:42 -0800 (PST) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9B79A43D48 for ; Sun, 4 Jan 2004 23:22:40 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id SAA23968; Mon, 5 Jan 2004 18:22:32 +1100 Date: Mon, 5 Jan 2004 18:22:31 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: "Bjoern A. Zeeb" In-Reply-To: Message-ID: <20040105175530.K4057@gamplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: current@freebsd.org Subject: Re: Expensive timeout(9) function ? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Jan 2004 07:22:42 -0000 On Sun, 4 Jan 2004, Bjoern A. Zeeb wrote: > what reports do you expect with the > > "Expensive timeout(9) function" > > message ? What you reported (function names and timeout time) is interesting. Why do we see it ? Kernel bugs :-). > Expensive timeout(9) function: 0xc04885a0(0) 1.024846430 s [1] > Expensive timeout(9) function: 0xc04885a0(0) 1.024846430 s [1] > Expensive timeout(9) function: 0xc04b3940(0) 0.008629758 s [2] > Expensive timeout(9) function: 0xc04b39a0(0) 0.004333781 s [2] > Expensive timeout(9) function: 0xc04f71f0(0) 0.027004551 s [3] > Expensive timeout(9) function: 0xc04f71f0(0) 0.027004551 s [3] > Expensive timeout(9) function: 0xc04f71f0(0) 0.027004551 s [3] > > [1] sys/kern/kern_synch.c:loadav() > [2] sys/kern/uipc_domain.c:pfslowtimo() > [3] sys/netinet/ip_fw2.c:ipfw_tick() [1] is easiest to understand. loadav() is obviously broken since it uses sleep locks. Apparently it sometimes sleeps for more than 1 second altogether! There is a check for sleeping in timeouts under DIAGNOSTIC. I would expect to complaints from this too if you just used DIAGNOSTIC to get the above. [3] ipfw_tick() is obviously broken in the same way. This is from blind conversion of splimp() to a sleep lock. Mutexes work quite differently from spl's. A quick fix for timeout routines that only lock things once might be to use mtx_trylock() and not do anything in the timeout routine (except re-arm the timeout, perhaps with a smaller interval) if the mutex cannot be acquired immediately. This depends on the exact timing of timeout routines not being critical (not that we have exact timing -- the above shows all timeouts being delayed by a factor of at least 100 (1 second instead of 1/100 seconds)). This should work expecially well in loadav() -- loadav() intentionally adds jitter to the interval. This might have worked in schedcpu() too (schedcpu() was converted to a thread). Bruce