From: John Baldwin
To: Bruce Evans
cc: "Bjoern A. Zeeb"
cc: current@freebsd.org
Date: Tue, 06 Jan 2004 10:21:23 -0500 (EST)
Subject: Re: Expensive timeout(9) function ?
In-Reply-To: <20040105175530.K4057@gamplex.bde.org>

On 05-Jan-2004 Bruce Evans wrote:
> On Sun, 4 Jan 2004, Bjoern A. Zeeb wrote:
>
>> what reports do you expect with the
>>
>> "Expensive timeout(9) function"
>>
>> message ?
>
> What you reported (function names and timeout time) is interesting.
>
> Why do we see it ?
>
> Kernel bugs :-).
>
>> Expensive timeout(9) function: 0xc04885a0(0) 1.024846430 s [1]
>> Expensive timeout(9) function: 0xc04885a0(0) 1.024846430 s [1]
>> Expensive timeout(9) function: 0xc04b3940(0) 0.008629758 s [2]
>> Expensive timeout(9) function: 0xc04b39a0(0) 0.004333781 s [2]
>> Expensive timeout(9) function: 0xc04f71f0(0) 0.027004551 s [3]
>> Expensive timeout(9) function: 0xc04f71f0(0) 0.027004551 s [3]
>> Expensive timeout(9) function: 0xc04f71f0(0) 0.027004551 s [3]
>>
>> [1] sys/kern/kern_synch.c:loadav()
>> [2] sys/kern/uipc_domain.c:pfslowtimo()
>> [3] sys/netinet/ip_fw2.c:ipfw_tick()
>
> [1] is easiest to understand.  loadav() is obviously broken since it
> uses sleep locks.  Apparently it sometimes sleeps for more than 1
> second altogether!  There is a check for sleeping in timeouts under
> DIAGNOSTIC.  I would expect complaints from this too if you just used
> DIAGNOSTIC to get the above.
>
> [3] ipfw_tick() is obviously broken in the same way.  This is from
> blind conversion of splimp() to a sleep lock.  Mutexes work quite
> differently from spl's.  A quick fix for timeout routines that only
> lock things once might be to use mtx_trylock() and not do anything in
> the timeout routine (except re-arm the timeout, perhaps with a smaller
> interval) if the mutex cannot be acquired immediately.  This depends
> on the exact timing of timeout routines not being critical (not that
> we have exact timing -- the above shows all timeouts being delayed by
> a factor of at least 100 (1 second instead of 1/100 seconds)).  This
> should work especially well in loadav() -- loadav() intentionally adds
> jitter to the interval.  This might have worked in schedcpu() too
> (schedcpu() was converted to a thread).

Ugh, loadav() needs to move to a thread, too, then.  Perhaps loadav()
and schedcpu() can share a thread by having the schedcpu thread just
run loadav() occasionally.

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
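
For reference, the mtx_trylock() quick fix quoted above could look
roughly like the sketch below.  This is a userland analogue only, not
code from the tree: pthread_mutex_trylock() stands in for
mtx_trylock(9), and tick_state, tick_callback(), NORMAL_INTERVAL and
RETRY_INTERVAL are made-up names and values for illustration.  The
callback never blocks on the lock; if the lock is busy it does no work
and asks to be re-armed sooner.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical re-arm intervals in ticks; not taken from FreeBSD. */
#define NORMAL_INTERVAL 100
#define RETRY_INTERVAL  1

struct tick_state {
	pthread_mutex_t lock;	/* stands in for the kernel mutex */
	int work_done;		/* how many times the real work ran */
};

/*
 * Timeout-style callback that must not sleep.  It tries the lock
 * once.  On success it does its work and returns the normal interval.
 * If the lock is held it skips the work and returns a shorter
 * interval, so the caller can re-arm the timeout and retry soon.
 */
static int
tick_callback(struct tick_state *st)
{
	int rc;

	rc = pthread_mutex_trylock(&st->lock);
	assert(rc == 0 || rc == EBUSY);
	if (rc == EBUSY)
		return (RETRY_INTERVAL);	/* busy: do nothing now */

	st->work_done++;			/* the periodic work */
	pthread_mutex_unlock(&st->lock);
	return (NORMAL_INTERVAL);
}
```

As the quoted text says, this only makes sense when the exact timing
of the routine is not critical, since a busy lock delays the work by
at least one retry interval.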