From: Bruce Evans
Date: Thu, 20 Dec 2007 06:41:03 +1100 (EST)
To: David G Lawrence
Cc: freebsd-net@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG, Bruce Evans
Subject: Re: Packet loss every 30.999 seconds
Message-ID: <20071220051751.E38491@delplex.bde.org>
In-Reply-To: <20071219170434.GG25053@tnn.dglawrence.com>

On Wed, 19 Dec 2007, David
G Lawrence wrote:

>> The patch should work fine. IIRC, it yields voluntarily so that other
>> things can run. I committed a similar hack for uiomove(). It was
>
> It patches the bottom of the loop, which is only reached if the vnode
> is dirty. So it will only help if there are thousands of dirty vnodes.
> While that condition can certainly happen, it isn't the case that I'm
> particularly interested in.

Oops. When it reaches the bottom of the loop, it will probably block on
i/o sometimes, so that the problem is smaller anyway.

>> CPUs, everything except interrupts has to wait for these syscalls. Now
>> the main problem is to figure out why PREEMPTION doesn't work. I'm
>> not working on this directly since I'm running ~5.2 where nearly-full
>> kernel preemption doesn't work due to Giant locking.
>
> I don't understand how PREEMPTION is supposed to work (I mean
> to any significant detail), so I can't really comment on that.

Me neither, but I will comment anyway :-). I think PREEMPTION should even
preempt kernel threads in favor of (higher priority of course) user
threads that are in the kernel, but doesn't do this now. Even interrupt
threads should have dynamic priorities so that when they become too
hoggish they can be preempted even by user threads subject to this
priority rule. This is further from happening.

ffs_sync() can hold the mountpoint lock for a long time. That gives
problems preempting it. To move your fix to the top of the loop, I think
you just need to drop the mountpoint lock every few hundred iterations
while yielding. This would help for PREEMPTION too. Dropping the lock
must be safe because it is already done while flushing.

Hmm, the loop is nicely obfuscated and pessimized in current (see
rev.1.234). The fast (modulo no cache misses) path used to be just a
TAILQ_NEXT() to reach the next vnode, but now unnecessarily joins the
slow path at MNT_VNODE_FOREACH(), and MNT_VNODE_FOREACH() hides a
function call.

Bruce
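
A minimal userland sketch of the "drop the lock every few hundred
iterations while yielding" idea above. It is not the ffs_sync() code: a
pthread mutex stands in for the mountpoint lock, a plain array stands in
for the vnode list, and the names scan_with_yield, node_list and
YIELD_INTERVAL are hypothetical. It also assumes the array is not
modified while the lock is dropped; a real list walk would have to
revalidate its position after reacquiring the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical value for "every few hundred iterations". */
#define YIELD_INTERVAL 256

struct node {
	int dirty;
};

struct node_list {
	pthread_mutex_t lock;	/* stands in for the mountpoint lock */
	struct node *nodes;
	size_t count;
};

/*
 * Scan every node and count the dirty ones.  The yield sits at the top
 * of the loop, so it is reached for clean nodes as well as dirty ones.
 * Returns the number of times the lock was dropped.
 */
size_t
scan_with_yield(struct node_list *nl, size_t *ndirty)
{
	size_t i, drops = 0;

	*ndirty = 0;
	pthread_mutex_lock(&nl->lock);
	for (i = 0; i < nl->count; i++) {
		if (i != 0 && i % YIELD_INTERVAL == 0) {
			/* Let other threads take the lock and run. */
			pthread_mutex_unlock(&nl->lock);
			sched_yield();
			pthread_mutex_lock(&nl->lock);
			drops++;
		}
		if (nl->nodes[i].dirty)
			(*ndirty)++;
	}
	pthread_mutex_unlock(&nl->lock);
	return (drops);
}
```

Putting the check at the top of the loop means the lock is released at
a bounded interval no matter how few nodes are dirty, which is the case
the bottom-of-loop version misses.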