From: Bruce Evans
Date: Tue, 18 Dec 2007 17:20:30 +1100 (EST)
To: David G Lawrence
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds
Message-ID: <20071218170133.X32807@delplex.bde.org>
In-Reply-To: <20071217103936.GR25053@tnn.dglawrence.com>
References: <20071217103936.GR25053@tnn.dglawrence.com>

On Mon, 17 Dec 2007, David G Lawrence wrote:

> One more comment on my last email... The patch that I included is not
> meant as a real fix - it is just a bandaid. The real problem appears to
> be that a very large number of vnodes (all of them?) are getting synced
> (i.e. calling ffs_syncvnode()) every time. This should normally only
> happen for dirty vnodes.
> I suspect that something is broken with this
> check:
>
> 	if (vp->v_type == VNON || ((ip->i_flag &
> 	    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
> 	    vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
> 		VI_UNLOCK(vp);
> 		continue;
> 	}

Isn't it just the O(N) algorithm with N quite large? Under ~5.2, on a
2.2GHz A64 UP in 32-bit mode, I see a latency of 3 ms for 17500 vnodes,
which would be explained by the above (and the VI_LOCK() and loop
overhead) taking 171 ns per vnode. I would expect it to take more like
20 ns per vnode for UP and 60 for SMP.

The comment before this code shows that the problem is known, and says
that a subroutine call cannot be afforded unless there is work to do,
but the locking accesses look like subroutine calls, have subroutine
calls in their internals, and take longer than simple subroutine calls
in the SMP case even when they don't make subroutine calls. (IIRC, on
A64 a minimal subroutine call takes 4 cycles while a minimal locked
instruction takes 18 cycles; subroutine calls are only slow when their
branches are mispredicted.)

Bruce
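The O(N) point above can be illustrated with a small user-space C model. This is a sketch under stated assumptions, not FreeBSD code: struct fake_vnode, count_vnodes_to_sync() and ns_per_vnode() are hypothetical names, the IN_* flag values are arbitrary stand-ins, and all locking (VI_LOCK()/VI_UNLOCK()) is omitted. It shows only that the quoted skip test still has to visit every vnode even when almost none are dirty, plus the arithmetic behind the 171 ns figure (3 ms / 17500 vnodes).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel names in the quoted check;
 * the numeric values are arbitrary, not FreeBSD's. */
enum vtype { VNON, VREG };
#define IN_ACCESS   0x01
#define IN_CHANGE   0x02
#define IN_MODIFIED 0x04
#define IN_UPDATE   0x08

struct fake_vnode {
	enum vtype v_type;
	unsigned   i_flag;      /* models ip->i_flag */
	int        dirty_bufs;  /* models vp->v_bufobj.bo_dirty.bv_cnt */
};

/* Visit every vnode once, as the sync loop does.  The cost is O(N)
 * even if nothing is dirty, because each entry must be inspected. */
int
count_vnodes_to_sync(const struct fake_vnode *v, size_t n)
{
	int work = 0;

	for (size_t i = 0; i < n; i++) {
		/* Same skip test as the quoted code, minus the locking. */
		if (v[i].v_type == VNON ||
		    ((v[i].i_flag & (IN_ACCESS | IN_CHANGE | IN_MODIFIED |
		    IN_UPDATE)) == 0 && v[i].dirty_bufs == 0))
			continue;
		work++;         /* the kernel would call ffs_syncvnode() here */
	}
	return (work);
}

/* Average cost per vnode: total scan latency divided by vnodes visited. */
double
ns_per_vnode(double latency_ms, double nvnodes)
{
	return (latency_ms * 1e6 / nvnodes);
}
```

For example, with 17500 clean vnodes and one marked IN_MODIFIED, count_vnodes_to_sync() still visits all 17500 entries but returns 1, and ns_per_vnode(3.0, 17500) evaluates to about 171.4.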