Date: Fri, 21 Dec 2007 23:09:37 -0800
From: David G Lawrence
To: David Schwartz
Cc: "Freebsd-Net@Freebsd. Org", freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds
Message-ID: <20071222070937.GU25053@tnn.dglawrence.com>
References: <20071221234347.GS25053@tnn.dglawrence.com>
List-Id: Networking and TCP/IP with FreeBSD

> I'm just an observer, and I may be confused, but it seems to me that this is
> motion in the wrong direction (at least, it's not going to fix the actual
> problem).
> As I understand the problem, once you reach a certain point, the
> system slows down *every* 30.999 seconds. Now, it's possible for the code to
> cause one slowdown as it cleans up, but why does it need to clean up so much
> 31 seconds later?
>
> Why not find/fix the actual bug? Then work on getting the yield right if it
> turns out there's an actual problem for it to fix.
>
> If the problem is that too much work is being done at a stretch and it turns
> out this is because work is being done erroneously or needlessly, fixing
> that should solve the whole problem. Doing the work that doesn't need to be
> done more slowly is at best an ugly workaround.
>
> Or am I misunderstanding?

It's the syncer that is causing the problem, and it runs every 31 seconds.
Historically, the syncer ran every 30 seconds, but things have changed a bit
over time.

The reason the syncer takes so much time is that ffs_sync is a bit stupid
in how it works: it loops through all of the vnodes on each ffs mountpoint
(typically almost all of the vnodes in the system) to see if any of them
need to be synced out. This was marginally okay when there were perhaps a
thousand vnodes in the system, but when the maximum number of vnodes was
dramatically increased in FreeBSD some years ago (to typically 50000-100000)
and combined with the kernel threads of FreeBSD 5, this has resulted in some
rather bad side effects.

I think the proper solution would be to create an ffs_sync work list
(another TAILQ/LIST), probably with the head in the mountpoint struct, that
has on it only the vnodes that need to be synced. Unfortunately, such a
change would be extensive, scattered throughout much of the ufs/ffs code.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
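
P.S. To make the work-list idea above a bit more concrete, here is a rough
userland sketch in C using the TAILQ macros from <sys/queue.h>. It is only
an illustration under stated assumptions, not FreeBSD kernel code: the
sketch_vnode and sketch_mount structs and the sketch_* functions are
hypothetical stand-ins for struct vnode, struct mount and the ufs/ffs code
paths, and all locking is left out. The point is just that the sync pass
walks the queued vnodes rather than every vnode on the mountpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct vnode (assumption, not the real layout). */
struct sketch_vnode {
    int id;
    bool dirty;                           /* true while queued for syncing */
    TAILQ_ENTRY(sketch_vnode) sync_link;  /* linkage on the mount's work list */
};

TAILQ_HEAD(sketch_sync_list, sketch_vnode);

/* Hypothetical stand-in for the mountpoint struct holding the list head. */
struct sketch_mount {
    struct sketch_sync_list sync_head;    /* only vnodes that need syncing */
};

static void sketch_mount_init(struct sketch_mount *mp)
{
    TAILQ_INIT(&mp->sync_head);
}

/*
 * Would be called from every place that dirties a vnode. A vnode is
 * queued at most once, however many times it is dirtied between passes.
 */
static void sketch_mark_dirty(struct sketch_mount *mp, struct sketch_vnode *vp)
{
    assert(mp != NULL && vp != NULL);
    if (vp->dirty)
        return;
    vp->dirty = true;
    TAILQ_INSERT_TAIL(&mp->sync_head, vp, sync_link);
}

/*
 * One sync pass: drains the work list and returns how many vnodes were
 * visited. The cost is proportional to the number of dirty vnodes, not
 * to the total number of vnodes on the mountpoint.
 */
static int sketch_sync(struct sketch_mount *mp)
{
    struct sketch_vnode *vp;
    int synced = 0;

    while ((vp = TAILQ_FIRST(&mp->sync_head)) != NULL) {
        TAILQ_REMOVE(&mp->sync_head, vp, sync_link);
        vp->dirty = false;  /* real code would write the vnode out here */
        synced++;
    }
    return synced;
}
```

The sketch also hints at why the real change would be scattered: every code
path that dirties a vnode would need the equivalent of the
sketch_mark_dirty() call, or the vnode would never be synced.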