From: Mark Fullmer
Date: Thu, 20 Dec 2007 15:45:35 -0500
To: Kostik Belousov
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds
Message-Id: <1C1F9DB7-1B79-4718-9A27-379D1E6F0F10@eng.oar.net>
In-Reply-To: <20071219181158.GC57756@deviant.kiev.zoral.com.ua>

Thanks, I'll test this later on today.

On Dec 19, 2007, at 1:11 PM, Kostik Belousov wrote:

> On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote:
>>>> Try it with "find / -type f >/dev/null" to duplicate the problem
>>>> almost instantly.
>>>
>>> I was able to verify last night that (cd /; tar -cpf -) > all.tar
>>> would trigger the problem. I'm working getting a test running with
>>> David's ffs_sync() workaround now, adding a few counters there
>>> should get this narrowed down a little more.
>>
>> Unfortunately, the version of the patch that I sent out isn't going to
>> help your problem. It needs to yield at the top of the loop, but vp
>> isn't necessarily valid after the wakeup from the msleep. That's a
>> problem that I'm having trouble figuring out a solution to - the
>> solutions that come to mind will all significantly increase the
>> overhead of the loop.
>> As a very inadequate work-around, you might consider lowering
>> kern.maxvnodes to something like 20000 - that might be low enough to
>> not trigger the problem, but also be high enough to not significantly
>> affect system I/O performance.
>
> I think the following may be safe. It counts only the clean scanned
> vnodes and does not evaluate the vp, that indeed may be reclaimed,
> after the sleep.
>
> I never booted with the change.
>
> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index cbccc62..e686b97 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -1176,6 +1176,7 @@ ffs_sync(mp, waitfor, td)
>  	struct ufsmount *ump = VFSTOUFS(mp);
>  	struct fs *fs;
>  	int error, count, wait, lockreq, allerror = 0;
> +	int yield_count;
>  	int suspend;
>  	int suspended;
>  	int secondary_writes;
> @@ -1216,6 +1217,7 @@ loop:
>  	softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps);
>  	MNT_ILOCK(mp);
>
> +	yield_count = 0;
>  	MNT_VNODE_FOREACH(vp, mp, mvp) {
>  		/*
>  		 * Depend on the mntvnode_slock to keep things stable enough
> @@ -1233,6 +1235,11 @@ loop:
>  		    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
>  		    vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
>  			VI_UNLOCK(vp);
> +			if (yield_count++ == 500) {
> +				yield_count = 0;
> +				msleep(&yield_count, MNT_MTX(mp), PZERO,
> +				    "ffspause", 1);
> +			}
>  			continue;
>  		}
>  		MNT_IUNLOCK(mp);
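
For anyone who wants to reason about how often the quoted patch would
actually pause, here is a rough user-space model of just its counting
logic. It is only a sketch, not kernel code: count_yields(), the clean[]
array and YIELD_INTERVAL are names made up for illustration, and the
msleep() call is replaced by a counter. Only the threshold of 500 and
the post-increment test come from the diff.

```c
#include <stddef.h>

#define YIELD_INTERVAL 500	/* threshold taken from the quoted patch */

/*
 * User-space model of the counting logic only (hypothetical helper).
 * clean[i] != 0 means vnode i would be skipped by the scan as clean.
 * Returns how many times the loop would have slept.
 */
static size_t
count_yields(const unsigned char *clean, size_t nvnodes)
{
	size_t i, yields = 0;
	int yield_count = 0;

	for (i = 0; i < nvnodes; i++) {
		if (!clean[i])
			continue;	/* dirty vnode: the real loop syncs it */
		/* Same post-increment test as in the patch. */
		if (yield_count++ == YIELD_INTERVAL) {
			yield_count = 0;
			yields++;	/* stands in for msleep(..., "ffspause", 1) */
		}
	}
	return (yields);
}
```

Because the test uses a post-increment, the model pauses on the 501st
clean vnode since the last pause rather than the 500th, and dirty vnodes
do not advance the counter at all. Whether that difference matters in
practice is something the counters in the real test should show.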