Date: Tue, 18 Dec 2007 16:49:14 -0500
From: Mark Fullmer <maf@eng.oar.net>
To: David G Lawrence <dg@dglawrence.com>
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject: Re: Packet loss every 30.999 seconds
Message-ID: <CD187AD1-8712-418F-9F49-FA3407BA1AC7@eng.oar.net>
In-Reply-To: <20071217102433.GQ25053@tnn.dglawrence.com>
References: <D50B5BA8-5A80-4370-8F20-6B3A531C2E9B@eng.oar.net> <20071217102433.GQ25053@tnn.dglawrence.com>
A little progress.

I have a machine with a KTR-enabled kernel running. Another machine is
running David's ffs_vfsops.c patch.

I left two other machines (GENERIC kernels) running the packet loss test
overnight. At ~32480 seconds of uptime the problem starts. This is really
close to a 16-bit overflow... See http://www.eng.oar.net/~maf/bsd6/p1.png
and http://www.eng.oar.net/~maf/bsd6/p2.png. The missing impulses at the
31 second marks are the intervals between test runs. The window of missing
packets (timestamps between two packets where a sequence number is missing)
is usually less than 4us, although I'm not sure gettimeofday() can be
trusted for measuring this. See https://www.eng.oar.net/~maf/bsd6/p3.png

Things I'll try tonight:

  o Check on the patched kernel.
  o Try KTR debugging enabled before and after an expected high latency
    period.
  o Dump all files to /dev/null to trigger the behavior.

I would expect the vnode problem to look a little different on the packet
loss graphs over time. If this leads anywhere I'll add a counter before
the msleep() and see how often it's getting there.

On Dec 17, 2007, at 5:24 AM, David G Lawrence wrote:

> I noticed this as well some time ago. The problem has to do with the
> processing (syncing) of vnodes. When the total number of allocated vnodes
> in the system grows to tens of thousands, the ~31 second periodic sync
> process takes a long time to run. Try this patch and let people know if
> it helps your problem. It will periodically wait for one tick (1ms) every
> 500 vnodes of processing, which will allow other things to run.
>
> Index: ufs/ffs/ffs_vfsops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
> retrieving revision 1.290.2.16
> diff -c -r1.290.2.16 ffs_vfsops.c
> *** ufs/ffs/ffs_vfsops.c	9 Oct 2006 19:47:17 -0000	1.290.2.16
> --- ufs/ffs/ffs_vfsops.c	25 Apr 2007 01:58:15 -0000
> ***************
> *** 1109,1114 ****
> --- 1109,1115 ----
>   	int softdep_deps;
>   	int softdep_accdeps;
>   	struct bufobj *bo;
> + 	int flushed_count = 0;
>
>   	fs = ump->um_fs;
>   	if (fs->fs_fmod != 0 && fs->fs_ronly != 0) { /* XXX */
> ***************
> *** 1174,1179 ****
> --- 1175,1184 ----
>   			allerror = error;
>   		vput(vp);
>   		MNT_ILOCK(mp);
> + 		if (flushed_count++ > 500) {
> + 			flushed_count = 0;
> + 			msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1);
> + 		}
>   	}
>   	MNT_IUNLOCK(mp);
>   	/*
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.