From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 16:20:40 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8861C16A46B; Tue, 18 Dec 2007 16:20:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail18.syd.optusnet.com.au (mail18.syd.optusnet.com.au [211.29.132.199]) by mx1.freebsd.org (Postfix) with ESMTP id 10CCA13C4D1; Tue, 18 Dec 2007 16:20:39 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail18.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBIGKSE0008461 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 19 Dec 2007 03:20:31 +1100 Date: Wed, 19 Dec 2007 03:20:28 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: <20071218141742.GS25053@tnn.dglawrence.com> Message-ID: <20071219022102.I34422@delplex.bde.org> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.org, Scott Long , freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 16:20:40 -0000 On Tue, 18 Dec 2007, David G Lawrence wrote: > I didn't say it caused any bogus disk I/O. My original problem > (after a day or two of uptime) was an occasional large scheduling delay > for a process that needed to process VoIP frames in real-time. It was > happening every 31 seconds and was causing voice frames to be dropped > due to the large latency causing the frame to be outside of the jitter > window. I wrote a program that measures the scheduling delay by sleeping > for one tick and then comparing the timeofday offset from what was > expected. This revealed that every 31 seconds, the process was seeing > a 17ms delay in scheduling. Further investigation found that 1) the I got an almost identical delay (with 64000 vnodes). Now, 17ms isn't much. Delays much have been much longer when CPUs were many times slower and RAM/vnodes were not so many times smaller. High-priority threads just need to be able to preempt the syncer so that they don't lose data (unless really hard real time is supported, which it isn't). This should work starting with about FreeBSD-6 (probably need "options PREEMPT"). I doesn't work in ~5.2 due to Giant locking, but I find Giant locking to rarely matter for UP. Old versions of FreeBSD were only able to preempt to non-threads (interrupt handlers) yet they somehow survived the longer delays. They didn't have Giant locking to get in the way, and presumably avoided packet loss by doing lots in interrupt handlers (hardware isr and netisr). I just remembered that I have seen packet loss even under -current when I leave out or turn off "options PREEMPT". > ... > and it completely resolved the problem. Since the wait that I added > is at the bottom of the loop and the limit is 500 vnodes, this tells > me that every 31 seconds, there are a whole lot of vnodes that are > being "synced", when there shouldn't have been any (this fact wasn't > apparent to me at the time, but when I later realized this, I had > no time to investigate further). My tests and analysis have all been > on an otherwise quiet system (no disk I/O), so the bottom of the > ffs_sync vnode loop should not have been reached at all, let alone > tens of thousands of times every 31 seconds. All machines were uni- > processor, FreeBSD 6+. I don't know if this problem is present in 5.2. > I didn't see ffs_syncvnode in your call graph, so it probably is not. I chopped to a float profile with only top callers. Any significant calls from ffs_sync() would show up as top callers. I still have the data, and the call graph shows much more clearly that there was just one dirty vnode for the whole sync(): % 0.00 0.01 1/1 syscall [3] % [4] 88.7 0.00 0.01 1 sync [4] % 0.01 0.00 5/5 ffs_sync [5] % 0.01 0.00 6/6 vfs_msync [6] % 0.00 0.00 7/8 vfs_busy [260] % 0.00 0.00 7/8 vfs_unbusy [263] % 0.00 0.00 6/7 vn_finished_write [310] % 0.00 0.00 6/6 vn_start_write [413] % 0.00 0.00 1/1 vfs_stdnosync [472] % % ----------------------------------------------- % % 0.01 0.00 5/5 sync [4] % [5] 50.7 0.01 0.00 5 ffs_sync [5] % 0.00 0.00 1/1 ffs_fsync [278] % 0.00 0.00 1/60 vget [223] % 0.00 0.00 1/60 ufs_vnoperatespec [78] % 0.00 0.00 1/26 vrele [76] It passed the flags test just once to get to the vget(). ffs_syncvnode() doesn't exist in 5.2, and ffs_fsync() is called instead. % % ----------------------------------------------- % % 0.01 0.00 6/6 sync [4] % [6] 38.0 0.01 0.00 6 vfs_msync [6] % % ----------------------------------------------- % ... % % 0.00 0.00 1/1 ffs_sync [5] % [278] 0.0 0.00 0.00 1 ffs_fsync [278] % 0.00 0.00 1/1 ffs_update [368] % 0.00 0.00 1/4 vn_isdisk [304] This is presumbly to sync the 1 dirty vnode. BTW I use noatime a lot, including for all file systems used in the test, so the tree walk didn't dirty any vnodes. A tar to /dev/zero would dirty all vnodes if everything were mounted without this option. % ... % % cumulative self self total % time seconds seconds calls ns/call ns/call name % 50.7 0.008 0.008 5 1666427 1667246 ffs_sync [5] % 38.0 0.015 0.006 6 1041217 1041217 vfs_msync [6] % 3.1 0.015 0.001 0 100.00% mcount [7] % 1.5 0.015 0.000 0 100.00% mexitcount [8] % 0.6 0.015 0.000 0 100.00% cputime [22] % 0.6 0.016 0.000 34 2660 2660 generic_bcopy [24] % 0.5 0.016 0.000 0 100.00% user [26] Bruce