From: Mark Fullmer
Date: Mon, 17 Dec 2007 12:57:05 -0500
To: Jeremy Chadwick
Cc: freebsd-net@FreeBSD.org, freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds
In-Reply-To: <20071217054305.GA18268@eos.sc1.parodius.com>

Back to back test with no ethernet switch between two em interfaces,
same result.  The receiving side has been up > 1 day and exhibits
the problem.  These are also two different servers.  The small
gettimeofday() syscall tester also shows the same ~30 second pattern
of high latency between syscalls.
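The tester itself is not included in this message. As a rough illustration
of the idea only, here is a minimal Python sketch that reads the clock in a
tight loop and records any gap between consecutive readings that exceeds a
threshold. The names watch_clock, duration and threshold are hypothetical,
and time.time() merely stands in for a direct gettimeofday() call; this is
not the original program.

```python
import time


def watch_clock(duration, threshold, clock=time.time):
    """Poll clock() in a tight loop for `duration` seconds.

    Returns a list of (offset, gap) tuples, one per pair of consecutive
    readings that were more than `threshold` seconds apart. `offset` is
    the time since the start of the run at which the stall began.
    """
    start = prev = clock()
    stalls = []
    while prev - start < duration:
        now = clock()
        if now - prev > threshold:
            stalls.append((prev - start, now - prev))
        prev = now
    return stalls


if __name__ == "__main__":
    # Short demonstration run; to catch a ~31 second period the loop
    # would have to run for a minute or more.
    for offset, gap in watch_clock(duration=0.2, threshold=0.001):
        print(f"stall of {gap * 1000:.3f} ms at t+{offset:.3f} s")
```

If stalls recur at a fixed interval, the differences between successive
offsets should cluster around that interval.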

Receiver test application reports 3699 missed packets

Sender netstat -i:

(before test)
em1    1500              00:04:23:cf:51:b7        20     0 15975785     0     0
em1    1500 10.1/24      10.1.0.2                 37     - 15975801     -     -

(after test)
em1    1500              00:04:23:cf:51:b7        22     0 25975822     0     0
em1    1500 10.1/24      10.1.0.2                 39     - 25975838     -     -

total IP packets sent during test = end - start
25975838 - 15975801 = 10000037 (expected: 10,000,000 test packets + overhead)

Receiver netstat -i:

(before test)
em1    1500              00:04:23:c4:cc:89  15975785     0       21     0     0
em1    1500 10.1/24      10.1.0.1           15969626     -       19     -     -

(after test)
em1    1500              00:04:23:c4:cc:89  25975822     0       23     0     0
em1    1500 10.1/24      10.1.0.1           25965964     -       21     -     -

total ethernet frames received during test = end - start
25975822 - 15975785 = 10000037 (as expected)

total IP packets processed during test = end - start
25965964 - 15969626 = 9996338 (expecting 10000037)

Missed packets = expected - received
10000037 - 9996338 = 3699

netstat -i accounts for the 3699 missed packets also reported by the
application.

Looking closer at the tester output again shows the periodic
~30 second windows of packet loss.

There's a second problem here in that packets are just disappearing
before they make it to ip_input(), or there's a dropped packets
counter I've not found yet.

I can provide remote access to anyone who wants to take a look, this
is very easy to duplicate.  The ~1 day uptime before the behavior
surfaces is not making this easy to isolate.

--
mark

On Dec 17, 2007, at 12:43 AM, Jeremy Chadwick wrote:

> On Mon, Dec 17, 2007 at 12:21:43AM -0500, Mark Fullmer wrote:
>> While trying to diagnose a packet loss problem in a RELENG_6 snapshot
>> dated November 8, 2007 it looks like I've stumbled across a broken
>> driver or kernel routine which stops interrupt processing long enough
>> to severely degrade network performance every 30.99 seconds.
>>
>> Packets appear to make it as far as ether_input() then get lost.
>
> Are you sure this isn't being caused by something the switch is doing,
> such as MAC/ARP cache clearing or LACP?  I'm just speculating, but it
> would be worthwhile to remove the switch from the picture (crossover
> cable to the rescue).
>
> I know that at least in the case of fxp(4) and em(4), Jack Vogel does
> some thorough testing of throughput using a professional/high-end
> packet generator (some piece of hardware, I forget the name...)
>
> --
> | Jeremy Chadwick                                jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
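
For anyone re-checking the numbers, the before/after counter arithmetic in
the message above can be reproduced with a short Python sketch. The function
names counter_delta and loss_summary are hypothetical; the four input values
are the receiver's link-level and IP-level input packet counts copied from
the netstat -i output quoted in the message. The sketch assumes the counters
did not wrap during the test.

```python
def counter_delta(before, after):
    """Packets counted during the test: end value minus start value.

    Assumes the counter did not wrap between the two readings.
    """
    return after - before


def loss_summary(frames_before, frames_after, ip_before, ip_after):
    """Compare link-level frames received with IP packets processed."""
    frames = counter_delta(frames_before, frames_after)
    ip_packets = counter_delta(ip_before, ip_after)
    missed = frames - ip_packets
    return {
        "frames": frames,
        "ip_packets": ip_packets,
        "missed": missed,
        "loss_pct": 100.0 * missed / frames,
    }


# Receiver counters taken from the netstat -i output in the message.
receiver = loss_summary(
    frames_before=15975785, frames_after=25975822,
    ip_before=15969626, ip_after=25965964,
)
print(receiver["frames"], receiver["ip_packets"], receiver["missed"])
print(f"{receiver['loss_pct']:.3f}% of received frames never counted at the IP layer")
```

The difference between the two deltas is the 3699 figure reported by the
receiving application.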