From: Mark Fullmer
Date: Sat, 22 Dec 2007 13:02:12 -0500
To: Bruce Evans
Cc: Kostik Belousov, freebsd-net@FreeBSD.org, freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds
Message-Id: <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net>
In-Reply-To: <20071223032944.G48303@delplex.bde.org>
References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org>

On Dec 22, 2007, at 12:08 PM, Bruce Evans wrote:

>
> I still don't understand the original problem, that the kernel is not
> even preemptible enough for network interrupts to work (except in 5.2
> where Giant breaks things).  Perhaps I misread the problem, and it is
> actually that networking works but userland is unable to run in time
> to avoid packet loss.
>

The test is done with UDP packets between two servers.  The em driver is incrementing the received packet count correctly but the packet is not making it up the network stack.  If the application was not servicing the socket fast enough I would expect to see the "dropped due to full socket buffers" (udps_fullsock) counter incrementing, as shown by netstat -s.  I grab a copy of netstat -s, netstat -i, and netstat -m before and after testing.  Other than the link packets counter, I haven't seen any other indication of where the packet is getting lost.

The em driver has a debugging stats option which does not indicate receive side overflows.  I'm fairly certain this same behavior can be seen with the fxp driver, but I'll need to double check.

These are results I sent a few days ago after setting up a test without an ethernet switch between the sender and receiver.  The switch was originally used to verify the sender was actually transmitting.  With spanning tree, ethernet keepalives, and CDP (cisco proprietary neighbor protocol) disabled and static ARP entries on the sender and receiver I can account for all packets making it to the receiver.

##

> Back to back test with no ethernet switch between two em interfaces,
> same result.  The receiving side has been up > 1 day and exhibits
> the problem.  These are also two different servers.  The small
> gettimeofday() syscall tester also shows the same ~30
> second pattern of high latency between syscalls.
>
> Receiver test application reports 3699 missed packets
>
> Sender netstat -i:
>
> (before test)
> em1   1500            00:04:23:cf:51:b7       20     0 15975785     0     0
> em1   1500 10.1/24    10.1.0.2                37     - 15975801     -     -
>
> (after test)
> em1   1500            00:04:23:cf:51:b7       22     0 25975822     0     0
> em1   1500 10.1/24    10.1.0.2                39     - 25975838     -     -
>
> total IP packets sent during test = end - start
> 25975838-15975801 = 10000037 (expected, 10,000,000 packets test +
> overhead)
>
> Receiver netstat -i:
>
> (before test)
> em1   1500            00:04:23:c4:cc:89 15975785     0       21     0     0
> em1   1500 10.1/24    10.1.0.1          15969626     -       19     -     -
>
> (after test)
> em1   1500            00:04:23:c4:cc:89 25975822     0       23     0     0
> em1   1500 10.1/24    10.1.0.1          25965964     -       21     -     -
>
> total ethernet frames received during test = end - start
> 25975822-15975785 = 10000037 (as expected)
>
> total IP packets processed during test = end - start
> 25965964-15969626 = 9996338 (expecting 10000037)
>
> Missed packets = expected - received
> 10000037-9996338 = 3699
>
> netstat -i accounts for the 3699 missed packets also reported by the
> application
>
> Looking closer at the tester output again shows the periodic
> ~30 second windows of packet loss.
>
> There's a second problem here in that packets are just disappearing
> before they make it to ip_input(), or there's a dropped packets
> counter I've not found yet.
>
> I can provide remote access to anyone who wants to take a look; this
> is very easy to duplicate.  The ~ 1 day uptime before the behavior
> surfaces is not making this easy to isolate.
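
The before/after counter arithmetic in the quoted results can be re-checked mechanically. What follows is only a minimal sketch in Python, not part of the original test setup: the Snapshot class and the delta() helper are hypothetical names introduced for illustration, and the numbers are copied by hand from the netstat -i output quoted above (packet counts on the link-level row and on the 10.1/24 row, output side for the sender and input side for the receiver).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical container for two packet counters from one netstat -i run."""
    link_pkts: int  # packet count on the link-level (MAC address) row
    ip_pkts: int    # packet count on the IP (10.1/24) row


def delta(before: Snapshot, after: Snapshot) -> Snapshot:
    """Counter growth between two snapshots (end - start)."""
    return Snapshot(after.link_pkts - before.link_pkts,
                    after.ip_pkts - before.ip_pkts)


# Sender: output packet counters before and after the test.
sender = delta(Snapshot(15975785, 15975801), Snapshot(25975822, 25975838))

# Receiver: input packet counters before and after the test.
receiver = delta(Snapshot(15975785, 15969626), Snapshot(25975822, 25965964))

# Frames seen by the receiving interface vs. packets counted at the IP level.
missed = sender.ip_pkts - receiver.ip_pkts

print("IP packets sent:          ", sender.ip_pkts)
print("Ethernet frames received: ", receiver.link_pkts)
print("IP packets processed:     ", receiver.ip_pkts)
print("Missed packets:           ", missed)
```

Under these assumptions the receiving interface counts every frame the sender transmitted, and the whole shortfall appears between the link-level counter and the IP-level counter, which is the gap described in the message.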