From: Kevin Oberman
To: Steven Hartland
Cc: freebsd-net@freebsd.org
Date: Fri, 15 Jul 2011 10:28:24 -0700
In-Reply-To: <379885BA631F4C7787C24E00A174B429@multiplay.co.uk>
References: <379885BA631F4C7787C24E00A174B429@multiplay.co.uk>
Subject: Re: igb enable_aim or flow_control causing tcp stalls?
List-Id: Networking and TCP/IP with FreeBSD

On Jul 15, 2011 9:59 AM, "Steven Hartland" wrote:
>
> Been trying to identify a strange network stalling issue while using
> scp or rsync between two machines, initially at remote locations.
>
> The behaviour has proved quite difficult to track as it seems to require a
> number of factors combined before the stalls occur. These seem to be:
> 1. This particular target machine
> 2. Some load, but not much on the machine, when idle we don't see stalls.
> 3. Remote 9ms+ latency or high throughput 50MB/s transmission speeds
>
> My current test case is copying a freebsd iso from a local machine to
> the potentially problematic machine's /dev/null e.g.
> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso test1:/dev/null
>
> These machines are connected via a cisco 6509 -> supermicro blade
> chassis.
>
> When the failure happens we see the following:-
> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
> FreeBSD-8.2-RELEASE-amd64-disc1.iso 21% 147MB 2.1MB/s - stalled -
>
> When all is well we see:-
> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
> FreeBSD-8.2-RELEASE-amd64-disc1.iso 100% 691MB 53.1MB/s 00:13
>
> This setup:-
> 1. Source machine 7.0-RELEASE-p2 using em0
> em0@pci0:6:0:0: class=0x020000 card=0x109615d9 chip=0x10968086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'PRO/1000 EB Network Connection'
>     class = network
>     subclass = ethernet
> 2. Target (problem) machine 8.2-RELEASE using igb0
> igb0@pci0:5:0:0: class=0x020000 card=0x10e715d9 chip=0x10e78086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     class = network
>     subclass = ethernet
>
> I've tried switching to igb1 with no change, which also changes
> switches and hence ports on the Cisco, so I don't at this point
> believe there is an issue there.
>
> Now I've just noticed that igb has at least two sysctl's which
> seemed interesting, enable_aim & flow_control (which is missing
> from the man page btw). On disabling both, the stalls seem to go away.
>
> Unfortunately re-enabling them didn't re-introduce the stalls, but
> this could be another quirk when they don't re-enable properly?
>
> So the questions are:-
> 1. Could either of these settings cause tcp stalls?
> 2. If the nic and switch differ in flow control, what is the likely
> effect?
> 3. Any other thoughts?

Use "tcpdump -s0 -w file.pcap host remote-system" to see how it fails. You
may want to capture on both ends. Then use wireshark (in ports) to analyze
the data. There are other tools to provide other types of analysis,
depending on the type of problem.

R. Kevin Oberman, Network Engineer Retired
kob6558@gmail.com
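
As a rough illustration of the analysis step suggested above (this is not
part of the original message): once packet timestamps have been exported
from a capture, a stall tends to show up as a long gap between consecutive
packets. The sketch below assumes the timestamps are already available as
a list of floats in seconds. The function name find_stalls and the
1-second default threshold are hypothetical choices for illustration, not
anything taken from tcpdump or wireshark.

```python
from typing import List, Tuple


def find_stalls(timestamps: List[float],
                threshold: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start_time, gap_length) for every gap longer than threshold.

    timestamps: packet arrival times in seconds (assumed already extracted
    from a capture by some other tool).
    threshold: minimum gap, in seconds, to count as a stall (arbitrary).
    """
    stalls = []
    ordered = sorted(timestamps)  # tolerate out-of-order input
    # Walk consecutive pairs and record any gap above the threshold.
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


if __name__ == "__main__":
    # Made-up timestamps: steady traffic, then a 3.5 second pause.
    sample = [0.0, 0.5, 4.0, 4.25]
    for start, gap in find_stalls(sample):
        print(f"stall of {gap:.2f}s starting at t={start:.2f}s")
```

Running the same check on captures from both ends may help show whether
the sender stopped transmitting or the receiver stopped acknowledging.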