From: Randall Stewart
Date: Tue, 31 Oct 2006 10:23:46 -0500
To: Andy Jones
Cc: freebsd-net@freebsd.org
Subject: Re: Throughput problems with dummynet and high delays

Andy:

Hi, you must be working with Injong :-)

At one point I was playing with dummynet in satellite networks. One of
the KEY problems I found is that the number of packets you can have in
queue, limited to 100, was not nearly enough. This was (at the time) a
hardcoded parameter inside both the ipfw code and the dummynet code.

What I did was change this to a #define inside the ip_dummynet.h file.
That then allowed me to set the value a bit higher... I used 1200 for
my 550ms+ sat network...

With just rough math you need about 89,478 1500-byte packets per second
to get a gigabit. So that's about 8,948 per 100ms.. and I would want
more than that, I am thinking... Of course you might be able to reduce
that with 9000-byte MTUs... but even at a 9000-byte MTU, 100 packets
will NOT cut it...

My old patches were from way back in the 4.10 era.. I could dig around
and see if I can find something.. but I would only be able to give you
something for 6.1 or 7.0... I don't have a 6.0 machine around anywhere ..
and soon I will be moving my 6.1 -> 6.2 :-)

Let me know if you want me to poke around.. or of course you could
too.. it's not a hard change to make :-)

R

Andy Jones wrote:
> Hi,
>
> I'm a researcher at the University of North Carolina trying to simulate
> certain link characteristics using dummynet pipes in ipfw. Our end goal is
> to thoroughly test high speed TCP variants in our experimental network in a
> wide range of situations (which includes varying the delay from 1ms to
> 200ms).
>
> I have two Dell PowerEdge 2850 servers connected to each other using an
> Intel Gigabit ethernet card (although I'm not sure of the exact model).
> They both run FreeBSD 6.0. I'm using iperf to push as many bits through
> the wire as possible.
>
> Without dummynet, sustained throughput is as expected, close to 1Gbps
> [ 3] 0.0-180.0 sec 19.2 GBytes 918 Mbits/sec
>
> When dummynet is used to add delay (100ms in my case) to the network, the
> machines have problems sustaining high throughput.
>
> Here is the setup on the receiver end
> % sysctl kern.ipc.maxsockbuf=16M
> % sysctl net.inet.tcp.recvspace=12MB
> % iperf -s
>
> and on the sender end
> % sysctl kern.ipc.maxsockbuf=16M
> % sysctl net.inet.tcp.sendspace=12MB
> % ipfw pipe 1 config delay 100
> % ipfw add 10 pipe 1 ip from any to any out
> % iperf -c [args ...]
>
> kern.ipc.nmbclusters has also been tuned to 65536 at boot time. Our kernel
> also has HZ=1000. The ipfw rule is added such that it is the first rule
> in the chain. 12MB is about the right size send buffer for the
> bandwidth-delay product (1Gbps * 0.1s RTT / 8 bits/byte). We're also using
> an MTU of roughly 9000 bytes.
>
> What happens is as the TCP window grows larger (about 3-4MB), the sender
> spends most of its time processing interrupts (80-90% as reported by top)
> and throughput peaks at about 300Mbps.
> I've dug into the dummynet code and I've found that a large amount of
> time is spent in the routine transmit_event(struct dn_pipe *p), which
> dequeues packets from a pipe and calls ip_output. It appears that
> ip_output is the culprit, but what it is doing with its time, I'm not
> sure. Packets are not being dropped according to TCP and dummynet. I
> suspect either the pfil_run_hooks(...) or (*ifp->if_output)(...) calls
> in ip_output are taking too much time, but I'm not sure.
>
> Any suggestions on what could be happening would be appreciated!
>
> -Andy Jones

-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 803-317-4952 (cell)
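
For anyone who wants to redo the queue-sizing arithmetic from the reply with their own numbers, here is a minimal Python sketch. It is only a back-of-the-envelope calculation, not dummynet code: the function name packets_in_flight is made up for illustration, "a gigabit" is taken as 2^30 bit/s (which seems to be what the rough math in the reply assumes), and protocol header overhead is ignored. The 100-packet figure is the queue limit mentioned in the reply.

```python
def packets_in_flight(link_bps, delay_ms, mtu_bytes):
    """Packets that sit inside a delay pipe when the link runs at full rate.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    bits_in_pipe = link_bps * delay_ms          # scaled by 1000 (delay is in ms)
    bits_per_packet_scaled = mtu_bytes * 8 * 1000
    return -(-bits_in_pipe // bits_per_packet_scaled)


LINK_BPS = 2**30    # assumption: "a gigabit" read as 2^30 bit/s
DELAY_MS = 100      # delay configured on the pipe in the quoted setup
QUEUE_LIMIT = 100   # queue limit mentioned in the reply

for mtu in (1500, 9000):
    need = packets_in_flight(LINK_BPS, DELAY_MS, mtu)
    verdict = "enough" if need <= QUEUE_LIMIT else "NOT enough"
    print(f"MTU {mtu}: about {need} packets per {DELAY_MS}ms; "
          f"a limit of {QUEUE_LIMIT} is {verdict}")
```

Changing LINK_BPS, DELAY_MS or the MTU list gives the figure for other links, for example a slower link with a much longer satellite delay.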