From: "Andy Jones"
To: freebsd-net@freebsd.org
Date: Mon, 30 Oct 2006 19:21:02 -0500
Subject: Throughput problems with dummynet and high delays
Message-ID: <86992cb10610301621j32cc5d65lc0c95e62c3f0df1c@mail.gmail.com>

Hi,

I'm a researcher at the University of North Carolina trying to simulate certain link characteristics using dummynet pipes in ipfw.
Our end goal is to thoroughly test high-speed TCP variants in our experimental network in a wide range of situations (which includes varying the delay from 1ms to 200ms).

I have two Dell PowerEdge 2850 servers connected to each other using an Intel Gigabit Ethernet card (although I'm not sure of the exact model). They both run FreeBSD 6.0. I'm using iperf to push as many bits through the wire as possible. Without dummynet, sustained throughput is as expected, close to 1Gbps:

    [ 3] 0.0-180.0 sec 19.2 GBytes 918 Mbits/sec

When dummynet is used to add delay (100ms in my case) to the network, the machines have problems sustaining high throughput. Here is the setup on the receiver end:

    % sysctl kern.ipc.maxsockbuf=16M
    % sysctl net.inet.tcp.recvspace=12MB
    % iperf -s

and on the sender end:

    % sysctl kern.ipc.maxsockbuf=16M
    % sysctl net.inet.tcp.sendspace=12MB
    % ipfw pipe 1 config delay 100
    % ipfw add 10 pipe 1 ip from any to any out
    % iperf -c [args ...]

kern.ipc.nmbclusters has also been tuned to 65536 at boot time. Our kernel also has HZ=1000. The ipfw rule is added such that it is the first rule in the chain. 12MB is about the right send buffer size for the bandwidth-delay product (1 Gbps * 0.1 s RTT / 8 bits/byte = 12.5 MB). We're also using an MTU of roughly 9000 bytes.

What happens is that as the TCP window grows larger (to about 3-4MB), the sender spends most of its time processing interrupts (80-90% as reported by top) and throughput peaks at about 300Mbps.

I've dug into the dummynet code and found that a large amount of time is spent in the routine transmit_event(struct dn_pipe *p), which dequeues packets from a pipe and calls ip_output. It appears that ip_output is the culprit, but I'm not sure what it is doing with its time. Neither TCP nor dummynet reports any packet drops. I suspect that either the pfil_run_hooks(...) or the (*ifp->if_output)(...) call in ip_output is taking too much time, but I'm not sure.

Any suggestions on what could be happening would be appreciated!

-Andy Jones
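
P.S. As a rough check of the buffer sizing quoted above, the bandwidth-delay arithmetic can be sketched in a few lines of Python. The link rate, delay, MTU, and HZ values are the ones given in this message; the per-tick figure is only an estimate and assumes that dummynet releases delayed packets once per timer tick, which I have not verified for this kernel. All function and constant names are illustrative.

```python
# Back-of-the-envelope sizing for the setup described above.
# Assumption (not verified): dummynet drains a pipe once per timer tick.

LINK_BPS = 1_000_000_000   # 1 Gbps link rate
RTT_MS = 100               # delay added by "ipfw pipe 1 config delay 100"
MTU_BYTES = 9000           # approximate jumbo-frame MTU
HZ = 1000                  # kernel timer frequency (ticks per second)


def bdp_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: rate * RTT / 8 bits per byte."""
    return link_bps * rtt_ms // (8 * 1000)


bdp = bdp_bytes(LINK_BPS, RTT_MS)

# Full-size frames needed in flight to keep the pipe full.
frames_in_flight = bdp / MTU_BYTES

# Frames that must leave per tick to sustain line rate, if packets are
# only released on tick boundaries (the assumption stated above).
frames_per_tick = LINK_BPS / 8 / HZ / MTU_BYTES

print(f"BDP: {bdp} bytes ({bdp / 1e6:.1f} MB)")
print(f"Frames in flight at full window: {frames_in_flight:.0f}")
print(f"Frames per tick at line rate: {frames_per_tick:.1f}")
```

Under those assumptions, sustaining line rate would need roughly 14 jumbo frames to leave the pipe on every tick, which gives a sense of how much work each transmit_event call would have to do.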