From owner-freebsd-performance@FreeBSD.ORG  Wed Aug 11 00:10:25 2004
Date: Tue, 10 Aug 2004 20:10:22 -0400
From: Erik Rothwell <erik.rothwell@realityengine.ca>
To: <freebsd-performance@freebsd.org>
Message-ID: <BD3EDC2E.42A6%erik.rothwell@realityengine.ca>
Mime-version: 1.0
Content-type: text/plain;
	charset="US-ASCII"
Content-transfer-encoding: 7bit
Subject: Network performance issues when writing to disk (5.2.1-RELEASE)

I have a workgroup server running FreeBSD 5.2.1 that is experiencing
serious network performance problems under very light load.

The server has two miibus-based NICs: a WAN link via dc1 and a switched LAN
serving ~7 clients attached via dc0.

File transfers to the server (e.g., scp, WebDAV, NFS, FTP) get terrible
throughput, starting at about 2MB/s and dropping rapidly to ~200KB/s.

At first, I suspected a duplex mismatch or a misbehaving switch (an
unmanaged switch with all client interfaces set to autosense), but the
problem can be reproduced with the server connected directly to any of the
clients and the media explicitly set at both ends. netstat -ni shows no
collisions but a few input/output errors (Ierrs/Oerrs) during transfers.

From testing, it appears that the interface itself responds only
intermittently under this load (during transfers, the switch reports the
port as intermittently inactive -- ifconfig on the server will also
sometimes report "status: inactive") and packet loss approaches 75%.
Occasionally, "dc0: TX underrun -- increasing TX threshold" crops up in
dmesg, but that's not usually noteworthy.
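
To line up the "status: inactive" reports with a transfer, the status line
can be sampled once a second. This is only a rough sketch: link_status is a
made-up helper name, and it assumes ifconfig prints a line of the form
"status: <value>", as in the message quoted above.

```shell
#!/bin/sh
# Extract the value of the "status:" line from ifconfig-style output on stdin.
# link_status is a hypothetical helper, not part of any FreeBSD tool.
link_status() {
    awk '$1 == "status:" { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Possible usage on the server while a transfer is running (dc0 is the LAN
# interface named in this post); left commented out so the snippet is inert:
#   while :; do
#       printf '%s %s\n' "$(date +%T)" "$(ifconfig dc0 | link_status)"
#       sleep 1
#   done
```

Each sample is timestamped, so the output can be compared against the
times at which the transfer stalls.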

I replaced the NIC, yet the problem curiously persists.

After further testing, the problem seems to be limited to network file
transfers that actually write to disk.

For instance, using netcat to pipe random junk to /dev/null at the server,
throughput is good and the NIC behaves itself:

# nc -l -p 2332 -vv | dd if=/dev/stdin of=/dev/null bs=1k
listening on [any] 2332 ...
connect to [192.168.5.1] from [192.168.5.3] 51295
 sent 0, rcvd 104857600
94731+21879 records in
94731+21879 records out
104857600 bytes transferred in 11.897846 secs (8813158 bytes/sec)

When writing to a file, however:

# nc -l -p 2332 -vv | dd if=/dev/stdin of=/data/junk bs=1k
listening on [any] 2332 ...
connect to [192.168.5.1] from [192.168.5.3] 51296
 sent 0, rcvd 1130496
1046+115 records in
1046+115 records out
1130496 bytes transferred in 32.888345 secs (34374 bytes/sec)
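
To put the gap between these two runs in perspective, the ratio of the two
bytes/sec figures can be worked out with a line of awk. A small sketch;
slowdown is just an illustrative function name, and the numbers are copied
from the dd summaries above.

```shell
#!/bin/sh
# Ratio of two throughput figures (bytes/sec), rounded to the nearest integer.
# slowdown is a hypothetical helper used only for this illustration.
slowdown() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", fast / slow }'
}

# /dev/null run vs. /data/junk run, as reported by dd above.
slowdown 8813158 34374    # prints 256
```

That is, the same network path is roughly 256 times slower once the data
is written to /data instead of being discarded.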

Conversely, using netcat to send a 1GB file from the server to one of the
clients was no problem (listener on the client side):

# nc -l -p 2332 -vv > /tmp/junk
...
698018816 bytes transferred in 95.423028 secs (7314993 bytes/sec)

Locally, disk performance doesn't look unusually bad:

# dd if=/data/junk of=/dev/null
2246100+0 records in
2246100+0 records out
1150003200 bytes transferred in 97.113807 secs (11841809 bytes/sec)

# dd if=/dev/zero of=/data/junk bs=1k count=1m
1048576+0 records in
1048576+0 records out
1073741824 bytes transferred in 50.131779 secs (21418387 bytes/sec)
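
For easier comparison with the network figures, the bytes/sec values in the
dd summaries can be converted to MB/s. A minimal sketch, assuming the
summary line ends in "(N bytes/sec)" as in the output above and taking
1 MB as 1,000,000 bytes; dd_rate_mb is a made-up name.

```shell
#!/bin/sh
# Read dd summary lines on stdin and print each "(N bytes/sec)" figure in MB/s.
# dd_rate_mb is a hypothetical helper; 1 MB is taken as 1,000,000 bytes.
dd_rate_mb() {
    sed -n 's/.*(\([0-9][0-9]*\) bytes\/sec).*/\1/p' |
        awk '{ printf "%.1f MB/s\n", $1 / 1000000 }'
}

# The two local dd summaries from above (read test, then write test).
dd_rate_mb <<'EOF'
1150003200 bytes transferred in 97.113807 secs (11841809 bytes/sec)
1073741824 bytes transferred in 50.131779 secs (21418387 bytes/sec)
EOF
# prints:
#   11.8 MB/s
#   21.4 MB/s
```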

After searching the lists and documentation, I'm still not sure what's
causing the problem. Has anyone seen anything of this nature before? Any
suggestions?

Erik.