From owner-freebsd-performance@FreeBSD.ORG Thu Aug 12 15:19:16 2004
Date: Thu, 12 Aug 2004 11:17:36 -0400 (EDT)
From: Robert Watson <robert@fledge.watson.org>
To: Erik Rothwell
cc: freebsd-performance@freebsd.org
Subject: Re: Network performance issues when writing to disk (5.2.1-RELEASE)

On Tue, 10 Aug 2004, Erik Rothwell wrote:

> I have a workgroup server running FreeBSD 5.2.1 that is experiencing
> some serious network performance problems under very minimal load.

A couple of background questions:

- What sort of kernel configuration are you using -- GENERIC or a custom
  kernel?  If custom, is it an SMP box, and is SMP enabled in the kernel?

- When the system performance suddenly degrades, what is the apparent
  load and condition of the system?
  In particular, at that point, if you measure the load average and CPU
  utilization, perhaps with "systat -vmstat" or "top", are the CPU(s)
  maxed out?  How much time is spent in user vs. system vs. interrupt
  (vs. idle)?

- Can you confirm whether or not the system has enough memory, and
  whether it is paging heavily when the degradation occurs?  "systat
  -vmstat" will probably be useful here also.

> The server has two miibus-based NICs: a WAN link via dc1 and a switched
> LAN serving ~7 clients attached via dc0.
>
> File transfers to the server (eg, scp, WebDAV, NFS, FTP) experience
> terrible throughput, starting off at about 2MB/s and dropping rapidly to
> ~200KB/s.

It would be interesting to know what condition the processes in question
are in: in particular, how much time they're spending blocked on the
network, how much on CPU, and how much I/O-bound to/from disk.  It might
be useful to ktrace the process, although ktrace will itself change the
system load by consuming some combination of memory, disk, and CPU, of
course.

> At first, I suspected duplex mismatch or a misbehaving switch (unmanaged
> switch with all client interfaces set to autosense), but the problem can
> be replicated with the server connected directly to any of the clients
> with the media at both ends explicitly set.  netstat -ni indicates no
> collisions but a few I/Oerrs during transfers.
>
> It appears from testing that the interface itself responds
> intermittently under this load (during transfers, the switch reports the
> port is intermittently inactive -- ifconfig on the server will also
> sometimes report "status: inactive") and packet loss approaches 75%.
> Occasionally, "dc0: TX underrun -- increasing TX threshold" would crop
> up in dmesg, but that's not usually noteworthy.

The TX underrun message is just the if_dc driver noticing that the
packet flow is speeding up and accommodating it, and isn't usually
noteworthy.  However, the intermittent response you describe sounds like
a more serious problem.
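To gather the numbers asked about earlier (CPU split, paging activity,
and per-interface error counters) while the slow transfer is being
reproduced, something like the following could be left running.  This is
only a minimal sketch: the log path, interval, and snapshot count are
arbitrary placeholders, and it assumes the stock vmstat/netstat tools
are in the path.

```shell
#!/bin/sh
# Sketch: take a few timestamped snapshots of system and interface state
# while reproducing the slow transfer.  LOG and COUNT are placeholders.
LOG="${LOG:-/tmp/perfsnap.log}"
COUNT="${COUNT:-3}"

: > "$LOG"                      # truncate any previous log
i=0
while [ "$i" -lt "$COUNT" ]; do
    date >> "$LOG"
    # CPU user/system/idle split and paging counters:
    vmstat >> "$LOG" 2>&1 || echo "vmstat not available" >> "$LOG"
    # Per-interface packet/error counters (Ierrs/Oerrs on FreeBSD):
    netstat -ni >> "$LOG" 2>&1 || echo "netstat not available" >> "$LOG"
    i=$((i + 1))
    sleep 1
done
echo "wrote $COUNT snapshots to $LOG"
```

Comparing successive snapshots should show whether the degradation
coincides with heavy paging, an interrupt-bound CPU, or climbing
interface error counters.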
It might be useful to run a long-running ping session, using "ping -c"
with a large count, over a couple of hours, and monitor the connectivity
of the box from another box to see whether it comes and goes.

A few years ago I ran into some weird problems with if_dc cards
interacting poorly with several Linksys switches: they saw intermittent
connectivity, and autonegotiation worked poorly between the cards and
that particular line of switches.  It could be that you're experiencing
something similar.

> I replaced the NIC, yet the problem curiously persists.

Did you replace it with another if_dc card, or with a card using a
different interface driver?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
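P.S. A minimal sketch of the long-running connectivity monitoring
suggested above.  The target host, batch size, and log path are all
placeholders; it just records each batch's "packet loss" summary line
with a timestamp so gaps in connectivity show up in the log.

```shell
#!/bin/sh
# Sketch: repeatedly ping a host in small batches and log the packet-loss
# summary from each batch.  HOST, BATCHES, and LOG are placeholders.
HOST="${1:-127.0.0.1}"
BATCHES="${BATCHES:-2}"
LOG="${LOG:-/tmp/pingwatch.log}"

: > "$LOG"
i=0
while [ "$i" -lt "$BATCHES" ]; do
    # 5 probes per batch; keep only the summary line, e.g. "0.0% packet loss"
    loss=$(ping -c 5 "$HOST" 2>/dev/null | grep 'packet loss')
    echo "$(date): ${loss:-ping failed / host unreachable}" >> "$LOG"
    i=$((i + 1))
done
echo "logged $BATCHES batches to $LOG"
```

Run from another box against the server over a few hours, a log full of
near-0% lines punctuated by bursts of high loss would point at the
intermittent-link theory.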