From: Travis Mikalson
Organization: TerraNovaNet Internet Services
Date: Fri, 12 Jul 2013 15:38:48 -0400
To: Bob Healey, freebsd-fs@freebsd.org
Subject: Re: Massive Problems with 10G, NFS, ZFS, and iSCSI
Message-ID: <51E05B48.60607@terranova.net>
References: <51E032B5.9080705@rpi.edu>
In-Reply-To: <51E032B5.9080705@rpi.edu>

Bob Healey wrote:
> I've been beating my head against a brick wall for a week with this and
> 5 similar systems.
>
> My current major headache:
> Dell Poweredge R610, dual quad core Xeon E5530 @ 2.4GHz, 24GB RAM 4
> onboard bce NICs, 1 mxge NIC, pair of 10K SAS drives on mpt (Dell MB SAS
> controller), pair of 15 drive 1TB RAID 6 arrays on mfi (PERC 6).
>
> The machine was originally installed with FreeBSD 7.2 and has been
> upgraded through the years to 9.1.
> None of the issues I'm currently
> seeing manifested themselves under 9.0. When under heavy NFS load, the
> server currently becomes non-responsive on the network, unless the
> packet payload is very small (ICMP ping packets with > 124 bytes payload
> get dropped).
>
> Current network config:
> bce0: management network, connected to the 37 IPMI controllers in the
> rack, has conserver running SOL connections to each
> bce1: link to outside world, everything in rack trying to reach outside
> is NATed through here
> bce2: used for a direct host to host ISCSI link to another host in the
> rack to provide a hard drive for a virtual machine. This machine is the
> iscsi target, and an 80GB zvol is the backing store.
> mxge0/vlan1: connected to first 25 machines in rack
> mxge0/vlan2: connected to remaining 12 machines in rack, plus a vm on
> host #25 on vlan 1
>
> This is an HPC cluster, with all nodes running RHEL 5. The landing pads
> (1 real, 1 virtual) are multihomed to both the internal and external
> networks, so the only traffic that crosses the NAT is software updates
> and job accounting information.
>
> PF is used for firewalling and NAT. skip is enabled on all internal
> interfaces.

I have zero experience with mxge NICs, and I expect others will have a
lot more to say, but the first thing I'd try in your shoes is removing
pf from your kernel completely. Try replacing it with ipfw and see
whether that helps.

pf is generally not recommended above 1 Gbit/s because it still runs
under a single mutex. I'm linking this only to describe pf's current
performance limitations, not for the rest of the content of the post:
http://forum.pfsense.org/index.php?topic=50812.0;wap2

> Stuff I've tried: setting vfs.zfs.arc_max="20480M", disabling flow
> control on the 10G NIC, moving the ZIL to some unused space on the boot
> drive (RAID 1, mostly UFS).
>
> I'm getting lots of Limiting open port RST response from 32325 to 200
> packets/sec in the logs, ISCSI timeouts on the client, and NFS server
> not responding errors. netstat -i is showing lots of input errors on
> mxge, but i'm not seeing any errors on the switch (Dell Powerconnect
> 6248). Myricom (nic vendor) is at a loss too.
>
> Any ideas on what I should try next? I'm at the point of throwing darts
> blindfolded.
>
> I've got 5 more similar misbehaving machines, 4 of which behave just
> fine when using igb instead of mxge.

Again, I have no experience with mxge, good or bad, and I wouldn't rule
out the possibility that the mxge driver's performance either isn't up
to snuff or requires tuning.

Another thing that comes to mind that you haven't mentioned: have you
tuned your mbuf clusters upwards from the defaults? My /boot/loader.conf,
on a loaded box with only gigabit NICs, adjusts things upwards like so:

kern.ipc.nmbclusters="262144"
kern.ipc.nmbjumbop="262144"
kern.ipc.nmbjumbo16="32000"
kern.ipc.nmbjumbo9="64000"

netstat -m can give you some insight into your mbuf cluster usage, and
its output would be especially interesting to see during one of these
fits you've described.

-- 
TerraNovaNet Internet Services - Key Largo, FL
Voice: (305)453-4011 x101 Fax: (305)451-5991
http://www.terranova.net/ PGP: 50091B3D
----------------------------------------------
Life's not fair, but the root password helps.