From owner-freebsd-stable@FreeBSD.ORG Wed Jul 2 17:16:14 2014
Message-ID: <53B43D90.6000700@rpi.edu>
Date: Wed, 02 Jul 2014 13:12:48 -0400
From: Bob Healey
To: Adrian Chadd
Cc: FreeBSD Stable Mailing List
Subject: Re: Interactions with mxge, pf, nfsd, and the kernel
References: <53B42139.302@rpi.edu>

At the moment, I am running as patched as freebsd-update made me on 6/12/14.

Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
healer@rpi.edu
(518) 276-4407

On 7/2/2014 12:59 PM, Adrian Chadd wrote:
> Hi,
>
> I vaguely recall some pf issues that caused the state table to not get
> flushed and things to get stuck. I think it was fixed post 10.0-REL.
>
> Maybe update to 10-STABLE and see?
>
>
> -a
>
>
> On 2 July 2014 08:11, Bob Healey wrote:
>> Hello.
>>
>> I've been wrestling with this on and off for a few months now. I have an
>> assortment of systems (some Dell PowerEdge R515, R610, and IBM x3630M3) with
>> 10 gig Myricom ethernet cards acting as nfs servers to Linux HPC compute
>> clusters (12-36 nodes, 384 - 480 cores) connected via gigabit ethernet.
>> They are also connected to the outside world via onboard bce (Dell) or igb
>> (IBM). After a variable length of time, I will lose all network access to a
>> host. Connecting via console, the machine tends to be fully responsive. A
>> reboot clears the problem, but I have yet to figure out any
>> sysctls/loader.conf tunables to clear the problem and make it stay away. PF
>> is in use to restrict access to the host to a pair of public /24's, and to
>> 10/8. If there is a way in zfs's sharenfs property to make that
>> restriction, I'd be happy to change, but I really don't like leaving nfs
>> open to the university's quartet of /16's, so PF it is.
>> The vlan2 interface has mxge0 as its parent.
>>
>> Thanks for any help.
>>
>> This host is getting ready to crash soon, based on netstat.
>> root@husker:~ # netstat -i
>> Name    Mtu Network       Address                Ipkts Ierrs Idrop      Opkts Oerrs Coll
>> mxge0  9000               00:60:dd:44:d2:0a    6358280   262     0    4061637     0    0
>> mxge0  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          2     -    -
>> bce0   1500               08:9e:01:50:a1:ac     276391     0     0          0     0    0
>> bce0   1500 fe80::a9e:1ff fe80::a9e:1ff:fe5          0     -     -          3     -    -
>> bce1   1500               08:9e:01:50:a1:ad 2229709391 16921     0 1182942116     0    0
>> bce1   1500 128.113.12.0  husker            2226254093     -     - 1183962005     -    -
>> bce1   1500 fe80::a9e:1ff fe80::a9e:1ff:fe5          0     -     -          3     -    -
>> lo0   16384                                       2030     0     0       2030     0    0
>> lo0   16384 localhost     ::1                        4     -     -          4     -    -
>> lo0   16384 fe80::1%lo0   fe80::1                    0     -     -          0     -    -
>> lo0   16384 your-net      localhost               2026     -     -       2026     -    -
>> vlan2  9000               00:60:dd:44:d2:0a    4387250     0     0    3060586     0    0
>> vlan2  9000 10.2.3.0      husker.galactica.    4370309     -     -    3963931     -    -
>> vlan2  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          2     -    -
>> vlan2  9000               00:60:dd:44:d2:0a    1971034     0     0    1001061     0    0
>> vlan2  9000 10.2.4.0      husker.enterprise    1700742     -     -    1961891     -    -
>> vlan2  9000 fe80::260:ddf fe80::260:ddff:fe          0     -     -          4     -    -
>> root@husker:~ # netstat -im
>> 6157/3233/9390 mbufs in use (current/cache/total)
>> 4081/1883/5964/1018800 mbuf clusters in use (current/cache/total/max)
>> 4080/795 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/5/5/509399 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 512/23/535/150933 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/84899 16k jumbo clusters in use (current/cache/total/max)
>> 14309K/4801K/19110K bytes allocated to network (current/cache/total)
>> 10/1883/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 2/1736/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>> root@husker:~ # uptime
>> 11:07AM up 23 days, 19:27, 1 user, load averages: 0.14, 0.17, 0.13
>> root@husker:~ # sysctl -a | grep nmb
>> kern.ipc.nmbclusters: 1018800
>> kern.ipc.nmbjumbop: 509399
>> kern.ipc.nmbjumbo9: 452799
>> kern.ipc.nmbjumbo16: 339596
>> kern.ipc.nmbufs: 6520320
>> root@husker:~ # cat /boot/loader.conf
>> zfs_load="YES"
>> amdtemp_load="YES"
>> if_mxge_load="YES"
>> mxge_ethp_z8e_load="YES"
>> mxge_eth_z8e_load="YES"
>> mxge_rss_ethp_z8e_load="YES"
>> mxge_rss_eth_z8e_load="YES"
>> vfs.zfs.arc_max="12288M"
>> root@husker:~ # cat /var/run/dmesg.boot | head -16
>> Copyright (c) 1992-2014 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>         The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 10.0-RELEASE-p4 #0: Tue Jun 3 13:14:57 UTC 2014
>>     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
>> CPU: AMD Opteron(tm) Processor 4122 (2200.07-MHz K8-class CPU)
>>   Origin = "AuthenticAMD"  Id = 0x100f80  Family = 0x10  Model = 0x8  Stepping = 0
>>   Features=0x178bfbff
>>   Features2=0x802009
>>   AMD Features=0xee500800
>>   AMD Features2=0x837ff
>>   TSC: P-state invariant
>> real memory  = 17179869184 (16384 MB)
>> avail memory = 16588054528 (15819 MB)
>>
>>
>> --
>> Bob Healey
>> Systems Administrator
>> Biocomputation and Bioinformatics Constellation
>> and Molecularium
>> healer@rpi.edu
>> (518) 276-4407
>>
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"