From: Bob Healey
To: freebSD-questions@freebsd.org
Subject: Problems with ZFS file servers
Date: Sun, 13 Sep 2015 15:31:18 -0400
Message-ID: <55F5CF06.5080602@rpi.edu>

Hi. I've been semi-successfully running multi-homed, ZFS-based NFS file servers. Every 30-90 days I have to reboot them, or they become non-responsive on one or more interfaces. The only error indications I get are my RHEL 5 clients complaining that the server is unreachable, and the output of netstat -i showing rapidly increasing input errors. I am running 10.1-RELEASE patched to 7/2/15. Installed ports are minimal, mainly bash, rsync, portupgrade, and their dependencies.

Basic info: A variety of hosts, some Dell, some IBM, some HP, some Sun (pre-Oracle), and some Supermicro whitebox systems. Ages range from 1 to 5 years. RAM varies from 6GB to 64GB. Network cards are assorted onboard igb, em, and bge cards, and some hosts also have mxge cards installed. Disks are mostly on mfi- or mpt-based controllers, with two cciss cards. Raw disk capacity varies between 12TB and 96TB. CPUs vary from Xeon 54xx chips to Opteron 43xx chips and everything in between. I have some identical machines still on Oracle support running Solaris/ZFS that do not exhibit these problems under identical loads.

The servers are used as NFS file stores for HPC research clusters. Each has one interface reachable from the publicly routed university network, and a second interface with 802.1Q VLANs to reach each of the internal cluster networks a given host serves.
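
For anyone trying to put a number on the "increasing input errors" symptom, here is a minimal sketch that turns two netstat -i snapshots into a per-interface error rate. It is not part of the original report. It assumes the FreeBSD 10.x column layout (Name, Mtu, Network, Address, Ipkts, Ierrs, Idrop, Opkts, Oerrs, Coll), and the sample text, interface name, and function names are made up for illustration.

```python
# Sketch only: assumes `netstat -i` rows end with the six counter columns
# Ipkts Ierrs Idrop Opkts Oerrs Coll, so Ierrs is the fifth field from the right.

# Hypothetical sample output (not taken from a real host).
SAMPLE_BEFORE = """\
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
igb0   1500 <Link#1>      00:11:22:33:44:55     1000    10     0      900     0     0
igb0   1500 192.0.2.0/24  192.0.2.10            1000     -     -      900     -     -
"""


def parse_ierrs(netstat_output):
    """Return {interface: Ierrs} using the first row per interface with a numeric Ierrs."""
    counters = {}
    for line in netstat_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # too short to contain the counter columns
        name, ierrs = fields[0], fields[-5]
        # Address-level rows show "-" for error counters; skip those.
        if ierrs.isdigit() and name not in counters:
            counters[name] = int(ierrs)
    return counters


def ierrs_per_second(before, after, seconds):
    """Input errors per second for interfaces present in both snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }
```

Feeding it two snapshots taken a known interval apart (for example from a cron job that saves the netstat -i output) would show which interface starts accumulating errors first and how quickly.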

Due to my boss's rules regarding downtime (no scheduled outages ever, for any reason), the next time I know I'll be able to reboot these machines to test changes is 6/18/16, when the annual electrical shutdown occurs. Otherwise, I can try suggestions as things get unhappy with life and require unscheduled reboots.

-- 
Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
healer@rpi.edu
(518) 276-4407