From: Karli Sjöberg
To: freebsd-fs@freebsd.org
Subject: Strange networking behaviour in storage server
Date: Mon, 1 Jun 2015 07:49:56 +0000

Hey!

So we have this ZFS storage server that we upgraded from 9.3-RELEASE to 10.1-STABLE to work around two problems: 1) not being able to use SSD drives as L2ARC[1] and 2) not being able to hotswap SATA drives[2].

After the upgrade we've noticed some very odd networking behaviour. It sends/receives at full speed for a while, then there are a couple of minutes of complete silence, during which even terminal commands like "ls" just hang until they finally get executed, and then it starts sending at full speed again. I've linked to a screenshot showing this send-and-pause behaviour. The blue line is the total, green is SMB and turquoise is NFS over jumbo frames. It behaves this way regardless of the protocol.

http://oi62.tinypic.com/33xvjb6.jpg

The problem is that these pauses can sometimes be so long that connections drop. Someone is copying files over SMB or iSCSI and suddenly they get an error message saying that the transfer failed, and they have to start over with the file(s). That's horrible! So far NFS has proven to be the most resilient; its stupid-simple nature means it just waits and resumes the transfer when the pause is over. Kudos for that.

The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and 64GB ECC RAM. The hardware has been ruled out: we happened to have an identical MB and CPU lying around, and swapping them in didn't improve things. We have also installed an Intel PRO 100/1000 Quad-port ethernet adapter to test if that would change things, but it hasn't; it still behaves this way. The two built-in NICs are Intel 82574L and the Quad-port NICs are Intel 82571EB, so both are em(4) driven. I happen to know that the em driver was updated between 9.3 and 10.1. Perhaps that is to blame, but I have no idea.

Is there anyone who can make sense of this?

[1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
[2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348

/K