From: InterNetX - Juergen Gotteswinter
Reply-To: jg@internetx.com
Date: Sun, 14 Jun 2015 15:26:16 +0200
To: Karli Sjöberg, Andreas Nilsson, freebsd-fs@freebsd.org
Subject: Re: [Fwd: Strange networking behaviour in storage server]
On 14.06.2015 at 15:24, InterNetX - Juergen Gotteswinter wrote:
>
> On 13.06.2015 at 11:31, Edward Tomasz Napierała wrote:
>> On 0601T0902, Karli Sjöberg wrote:
>>> On Mon, 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
>>>>
>>>> On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg wrote:
>>>> -------- Forwarded message --------
>>>> > From: Karli Sjöberg
>>>> > To: freebsd-fs@freebsd.org
>>>> > Subject: Strange networking behaviour in storage server
>>>> > Date: Mon, 1 Jun 2015 07:49:56 +0000
>>>> >
>>>> > Hey!
>>>> >
>>>> > So we have this ZFS storage server upgraded from 9.3-RELEASE to
>>>> > 10.1-STABLE to overcome not being able to 1) use SSD drives as
>>>> > L2ARC[1] and 2) hot-swap SATA drives[2].
>>>> >
>>>> > After the upgrade we've noticed very odd networking behaviour: it
>>>> > sends/receives at full speed for a while, then there are a couple of
>>>> > minutes of complete silence where even terminal commands like "ls"
>>>> > just wait until they are executed, and then it starts sending at full
>>>> > speed again. I've linked to a screenshot showing this send-and-pause
>>>> > behaviour. The blue line is the total, green is SMB and turquoise is
>>>> > NFS over jumbo frames. It behaves this way regardless of the protocol.
>>>> >
>>>> > http://oi62.tinypic.com/33xvjb6.jpg
>>>> >
>>>> > The problem is that these pauses can sometimes be so long that
>>>> > connections drop. Someone copying files over SMB or iSCSI suddenly
>>>> > gets an error message saying that the transfer failed, and has to
>>>> > start over with the file(s). That's horrible!
>>>> >
>>>> > So far NFS has proven to be the most resilient; its stupid-simple
>>>> > nature just waits and resumes the transfer when the pause is over.
>>>> > Kudos for that.
>>>> >
>>>> > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and 64GB
>>>> > ECC RAM. The hardware has been ruled out: we happened to have an
>>>> > identical MB and CPU lying around, and that didn't improve things. We
>>>> > have also installed an Intel PRO 100/1000 quad-port Ethernet adapter
>>>> > to test whether that would change things, but it hasn't; it still
>>>> > behaves this way.
>>>> >
>>>> > The two built-in NICs are Intel 82574L and the quad-port NICs are
>>>> > Intel 82571EB, so both are driven by em(4). I happen to know that the
>>>> > em driver was updated between 9.3 and 10.1. Perhaps that is to blame,
>>>> > but I have no idea.
>>>> >
>>>> > Is there anyone who can make sense of this?
>>>> >
>>>> > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
>>>> >
>>>> > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
>>>> >
>>>> > /K
>>>>
>>>> Another observation I've made is that during these pauses the entire
>>>> system is put on hold; even a ZFS scrub stops and then resumes after a
>>>> while. Looking in top, the system is completely idle.
>>>>
>>>> Normally during a scrub the kernel eats 20-30% CPU, but during a pause
>>>> even [kernel] goes down to 0.00%. That makes me think the networking
>>>> has nothing to do with it.
>>>>
>>>> What's to blame, then? ZFS?
>>>>
>>>> /K
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>>> Hello,
>>>>
>>>> does this happen when clients are only reading from the server?
>>>
>>> Yes, it happens when clients are only reading from the server.
>>>
>>>> Otherwise I would suspect that it could be caused by ZFS writing out a
>>>> large chunk of data sitting in its caches, and until that is complete
>>>> I/O is stalled.
>>>
>>> That's what's so strange: we have three more systems set up at about
>>> the same size, and none of the others are acting this way.
>>>
>>> The only difference I can think of that we haven't ruled out yet is
>>> ctld; the other systems are still running istgt as their iSCSI daemon.
>>
>> So, were you able to rule out ctld?
>>
>> Do you have local, or terminal, access to the machine? When the problem
>> manifests, do local commands work? In other words, is the whole machine
>> wedged, or just the network? If it's just the network, it might be
>> caused by ctld consuming all available mbufs. You could run "netstat -m"
>> before and after to check that.
>>
>
> Have you already checked (double-checked) the HBA firmware etc.? Is the
> cabling fine?
>
> I expect you have already disabled tso, lro, rxcsum and txcsum on your
> NIC(s). I had similar effects with all those fancy uber-features enabled.
>
> Give it a try... ifconfig foo0 -rxcsum -txcsum -tso -lro
>
> Capturing a few MB of traffic before/after could also be very helpful to
> see if...
>
errm, sorry. Forgot something... what does your network setup look like?
Link aggregations? Which switches, which line speed, stacked or not? Any
drops / errors on your interfaces?
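One way to answer Edward's question (is the whole machine wedged, or just the network?) is to leave a local heartbeat running on the console and look for gaps afterwards. The following is a minimal sketch in POSIX sh and awk, not an existing tool: the function names `heartbeat` and `report_gaps` and the 5-second threshold are hypothetical choices. It assumes `date +%s` prints the Unix time, which FreeBSD's date(1) does.

```shell
#!/bin/sh
# Hypothetical helpers for telling a whole-machine stall from a
# network-only stall. Run them from the local console, not over the network.

# heartbeat: print the current Unix time once per second, forever.
heartbeat() {
    while :; do
        date +%s
        sleep 1
    done
}

# report_gaps MAX: read one Unix timestamp per line on stdin and report
# every pair of consecutive timestamps more than MAX seconds apart, i.e.
# every period in which the heartbeat loop did not get to run on time.
report_gaps() {
    awk -v max="$1" '
        NR > 1 && $1 - prev > max + 0 {
            printf "stall: %d s without a tick (from %d to %d)\n", $1 - prev, prev, $1
            fflush()
        }
        { prev = $1 }
    '
}
```

Usage from the local console: `heartbeat | report_gaps 5 | tee -a /var/tmp/stalls.log` (the log path is an arbitrary example). Gaps that line up with the network pauses would suggest userland itself was frozen; an empty log while transfers stop would point more towards the network path.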
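Edward's `netstat -m` suggestion can be made a little more mechanical by looking only at the "denied" counters, which are the ones expected to grow if ctld (or anything else) exhausts mbufs or the jumbo clusters that jumbo-frame traffic may use. A minimal sketch, assuming the FreeBSD 10 `netstat -m` format in which each such line starts with a counter field like `0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)`; the function name `mbuf_denied` is hypothetical.

```shell
#!/bin/sh
# mbuf_denied (hypothetical helper): read `netstat -m` output on stdin,
# print every line that reports "denied" requests, and exit with status 1
# if any counter on those lines is non-zero (0 otherwise).
mbuf_denied() {
    awk '
        /denied/ {
            print
            # The first field is either a single counter ("0") or several
            # slash-separated counters ("0/0/0"); check each of them.
            n = split($1, c, "/")
            for (i = 1; i <= n; i++)
                if (c[i] + 0 > 0)
                    bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

The counters are cumulative since boot, so a non-zero value alone does not prove it came from the current stall. Saving `netstat -m > /var/tmp/mbuf.before` beforehand and running `netstat -m | diff /var/tmp/mbuf.before -` during a pause shows whether they actually grew.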
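Offload settings changed with `ifconfig` at runtime are lost at reboot. If turning them off does help, they can be made persistent in /etc/rc.conf. The fragment below is a sketch only: the interface names, the documentation-range addresses and the `mtu 9000` on the jumbo-frame interface are placeholders, not values from this thread. The FreeBSD ifconfig(8) flag for large receive offload is `lro`.

```shell
# /etc/rc.conf fragment (hypothetical interface names and addresses).
# The whole string is passed to ifconfig(8) for that interface at boot,
# so checksum offload, TSO and LRO stay disabled across reboots.
ifconfig_em0="inet 192.0.2.10/24 -rxcsum -txcsum -tso -lro"
# Jumbo-frame interface used for NFS (the MTU value is an assumption).
ifconfig_em1="inet 198.51.100.10/24 mtu 9000 -rxcsum -txcsum -tso -lro"
```

The same flags can be applied to a running interface by hand, e.g. `ifconfig em0 -rxcsum -txcsum -tso -lro`, without waiting for a reboot.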