Date: Sat, 13 Jun 2015 11:31:17 +0200
From: Edward Tomasz Napierała
To: Karli Sjöberg
Cc: Andreas Nilsson, "freebsd-fs@freebsd.org"
Subject: Re: [Fwd: Strange networking behaviour in storage server]
Message-ID: <20150613093117.GB37870@brick.home>

On 0601T0902, Karli Sjöberg wrote:
> Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
> >
> > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg wrote:
> > -------- Forwarded message --------
> > > From: Karli Sjöberg
> > > To: freebsd-fs@freebsd.org
> > > Subject: Strange networking behaviour in storage server
> > > Date: Mon, 1 Jun 2015 07:49:56 +0000
> > >
> > > Hey!
> > >
> > > So we have this ZFS storage server upgraded from 9.3-RELEASE to
> > > 10.1-STABLE to overcome not being able to 1) use SSD drives as
> > > L2ARC[1] and 2) hotswap SATA drives[2].
> > >
> > > After the upgrade we've noticed very odd networking behaviour: it
> > > sends/receives at full speed for a while, then there are a couple of
> > > minutes of complete silence where even terminal commands like "ls"
> > > just wait before they are executed, and then it starts sending at
> > > full speed again. I've linked to a screenshot showing this
> > > send-and-pause behaviour. The blue line is the total, green is SMB
> > > and turquoise is NFS over jumbo frames. It behaves this way
> > > regardless of the protocol.
> > >
> > > http://oi62.tinypic.com/33xvjb6.jpg
> > >
> > > The problem is that these pauses can sometimes be so long that
> > > connections drop. For example, someone is copying files over SMB or
> > > iSCSI and suddenly they get an error message saying that the
> > > transfer failed and they have to start over with the file(s).
> > > That's horrible!
> > >
> > > So far NFS has proven to be the most resilient; its stupid-simple
> > > nature just waits and resumes the transfer when the pause is over.
> > > Kudos for that.
> > >
> > > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and 64GB
> > > ECC RAM. The hardware has been ruled out: we happened to have an
> > > identical MB and CPU lying around and that didn't improve things. We
> > > have also installed an Intel PRO 100/1000 Quad-port ethernet adapter
> > > to test if that would change things, but it hasn't; it still behaves
> > > this way.
> > >
> > > The two built-in NICs are Intel 82574L and the Quad-port NICs are
> > > Intel 82571EB, so both are em(4) driven. I happen to know that the em
> > > driver was updated between 9.3 and 10.1. Perhaps that is to blame,
> > > but I have no idea.
> > >
> > > Is there anyone that can make sense of this?
> > >
> > > [1]:
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> > >
> > > [2]:
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> > >
> > > /K
> >
> > Another observation I've made is that during these pauses, the entire
> > system is put on hold; even a ZFS scrub stops and then resumes after a
> > while. Looking in top, the system is completely idle.
> >
> > Normally during scrub, the kernel eats 20-30% CPU, but during a pause,
> > even the [kernel] goes down to 0.00%. Makes me think the networking has
> > nothing to do with it.
> >
> > What's then to blame? ZFS?
> >
> > /K
> >
> > Hello,
> >
> > does this happen when clients are only reading from server?
>
> Yes it happens when clients are only reading from the server.
>
> > Otherwise I would suspect that it could be caused by ZFS writing out a
> > large chunk of data sitting in its caches, and until that is complete
> > I/O is stalled.
>
> That's what's so strange, we have three more systems set up about the
> same size and none of the others are acting this way.
>
> The only thing I can think of that differs that we haven't tested ruling
> out yet is ctld, the other systems are still running istgt as their
> iSCSI daemon.

So, were you able to rule out ctld?

Do you have local, or terminal, access to the machine?  When the problem
manifests, do local commands work?  In other words, is the whole machine
wedged, or just the network?

If it's just the network, it might be caused by ctld consuming all
available mbufs.  You could run "netstat -m" before and after to check that.
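
One way to take those two snapshots, as a minimal sketch only: the helper
names snap and compare and the SNAP_DIR and SNAP_CMD variables are made up
for illustration; the only real command involved is the "netstat -m"
mentioned above.

```shell
#!/bin/sh
# Sketch only: snap, compare, SNAP_DIR and SNAP_CMD are hypothetical names.
# SNAP_CMD defaults to "netstat -m", the command suggested in this thread.
SNAP_DIR=${SNAP_DIR:-/tmp/mbuf-snapshots}
SNAP_CMD=${SNAP_CMD:-netstat -m}

# snap LABEL: save the command's output, with a timestamp line, to LABEL.txt
snap() {
    mkdir -p "$SNAP_DIR" || return 1
    {
        date '+# taken %Y-%m-%d %H:%M:%S'
        # Left unquoted on purpose so "netstat -m" splits into command + flag.
        $SNAP_CMD 2>&1
    } > "$SNAP_DIR/$1.txt"
}

# compare A B: show what changed between snapshots A and B
compare() {
    diff -u "$SNAP_DIR/$1.txt" "$SNAP_DIR/$2.txt"
}
```

Run "snap before" while traffic is flowing normally and "snap after" once a
pause has passed, then "compare before after".  If mbufs were exhausted, I
would expect the "requests for mbufs denied" counters to have grown between
the two snapshots.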