From: Mehmet Erol Sanliturk
Date: Mon, 1 Jun 2015 02:56:05 -0700
Subject: Re: [Fwd: Strange networking behaviour in storage server]
To: Karli Sjöberg
Cc: Andreas Nilsson, "freebsd-fs@freebsd.org"

On Mon, Jun 1, 2015 at 2:02 AM, Karli Sjöberg wrote:

> Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
> >
> > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg wrote:
> >
> > -------- Forwarded message --------
> > > From: Karli Sjöberg
> > > To: freebsd-fs@freebsd.org
> > > Subject: Strange networking behaviour in storage server
> > > Date: Mon, 1 Jun 2015 07:49:56 +0000
> > >
> > > Hey!
> > >
> > > So we have this ZFS storage server, upgraded from 9.3-RELEASE to
> > > 10.1-STABLE to overcome 1) not being able to use SSD drives as
> > > L2ARC[1] and 2) not being able to hotswap SATA drives[2].
> > >
> > > After the upgrade we've noticed very odd networking behaviour:
> > > it sends/receives at full speed for a while, then there are a
> > > couple of minutes of complete silence where even terminal
> > > commands like "ls" just wait before they are executed, and then
> > > it starts sending at full speed again. I've linked to a
> > > screenshot showing this send-and-pause behaviour. The blue line
> > > is the total, green is SMB and turquoise is NFS over jumbo
> > > frames. It behaves this way regardless of the protocol.
> > >
> > > http://oi62.tinypic.com/33xvjb6.jpg
> > >
> > > The problem is that these pauses can sometimes be so long that
> > > connections drop.
> > > For example, someone is copying files over SMB or iSCSI and
> > > suddenly gets an error message saying that the transfer failed,
> > > and has to start over with the file(s). That's horrible!
> > >
> > > So far NFS has proven to be the most resilient; its stupid-simple
> > > nature just waits and resumes the transfer when the pause is
> > > over. Kudos for that.
> > >
> > > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and
> > > 64GB ECC RAM. The hardware has been ruled out: we happened to
> > > have an identical MB and CPU lying around and that didn't improve
> > > things. We have also installed an Intel PRO 100/1000 Quad-port
> > > ethernet adapter to test if that would change things, but it
> > > hasn't; it still behaves this way.
> > >
> > > The two built-in NICs are Intel 82574L and the Quad-port NICs are
> > > Intel 82571EB, so both are em(4) driven. I happen to know that
> > > the em driver was updated between 9.3 and 10.1. Perhaps that is
> > > to blame, but I have no idea.
> > >
> > > Is there anyone that can make sense of this?
> > >
> > > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> > >
> > > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> > >
> > > /K
> >
> > Another observation I've made is that during these pauses, the
> > entire system is put on hold; even a ZFS scrub stops and then
> > resumes after a while. Looking in top, the system is completely
> > idle.
> >
> > Normally during a scrub, the kernel eats 20-30% CPU, but during a
> > pause, even the [kernel] goes down to 0.00%. Makes me think the
> > networking has nothing to do with it.
> >
> > What's then to blame? ZFS?
> >
> > /K
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to
> > "freebsd-fs-unsubscribe@freebsd.org"
> >
> > Hello,
> >
> > does this happen when clients are only reading from the server?
>
> Yes, it happens when clients are only reading from the server.
>
> > Otherwise I would suspect that it could be caused by ZFS writing
> > out a large chunk of data sitting in its caches, and until that is
> > complete I/O is stalled.
>
> That's what's so strange: we have three more systems set up at about
> the same size and none of the others are acting this way.
>
> The only difference I can think of that we haven't ruled out yet is
> ctld; the other systems are still running istgt as their iSCSI
> daemon.
>
> /K

If there are three other similar systems and they are installed with
exactly the same structure, the first possibility I would consider is a
slowly progressing hardware failure: a circuit does not respond within
the expected time, but does respond after an abnormally long delay.

Such behaviour may be caused by a badly soldered joint or a cracked
trace in the circuit: when it is hot it disconnects, and when it is
cold it connects again.

Thank you very much.

Mehmet Erol Sanliturk

> > Have you tried what is suggested in
> > https://wiki.freebsd.org/ZFSTuningGuide ? In particular, setting
> > vfs.zfs.write_limit_override to something appropriate for your
> > site. The timeout seems to be defaulting to 5 now.
> >
> > Best regards
> >
> > Andreas
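
To check whether the pauses really are whole-system stalls, and not only network stalls, a small heartbeat script could be left running on the server. The following is only a rough sketch in Python, not part of any FreeBSD tool: the names find_stalls and heartbeat are hypothetical. It assumes that the monotonic clock keeps advancing while the process is not being scheduled, so a stall shows up as an unusually long gap between two consecutive samples.

```python
import time


def find_stalls(timestamps, threshold):
    """Return (start, gap) pairs for every gap between consecutive
    samples that is longer than `threshold` seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


def heartbeat(samples, interval=1.0):
    """Record `samples` monotonic timestamps, sleeping `interval`
    seconds after each one. If the system stalls, the next timestamp
    is taken late and the gap becomes visible to find_stalls()."""
    stamps = []
    for _ in range(samples):
        stamps.append(time.monotonic())
        time.sleep(interval)
    return stamps
```

For example, calling heartbeat(600, 1.0) and passing the result to find_stalls with a threshold of 5.0 (an arbitrary choice) would cover ten minutes. Gaps of a couple of minutes that match the pauses in the throughput graph would suggest that the whole system, and not just the network, is stalling.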