Date: Sun, 14 Jun 2015 15:24:34 +0200
From: InterNetX - Juergen Gotteswinter <jg@internetx.com>
To: Karli Sjöberg <karli.sjoberg@slu.se>, Andreas Nilsson <andrnils@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: [Fwd: Strange networking behaviour in storage server]
Message-ID: <557D8092.7050301@internetx.com>
In-Reply-To: <20150613093117.GB37870@brick.home>
References: <1433146506.14998.177.camel@data-b104.adm.slu.se> <CAPS9%2BSturmr32jN3d1sfCsQUnyFneSMofT%2BajwqCP=LPg_nseA@mail.gmail.com> <1433149349.14998.181.camel@data-b104.adm.slu.se> <20150613093117.GB37870@brick.home>
On 13.06.2015 at 11:31, Edward Tomasz Napierała wrote:
> On 0601T0902, Karli Sjöberg wrote:
>> Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
>>>
>>> On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg@slu.se>
>>> wrote:
>>> -------- Forwarded message --------
>>> > From: Karli Sjöberg <karli.sjoberg@slu.se>
>>> > To: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
>>> > Subject: Strange networking behaviour in storage server
>>> > Date: Mon, 1 Jun 2015 07:49:56 +0000
>>> >
>>> > Hey!
>>> >
>>> > So we have this ZFS storage server upgraded from 9.3-RELEASE to
>>> > 10.1-STABLE to overcome not being able to 1) use SSD drives as
>>> > L2ARC[1] and 2) not being able to hotswap SATA drives[2].
>>> >
>>> > After the upgrade we've noticed a very odd networking behaviour:
>>> > it sends/receives at full speed for a while, then there is a
>>> > couple of minutes of complete silence where even terminal commands
>>> > like an "ls" just wait until they are executed, and then it starts
>>> > sending at full speed again. I've linked to a screenshot showing
>>> > this send and pause behaviour. The blue line is the total, green
>>> > is SMB and turquoise is NFS over jumbo frames. It behaves this way
>>> > regardless of the protocol.
>>> >
>>> > http://oi62.tinypic.com/33xvjb6.jpg
>>> >
>>> > The problem is that these pauses can sometimes be so long that
>>> > connections drop. Like someone is copying files over SMB or iSCSI
>>> > and suddenly they get an error message saying that the transfer
>>> > failed and they have to start over with the file(s). That's
>>> > horrible!
>>> >
>>> > So far NFS has proven to be the most resilient; its stupid simple
>>> > nature just waits and resumes the transfer when the pause is over.
>>> > Kudos for that.
>>> >
>>> > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and
>>> > 64GB ECC RAM. The hardware has been ruled out: we happened to have
>>> > an identical MB and CPU lying around and that didn't improve
>>> > things. We have also installed an Intel PRO 100/1000 Quad-port
>>> > ethernet adapter to test if that would change things, but it
>>> > hasn't, it still behaves this way.
>>> >
>>> > The two built-in NICs are Intel 82574L and the Quad-port NICs are
>>> > Intel 82571EB, so both em(4) driven. I happen to know that the em
>>> > driver was updated between 9.3 and 10.1. Perhaps that is to blame,
>>> > but I have no idea.
>>> >
>>> > Is there anyone that can make sense of this?
>>> >
>>> > [1]:
>>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
>>> >
>>> > [2]:
>>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
>>> >
>>> > /K
>>> >
>>>
>>> Another observation I've made is that during these pauses, the
>>> entire system is put on hold; even ZFS scrub stops and then resumes
>>> after a while. Looking in top, the system is completely idle.
>>>
>>> Normally during scrub, the kernel eats 20-30% CPU, but during a
>>> pause, even the [kernel] goes down to 0.00%. Makes me think the
>>> networking has nothing to do with it.
>>>
>>> What's then to blame? ZFS?
>>>
>>> /K
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to
>>> "freebsd-fs-unsubscribe@freebsd.org"
>>>
>>> Hello,
>>>
>>> does this happen when clients are only reading from server?
>>
>> Yes it happens when clients are only reading from the server.
>>
>>> Otherwise I would suspect that it could be caused by ZFS writing out a
>>> large chunk of data sitting in its caches, and until that is complete
>>> I/O is stalled.
>>
>> That's what's so strange, we have three more systems set up about the
>> same size and none of the others are acting this way.
>>
>> The only thing I can think of that differs that we haven't tested ruling
>> out yet is ctld, the other systems are still running istgt as their
>> iSCSI daemon.
>
> So, were you able to rule out ctld?
>
> Do you have local, or terminal, access to the machine? When the problem
> manifests, do local commands work? In other words, is the whole machine
> wedged, or just the network? If it's just the network, it might be
> caused by ctld consuming all available mbufs. You could run "netstat -m"
> before and after to check that.
>

Have you already checked (and double-checked) the HBA firmware etc.? Is
the cabling fine?

I expect you have already disabled tso, gro, rxcsum and txcsum on your
NIC(s). I had similar effects with all those fancy uberfeatures enabled.
Give it a try...

ifconfig foo0 -rxcsum -txcsum -tso -gro

Capturing a few MB of traffic before/after could also be very helpful to
see if...

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
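The checks suggested in this thread (comparing "netstat -m" before and after a pause, turning off NIC offloads, and capturing some traffic) could be scripted roughly as below. This is only a sketch under assumptions: the helper names, the /tmp paths, the interface names em0/em1 and the packet count are all made up for illustration. It also assumes that FreeBSD's ifconfig spells the receive-offload option "lro" rather than "gro" (GRO being the Linux term), so the sketch uses -lro where the message above wrote -gro. The commands are printed, not executed, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from the thread.

# Print (do not run) the ifconfig command that disables checksum,
# segmentation and receive offloads on one interface.
# Assumption: "-lro" is used in place of the "-gro" from the message.
offload_cmd() {
    printf 'ifconfig %s -rxcsum -txcsum -tso -lro\n' "$1"
}

# Save the mbuf statistics under a label ("before", "after") so that
# two snapshots can be compared with diff(1) afterwards.
mbuf_snapshot() {
    netstat -m > "/tmp/mbufs.$1.txt"
}

# Print (do not run) a tcpdump command that writes a bounded capture.
# Arguments: interface, packet count, output file.
capture_cmd() {
    printf 'tcpdump -i %s -c %s -w %s\n' "$1" "$2" "$3"
}

# Dry run: show the offload commands for two placeholder interfaces.
for nic in em0 em1; do
    offload_cmd "$nic"
done
```

One possible way to use it: call "mbuf_snapshot before" while transfers run normally, "mbuf_snapshot after" right after a pause, and then diff /tmp/mbufs.before.txt against /tmp/mbufs.after.txt to see whether mbuf usage or denied requests jumped.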