Date: Sun, 14 Jun 2015 15:26:16 +0200
From: InterNetX - Juergen Gotteswinter <juergen.gotteswinter@internetx.com>
To: Karli Sjöberg <karli.sjoberg@slu.se>, Andreas Nilsson <andrnils@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: [Fwd: Strange networking behaviour in storage server]
Message-ID: <557D80F8.9000505@internetx.com>
In-Reply-To: <557D8092.7050301@internetx.com>
References: <1433146506.14998.177.camel@data-b104.adm.slu.se> <CAPS9+Sturmr32jN3d1sfCsQUnyFneSMofT+ajwqCP=LPg_nseA@mail.gmail.com> <1433149349.14998.181.camel@data-b104.adm.slu.se> <20150613093117.GB37870@brick.home> <557D8092.7050301@internetx.com>
On 14.06.2015 at 15:24, InterNetX - Juergen Gotteswinter wrote:
>
> On 13.06.2015 at 11:31, Edward Tomasz Napierała wrote:
>> On 0601T0902, Karli Sjöberg wrote:
>>> On Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
>>>>
>>>> On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg@slu.se>
>>>> wrote:
>>>> -------- Forwarded message --------
>>>> > From: Karli Sjöberg <karli.sjoberg@slu.se>
>>>> > To: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
>>>> > Subject: Strange networking behaviour in storage server
>>>> > Date: Mon, 1 Jun 2015 07:49:56 +0000
>>>> >
>>>> > Hey!
>>>> >
>>>> > So we have this ZFS storage server, upgraded from 9.3-RELEASE to
>>>> > 10.1-STABLE to overcome 1) not being able to use SSD drives as
>>>> > L2ARC[1] and 2) not being able to hotswap SATA drives[2].
>>>> >
>>>> > After the upgrade we've noticed very odd networking behaviour: it
>>>> > sends/receives at full speed for a while, then there are a couple
>>>> > of minutes of complete silence where even terminal commands like
>>>> > "ls" just wait until they are executed, and then it starts sending
>>>> > at full speed again. I've linked to a screenshot showing this
>>>> > send-and-pause behaviour. The blue line is the total, green is SMB
>>>> > and turquoise is NFS over jumbo frames. It behaves this way
>>>> > regardless of the protocol.
>>>> >
>>>> > http://oi62.tinypic.com/33xvjb6.jpg
>>>> >
>>>> > The problem is that these pauses can sometimes be so long that
>>>> > connections drop. Say someone is copying files over SMB or iSCSI,
>>>> > and suddenly they get an error message saying that the transfer
>>>> > failed and they have to start over with the file(s). That's
>>>> > horrible!
>>>> >
>>>> > So far NFS has proven to be the most resilient; its stupidly simple
>>>> > nature just waits and resumes the transfer when the pause is over.
>>>> > Kudos for that.
>>>> >
>>>> > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and
>>>> > 64GB ECC RAM. The hardware has been ruled out; we happened to have
>>>> > an identical motherboard and CPU lying around, and that didn't
>>>> > improve things. We have also installed an Intel PRO 100/1000
>>>> > quad-port ethernet adapter to test if that would change things,
>>>> > but it hasn't; it still behaves this way.
>>>> >
>>>> > The two built-in NICs are Intel 82574L and the quad-port NICs are
>>>> > Intel 82571EB, so both are em(4) driven. I happen to know that the
>>>> > em driver has been updated between 9.3 and 10.1. Perhaps that is
>>>> > to blame, but I have no idea.
>>>> >
>>>> > Is there anyone that can make sense of this?
>>>> >
>>>> > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
>>>> >
>>>> > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
>>>> >
>>>> > /K
>>>> >
>>>> >
>>>>
>>>> Another observation I've made is that during these pauses the entire
>>>> system is put on hold; even a ZFS scrub stops and then resumes after
>>>> a while. Looking in top, the system is completely idle.
>>>>
>>>> Normally during a scrub the kernel eats 20-30% CPU, but during a
>>>> pause even [kernel] goes down to 0.00%. Makes me think the
>>>> networking has nothing to do with it.
>>>>
>>>> What's then to blame? ZFS?
>>>>
>>>> /K
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to
>>>> "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>>>
>>>> Hello,
>>>>
>>>> does this happen when clients are only reading from the server?
>>>
>>> Yes, it happens when clients are only reading from the server.
>>>
>>>> Otherwise I would suspect that it could be caused by ZFS writing out
>>>> a large chunk of data sitting in its caches, and until that is
>>>> complete I/O is stalled.
>>>
>>> That's what's so strange: we have three more systems set up at about
>>> the same size, and none of the others are acting this way.
>>>
>>> The only thing I can think of that differs, and that we haven't yet
>>> tested ruling out, is ctld; the other systems are still running istgt
>>> as their iSCSI daemon.
>>
>> So, were you able to rule out ctld?
>>
>> Do you have local, or terminal, access to the machine? When the problem
>> manifests, do local commands work? In other words, is the whole machine
>> wedged, or just the network? If it's just the network, it might be
>> caused by ctld consuming all available mbufs. You could run "netstat -m"
>> before and after to check that.
>>
>
> You already checked (double-checked) HBA firmware etc.? Cabling is fine?
>
> I expect you have already disabled tso, gro, rxcsum and txcsum on your
> NIC(s). I had similar effects with all those fancy uberfeatures enabled.
>
> Give it a try... ifconfig foo0 -rxcsum -txcsum -tso -gro
>
> Capturing a few MB of traffic before/after could also be very helpful to
> see if...
>

errm, sorry. Forgot something... how does your network setup look? Link
aggregations? Which switches, which line speed, stacked or not? Any
drops/errors on your interfaces?

>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
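[Editor's note: Edward's and Juergen's suggestions above amount to a short checklist that can be run from the console during a stall. A minimal sketch, assuming a FreeBSD 10.x box with an em(4) interface; `em0` is a placeholder name, and note that em(4) exposes large receive offload as `lro`, not `gro`. Disabling offloads needs root and should be tested out of production hours.]

```shell
# Hypothetical stall-diagnosis sketch; "em0" is a placeholder interface.
IF=${1:-em0}

# 1) Snapshot mbuf usage. A growing "denied" count between snapshots
#    would support Edward's theory that ctld is exhausting mbufs.
echo "== mbuf usage (watch the 'denied' counters) =="
netstat -m 2>/dev/null | grep -E 'mbufs in use|denied' \
    || echo "netstat -m unavailable here"

# 2) Per-interface error/drop counters (Juergen's question about
#    drops/errors on the interfaces).
echo "== interface error/drop counters =="
netstat -i -I "$IF" 2>/dev/null || echo "no counters for $IF"

# 3) Rule out the offload features. em(4) has no 'gro' flag; the
#    equivalent knob is 'lro'.
echo "== disabling offloads on $IF =="
ifconfig "$IF" -rxcsum -txcsum -tso -lro 2>/dev/null \
    || echo "could not adjust $IF (needs root on FreeBSD)"
```

Running the first two steps once during normal operation and once during a pause, then diffing the output, shows whether the stall coincides with mbuf denials or interface errors; if both stay clean while local commands also hang, the whole machine (not just the network) is wedged, pointing back at ZFS rather than ctld.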
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?557D80F8.9000505>