Date: Mon, 1 Jun 2015 12:11:15 +0200
From: Andreas Nilsson <andrnils@gmail.com>
To: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
Cc: Karli Sjöberg <karli.sjoberg@slu.se>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: [Fwd: Strange networking behaviour in storage server]
Message-ID: <CAPS9%2BSvW_=O3m%2BsbCugZhY8ibo-FwYV5w49=ubw0_FUT5Q%2Bo=g@mail.gmail.com>
In-Reply-To: <CAOgwaMs=RjxKvvzRHX966K=-sQO_WMHv3o7mg19VYywkLymM7g@mail.gmail.com>
References: <1433146506.14998.177.camel@data-b104.adm.slu.se>
 <CAPS9%2BSturmr32jN3d1sfCsQUnyFneSMofT%2BajwqCP=LPg_nseA@mail.gmail.com>
 <1433149349.14998.181.camel@data-b104.adm.slu.se>
 <CAOgwaMs=RjxKvvzRHX966K=-sQO_WMHv3o7mg19VYywkLymM7g@mail.gmail.com>

On Mon, Jun 1, 2015 at 11:56 AM, Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com> wrote:

> On Mon, Jun 1, 2015 at 2:02 AM, Karli Sjöberg <karli.sjoberg@slu.se> wrote:
>
>> Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
>> >
>> > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg@slu.se> wrote:
>> >
>> > -------- Forwarded message --------
>> > > From: Karli Sjöberg <karli.sjoberg@slu.se>
>> > > To: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
>> > > Subject: Strange networking behaviour in storage server
>> > > Date: Mon, 1 Jun 2015 07:49:56 +0000
>> > >
>> > > Hey!
>> > >
>> > > So we have this ZFS storage server, upgraded from 9.3-RELEASE to
>> > > 10.1-STABLE to overcome not being able to 1) use SSD drives as
>> > > L2ARC[1] and 2) hotswap SATA drives[2].
>> > >
>> > > After the upgrade we've noticed a very odd networking behaviour:
>> > > it sends/receives at full speed for a while, then there are a
>> > > couple of minutes of complete silence, where even terminal
>> > > commands like "ls" just wait until they are executed, and then it
>> > > starts sending at full speed again. I've linked to a screenshot
>> > > showing this send-and-pause behaviour. The blue line is the
>> > > total, green is SMB and turquoise is NFS over jumbo frames. It
>> > > behaves this way regardless of the protocol.
>> > >
>> > > http://oi62.tinypic.com/33xvjb6.jpg
>> > >
>> > > The problem is that these pauses can sometimes be so long that
>> > > connections drop. Someone is copying files over SMB or iSCSI and
>> > > suddenly they get an error message saying that the transfer
>> > > failed, and they have to start over with the file(s). That's
>> > > horrible!
>> > >
>> > > So far NFS has proven to be the most resilient; its stupid-simple
>> > > nature just waits and resumes the transfer when the pause is
>> > > over. Kudos for that.
>> > >
>> > > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and
>> > > 64GB ECC RAM. The hardware has been ruled out: we happened to
>> > > have an identical MB and CPU lying around, and swapping them in
>> > > didn't improve things. We have also installed an Intel PRO
>> > > 100/1000 quad-port Ethernet adapter to test if that would change
>> > > things, but it hasn't; it still behaves this way.
>> > >
>> > > The two built-in NICs are Intel 82574L and the quad-port NICs are
>> > > Intel 82571EB, so both are em(4) driven. I happen to know that
>> > > the em driver has been updated between 9.3 and 10.1. Perhaps that
>> > > is to blame, but I have no idea.
>> > >
>> > > Is there anyone that can make sense of this?
>> > >
>> > > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
>> > >
>> > > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
>> > >
>> > > /K
>> >
>> > Another observation I've made is that during these pauses, the
>> > entire system is put on hold; even a ZFS scrub stops and then
>> > resumes after a while. Looking in top, the system is completely
>> > idle.
>> >
>> > Normally during a scrub, the kernel eats 20-30% CPU, but during a
>> > pause, even the [kernel] goes down to 0.00%. Makes me think the
>> > networking has nothing to do with it.
>> >
>> > What's then to blame? ZFS?
>> >
>> > /K
>> > _______________________________________________
>> > freebsd-fs@freebsd.org mailing list
>> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > To unsubscribe, send any mail to
>> > "freebsd-fs-unsubscribe@freebsd.org"
>> >
>> > Hello,
>> >
>> > does this happen when clients are only reading from the server?
>>
>> Yes, it happens when clients are only reading from the server.
>>
>> > Otherwise I would suspect that it could be caused by ZFS writing
>> > out a large chunk of data sitting in its caches, and until that is
>> > complete I/O is stalled.
>>
>> That's what's so strange: we have three more systems set up at about
>> the same size and none of the others are acting this way.
>>
>> The only difference I can think of that we haven't ruled out yet is
>> ctld; the other systems are still running istgt as their iSCSI
>> daemon.
>>
>> /K

What does a zpool status say? Could very well be disks starting to fail.
Anything in dmesg concerning cam timeouts?

Best regards
Andreas