Date: Mon, 1 Jun 2015 10:18:20 +0000
From: Karli Sjöberg <karli.sjoberg@slu.se>
To: Andreas Nilsson <andrnils@gmail.com>
Cc: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: [Fwd: Strange networking behaviour in storage server]
Message-ID: <1433153917.14998.185.camel@data-b104.adm.slu.se>
In-Reply-To: <CAPS9%2BSvW_=O3m%2BsbCugZhY8ibo-FwYV5w49=ubw0_FUT5Q%2Bo=g@mail.gmail.com>
References: <1433146506.14998.177.camel@data-b104.adm.slu.se>
 <CAPS9%2BSturmr32jN3d1sfCsQUnyFneSMofT%2BajwqCP=LPg_nseA@mail.gmail.com>
 <1433149349.14998.181.camel@data-b104.adm.slu.se>
 <CAOgwaMs=RjxKvvzRHX966K=-sQO_WMHv3o7mg19VYywkLymM7g@mail.gmail.com>
 <CAPS9%2BSvW_=O3m%2BsbCugZhY8ibo-FwYV5w49=ubw0_FUT5Q%2Bo=g@mail.gmail.com>

On Mon, 2015-06-01 at 12:11 +0200, Andreas Nilsson wrote:
> On Mon, Jun 1, 2015 at 11:56 AM, Mehmet Erol Sanliturk
> <m.e.sanliturk@gmail.com> wrote:
> > On Mon, Jun 1, 2015 at 2:02 AM, Karli Sjöberg
> > <karli.sjoberg@slu.se> wrote:
> > > On Mon, 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
> > > > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg
> > > > <karli.sjoberg@slu.se> wrote:
> > > > > -------- Forwarded message --------
> > > > > > From: Karli Sjöberg <karli.sjoberg@slu.se>
> > > > > > To: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
> > > > > > Subject: Strange networking behaviour in storage server
> > > > > > Date: Mon, 1 Jun 2015 07:49:56 +0000
> > > > > >
> > > > > > Hey!
> > > > > >
> > > > > > So we have this ZFS storage server, upgraded from
> > > > > > 9.3-RELEASE to 10.1-STABLE to overcome not being able
> > > > > > to 1) use SSD drives as L2ARC[1] and 2) hotswap SATA
> > > > > > drives[2].
> > > > > >
> > > > > > After the upgrade we've noticed a very odd networking
> > > > > > behaviour: it sends/receives at full speed for a while,
> > > > > > then there are a couple of minutes of complete silence
> > > > > > where even terminal commands like "ls" just wait before
> > > > > > they are executed, and then it starts sending at full
> > > > > > speed again. I've linked to a screenshot showing this
> > > > > > send-and-pause behaviour. The blue line is the total,
> > > > > > green is SMB and turquoise is NFS over jumbo frames. It
> > > > > > behaves this way regardless of the protocol.
> > > > > >
> > > > > > http://oi62.tinypic.com/33xvjb6.jpg
> > > > > >
> > > > > > The problem is that these pauses can sometimes be so
> > > > > > long that connections drop. Someone is copying files
> > > > > > over SMB or iSCSI and suddenly they get an error message
> > > > > > saying that the transfer failed, and they have to start
> > > > > > over with the file(s). That's horrible!
> > > > > >
> > > > > > So far NFS has proven to be the most resilient; its
> > > > > > stupid-simple nature just waits and resumes the transfer
> > > > > > when the pause is over. Kudos for that.
> > > > > >
> > > > > > The server is driven by a Supermicro X9SRL-F, a Xeon
> > > > > > 1620v2 and 64GB ECC RAM. The hardware has been ruled
> > > > > > out: we happened to have an identical MB and CPU lying
> > > > > > around, and swapping them in didn't improve things. We
> > > > > > have also installed an Intel PRO 100/1000 Quad-port
> > > > > > ethernet adapter to test if that would change things,
> > > > > > but it hasn't; it still behaves this way.
> > > > > >
> > > > > > The two built-in NICs are Intel 82574L and the Quad-port
> > > > > > NICs are Intel 82571EB, so both are em(4) driven. I
> > > > > > happen to know that the em driver was updated between
> > > > > > 9.3 and 10.1. Perhaps that is to blame, but I have no
> > > > > > idea.
> > > > > >
> > > > > > Is there anyone that can make sense of this?
> > > > > >
> > > > > > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> > > > > > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> > > > > >
> > > > > > /K
> > > > >
> > > > > Another observation I've made is that during these pauses,
> > > > > the entire system is put on hold; even a ZFS scrub stops
> > > > > and then resumes after a while. Looking in top, the system
> > > > > is completely idle.
> > > > >
> > > > > Normally during a scrub, the kernel eats 20-30% CPU, but
> > > > > during a pause, even the [kernel] goes down to 0.00%.
> > > > > Makes me think the networking has nothing to do with it.
> > > > >
> > > > > What's then to blame? ZFS?
> > > > >
> > > > > /K
> > > > > _______________________________________________
> > > > > freebsd-fs@freebsd.org mailing list
> > > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > > To unsubscribe, send any mail to
> > > > > "freebsd-fs-unsubscribe@freebsd.org"
> > > >
> > > > Hello,
> > > >
> > > > does this happen when clients are only reading from the
> > > > server?
> > >
> > > Yes, it happens when clients are only reading from the server.
> > >
> > > > Otherwise I would suspect that it could be caused by ZFS
> > > > writing out a large chunk of data sitting in its caches, and
> > > > until that is complete I/O is stalled.
> > >
> > > That's what's so strange: we have three more systems set up at
> > > about the same size and none of the others are acting this way.
> > >
> > > The only thing I can think of that differs, and that we haven't
> > > ruled out yet, is ctld; the other systems are still running
> > > istgt as their iSCSI daemon.
> > >
> > > /K
>
> What does a zpool status say? Could very well be disks starting to
> fail.
>
> Anything in dmesg concerning cam timeouts?
>
> Best regards
> Andreas

Pool status is fine, scrubbed multiple times without errors. No
storage-related errors; we're using LSI HBAs so no cam, but nothing
mps-related either.

/K
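
The symptom reported in this thread, where the whole machine goes quiet for minutes at a time, could be timestamped with a small heartbeat loop so the pauses can be lined up against other logs. The following is only a minimal sketch in Python and not something the posters used. The names `find_stalls` and `heartbeat` and the 1 s / 5 s defaults are assumptions chosen for illustration, and it assumes the monotonic clock keeps advancing while the process is not being scheduled.

```python
import time


def find_stalls(timestamps, threshold):
    """Return (start, gap) pairs where consecutive heartbeat
    timestamps are more than `threshold` seconds apart."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((prev, gap))
    return stalls


def heartbeat(interval=1.0, threshold=5.0):
    """Sleep `interval` seconds in a loop and print a line whenever
    a wake-up arrives more than `threshold` seconds after the last.

    Hypothetical helper: runs until interrupted (Ctrl-C).
    """
    prev = time.monotonic()
    while True:
        time.sleep(interval)
        now = time.monotonic()
        if now - prev > threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} stall ended: {now - prev:.1f}s since last heartbeat")
        prev = now
```

Calling `heartbeat()` on the server's console would print one line each time a pause ends; `find_stalls` does the same comparison on a list of timestamps that has already been recorded.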
