Date: Tue, 2 Jun 2015 08:33:00 +0000
From: Karli Sjöberg <karli.sjoberg@slu.se>
To: Andreas Nilsson <andrnils@gmail.com>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: [Fwd: Strange networking behaviour in storage server]
Message-ID: <1433233998.27174.4.camel@data-b104.adm.slu.se>
In-Reply-To: <CAPS9+SvWKi_1bu3Ypxb8CwTehQcjJOo+VRg7QpDoreebGrPEPg@mail.gmail.com>
References: <1433146506.14998.177.camel@data-b104.adm.slu.se>
 <CAPS9+Sturmr32jN3d1sfCsQUnyFneSMofT+ajwqCP=LPg_nseA@mail.gmail.com>
 <1433149349.14998.181.camel@data-b104.adm.slu.se>
 <CAOgwaMs=RjxKvvzRHX966K=-sQO_WMHv3o7mg19VYywkLymM7g@mail.gmail.com>
 <1433154506.14998.192.camel@data-b104.adm.slu.se>
 <CAPS9+SuAVEjF3Z5+x7Z6i1i_+SHTJTv7dpG2WJ8JeZ2e0=a-+g@mail.gmail.com>
 <1433156515.14998.194.camel@data-b104.adm.slu.se>
 <CAPS9+SvWKi_1bu3Ypxb8CwTehQcjJOo+VRg7QpDoreebGrPEPg@mail.gmail.com>
Not sure why you're writing in English when it's just between us...
Maybe you forgot to reply to all?

On Tue 2015-06-02 at 10:10 +0200, Andreas Nilsson wrote:
> No, mbufs should not affect a scrub.
>
> You can get some stats from vmstat -z

Hmm, what would I be looking for exactly?

> Have you had systat running while I/O stalls?

Same as above.

> Also, zpool has a tunable for failmode, which defaults to wait, but as
> you say scrub/zpool status indicates no errors, this is unlikely to be
> the cause.
>
> Other than that I'm out of ideas :(

We have that in common :)

/K

> Best regards
>
> Andreas
>
> On Mon, Jun 1, 2015 at 1:01 PM, Karli Sjöberg <karli.sjoberg@slu.se>
> wrote:
> > On Mon 2015-06-01 at 12:53 +0200, Andreas Nilsson wrote:
> > > Interesting.
> > >
> > > Out of mbufs perhaps?
> >
> > Hmm, why would depleted mbufs stall even a scrub?
> > How would I verify that?
> >
> > /K
> >
> > > /A
> > >
> > > On Mon, Jun 1, 2015 at 12:28 PM, Karli Sjöberg
> > > <karli.sjoberg@slu.se> wrote:
> > > > On Mon 2015-06-01 at 02:56 -0700, Mehmet Erol Sanliturk wrote:
> > > > > On Mon, Jun 1, 2015 at 2:02 AM, Karli Sjöberg
> > > > > <karli.sjoberg@slu.se> wrote:
> > > > > > On Mon 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
> > > > > > > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg
> > > > > > > <karli.sjoberg@slu.se> wrote:
> > > > > > > > -------- Forwarded Message --------
> > > > > > > > > From: Karli Sjöberg <karli.sjoberg@slu.se>
> > > > > > > > > To: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
> > > > > > > > > Subject: Strange networking behaviour in storage server
> > > > > > > > > Date: Mon, 1 Jun 2015 07:49:56 +0000
> > > > > > > > >
> > > > > > > > > Hey!
> > > > > > > > >
> > > > > > > > > So we have this ZFS storage server upgraded from
> > > > > > > > > 9.3-RELEASE to 10.1-STABLE to overcome two problems:
> > > > > > > > > 1) not being able to use SSD drives as L2ARC[1] and
> > > > > > > > > 2) not being able to hotswap SATA drives[2].
> > > > > > > > >
> > > > > > > > > After the upgrade we've noticed a very odd networking
> > > > > > > > > behaviour: it sends/receives at full speed for a while,
> > > > > > > > > then there are a couple of minutes of complete silence
> > > > > > > > > where even terminal commands like an "ls" just wait
> > > > > > > > > until they are executed, and then it starts sending at
> > > > > > > > > full speed again. I've linked to a screenshot showing
> > > > > > > > > this send-and-pause behaviour. The blue line is the
> > > > > > > > > total, green is SMB and turquoise is NFS over jumbo
> > > > > > > > > frames. It behaves this way regardless of the protocol.
> > > > > > > > >
> > > > > > > > > http://oi62.tinypic.com/33xvjb6.jpg
> > > > > > > > >
> > > > > > > > > The problem is that these pauses can sometimes be so
> > > > > > > > > long that connections drop. For example, someone is
> > > > > > > > > copying files over SMB or iSCSI and suddenly gets an
> > > > > > > > > error message saying that the transfer failed, and they
> > > > > > > > > have to start over with the file(s). That's horrible!
> > > > > > > > >
> > > > > > > > > So far NFS has proven to be the most resilient; its
> > > > > > > > > stupidly simple nature just waits and resumes the
> > > > > > > > > transfer when the pause is over. Kudos for that.
> > > > > > > > >
> > > > > > > > > The server is driven by a Supermicro X9SRL-F, a Xeon
> > > > > > > > > 1620v2 and 64GB ECC RAM. The hardware has been ruled
> > > > > > > > > out: we happened to have an identical MB and CPU lying
> > > > > > > > > around and that didn't improve things. We have also
> > > > > > > > > installed an Intel PRO 100/1000 Quad-port ethernet
> > > > > > > > > adapter to test if that would change things, but it
> > > > > > > > > hasn't; it still behaves this way.
> > > > > > > > >
> > > > > > > > > The two built-in NICs are Intel 82574L and the
> > > > > > > > > Quad-port NICs are Intel 82571EB, so both are driven by
> > > > > > > > > em(4). I happen to know that the em driver was updated
> > > > > > > > > between 9.3 and 10.1. Perhaps that is to blame, but I
> > > > > > > > > have no idea.
> > > > > > > > >
> > > > > > > > > Is there anyone that can make sense of this?
> > > > > > > > >
> > > > > > > > > [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> > > > > > > > >
> > > > > > > > > [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> > > > > > > > >
> > > > > > > > > /K
> > > > > > > >
> > > > > > > > Another observation I've made is that during these pauses,
> > > > > > > > the entire system is put on hold; even a ZFS scrub stops
> > > > > > > > and then resumes after a while. Looking in top, the system
> > > > > > > > is completely idle.
> > > > > > > >
> > > > > > > > Normally during a scrub, the kernel eats 20-30% CPU, but
> > > > > > > > during a pause, even [kernel] goes down to 0.00%. Makes me
> > > > > > > > think the networking has nothing to do with it.
> > > > > > > >
> > > > > > > > What's to blame, then? ZFS?
> > > > > > > >
> > > > > > > > /K
> > > > > > > > _______________________________________________
> > > > > > > > freebsd-fs@freebsd.org mailing list
> > > > > > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > > > > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > does this happen when clients are only reading from the
> > > > > > > server?
> > > > > >
> > > > > > Yes, it happens when clients are only reading from the server.
> > > > > >
> > > > > > > Otherwise I would suspect that it could be caused by ZFS
> > > > > > > writing out a large chunk of data sitting in its caches,
> > > > > > > and until that is complete, I/O is stalled.
> > > > > >
> > > > > > That's what's so strange: we have three more systems set up
> > > > > > at about the same size, and none of the others are acting
> > > > > > this way.
> > > > > >
> > > > > > The only difference I can think of that we haven't ruled out
> > > > > > yet is ctld; the other systems are still running istgt as
> > > > > > their iSCSI daemon.
> > > > > >
> > > > > > /K
> > > > >
> > > > > If there are three other similar systems and they are installed
> > > > > with exactly the same structure, my first suspicion would be a
> > > > > slowly progressing hardware failure:
> > > > >
> > > > > A circuit may fail to respond in the expected time, yet still
> > > > > respond after an abnormally long delay. Such behaviour may be
> > > > > caused by a badly soldered or cracked connection point in the
> > > > > circuit: when it is hot it disconnects, when it is cold it
> > > > > connects.
> > > >
> > > > As initially stated, both motherboard and processor have been
> > > > replaced with identical hardware that went through a day of
> > > > memtest before being installed. Then there's an external
> > > > Supermicro JBOD[*], but I haven't seen any disk timeouts or SES
> > > > errors logged. With a delay as long as five minutes, there should
> > > > have been timeouts, at least at the driver level.
> > > >
> > > > /K
> > > >
> > > > [*]: http://www.supermicro.nl/products/chassis/3U/837/SC837E26-RJBOD1.cfm
> > > > >
> > > > > Thank you very much.
> > > > >
> > > > > Mehmet Erol Sanliturk
> > > > > > >
> > > > > > > Have you tried what is suggested in
> > > > > > > https://wiki.freebsd.org/ZFSTuningGuide ? In particular,
> > > > > > > setting vfs.zfs.write_limit_override to something
> > > > > > > appropriate for your site. The timeout seems to be
> > > > > > > defaulting to 5 now.
> > > > > > >
> > > > > > > Best regards
> > > > > > >
> > > > > > > Andreas
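On the open question of what to look for in `vmstat -z`: mbuf exhaustion would typically show up as a non-zero FAIL count on the `mbuf*` zones (with jumbo frames in use, the `mbuf_jumbo_*` zones are the obvious candidates), or as USED sitting at LIMIT. The script below is a minimal sketch of that check, assuming the FreeBSD 10 output layout of a zone name followed by comma-separated SIZE, LIMIT, USED, FREE, REQ, FAIL and SLEEP columns. The script and its function names (`parse_vmstat_z`, `failing_zones`) are hypothetical and not part of FreeBSD.

```python
"""Look for UMA zones with allocation failures in FreeBSD `vmstat -z` output.

Assumption: the FreeBSD 10 layout, where each zone line reads
"NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP". Any extra trailing
columns printed by other releases are ignored.
"""

import subprocess
import sys

# Column names in the order vmstat -z prints them after the zone name.
FIELDS = ("size", "limit", "used", "free", "req", "fail", "sleep")

# Network buffer zones: mbuf, mbuf_packet, mbuf_cluster, mbuf_jumbo_* ...
MBUF_PREFIXES = ("mbuf",)


def parse_vmstat_z(text):
    """Return {zone_name: {column: int}} for every zone line in `text`."""
    zones = {}
    for line in text.splitlines():
        # The zone name ends at the last colon; names may contain spaces.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header line or blank line
        values = [v.strip() for v in rest.split(",")][: len(FIELDS)]
        if len(values) != len(FIELDS) or not all(v.isdigit() for v in values):
            continue  # not a zone statistics line
        zones[name.strip()] = dict(zip(FIELDS, map(int, values)))
    return zones


def failing_zones(zones, prefixes=MBUF_PREFIXES):
    """Return sorted (name, fail_count) pairs for matching zones with FAIL > 0."""
    return sorted(
        (name, stats["fail"])
        for name, stats in zones.items()
        if name.startswith(prefixes) and stats["fail"] > 0
    )


def main():
    output = subprocess.run(
        ["vmstat", "-z"], capture_output=True, text=True, check=True
    ).stdout
    zones = parse_vmstat_z(output)
    failures = failing_zones(zones)
    if not failures:
        print("No failed allocations in mbuf zones.")
    for name, fail in failures:
        stats = zones[name]
        print(f"{name}: {fail} failed allocations "
              f"(used {stats['used']}, limit {stats['limit']})")


if __name__ == "__main__":
    # vmstat -z is FreeBSD-specific, so only run the check there.
    if sys.platform.startswith("freebsd"):
        main()
    else:
        print("vmstat -z is only available on FreeBSD", file=sys.stderr)
```

Running it once while traffic flows and again during a stall would show whether the FAIL counters grow during a pause. If they stay at zero, that points away from mbufs, in line with the remark that mbufs should not affect a scrub.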
