Date: Wed, 19 Aug 2009 05:52:23 -0700 (PDT)
From: alexpalias-bsdnet@yahoo.com
To: Дмитрий Замураев <gigabyte.tmn@gmail.com>
Cc: freebsd-net@freebsd.org
Subject: RE: em driver input errors
Message-ID: <24727.68667.qm@web56404.mail.re3.yahoo.com>
In-Reply-To: <001401ca1f4d$e96a2170$1e010a0a@in72.ru>

Greetings.

--- On Mon, 8/17/09, Дмитрий Замураев <gigabyte.tmn@gmail.com> wrote:

> From: Дмитрий Замураев <gigabyte.tmn@gmail.com>
> Subject: RE: em driver input errors
> To: alexpalias-bsdnet@yahoo.com
> Cc: freebsd-net@freebsd.org
> Date: Monday, August 17, 2009, 6:17 PM
>
> > /boot/loader.conf:
> > hw.em.rxd=4096
> > hw.em.txd=4096
> Why are you using these values? Try the defaults (without these
> lines in loader.conf).

As said in my original email, I was getting way more errors with the defaults.

> > Without the above we were seeing way more errors, now they are
> > reduced, but still come in bursts of over 1000 errors on em0.
> > Still seeing errors, after some searching the mailing lists we
> > also added:
> > # the four lines below are repeated for em1, em2, em3
> > dev.em.0.rx_int_delay=0
> > dev.em.0.rx_abs_int_delay=0
> > dev.em.0.tx_int_delay=0
> > dev.em.0.tx_abs_int_delay=0
> Try to increase rx_int_delay to 600 and rx_abs_int_delay to 1000,
> tx_*_delay without changes -> by default (100?)

Thanks for the suggestion.
From a "clean" box:
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66

I reset all the values (errors still appearing), then tried your suggestion (rx_int_delay=600, rx_abs_int_delay=1000). This reduced the number of interrupts for em0 (from about 7200/sec to around 6500/sec). After some time, I started getting errors again. But that made me try this as well:

dev.em.0.tx_int_delay=600
dev.em.0.tx_abs_int_delay=1000

Meaning I used your suggested values for tx too. Now em0 is seeing about 1800 interrupts/second, which is way better, but after some time I saw errors again...

From the output of "netstat -nI em0 -w 5":

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
     87267     0   50372599     106931     0   81598993     0
     86496     0   50990332     105467     0   80064657     0
     81726  3056   49876613      99080     0   73273640     0
     90425     0   59172531     105299     0   77110096     0
    120292     0   70369292     109597     0   78626248     0
... a few minutes pass with zero errors ...
     89646     0   56951878     111240     0   86493393     0
     86031     0   53549721     108695     0   83592747     0
     77760  3054   48505562      96912     0   73185576     0
     87508     0   56116394     106094     0   79130608     0
     89031     0   56490982     103039     0   77398567     0

What's interesting is that I'm seeing errors in an 80k packets/5 sec (so around 16k packets/s) zone, but no errors at 120k packets/5 sec (24 kpps).


Currently, I've set the delay to 600 and abs_delay to 1000 on all interfaces (em0, em1, em2, em3), thus reducing the number of interrupts.
I'm currently seeing (in systat -vmstat 2):
Around 1800 irqs/s for em0, 1800 for em1, 1800 for em2, under 10/s for em3
Around 2000 irqs/s for cpu0:time, 2000 more for cpu1:time, 2000 for cpu2:time and 2000 for cpu3:time.

Interrupts total (as reported by systat): around 13500/second. I would estimate the old IRQ load at around 30000-35000/second, which doesn't seem too much to me for a dual Xeon machine.

> > kern.ipc.nmbclusters=655360
> No need. See netstat -m

Thanks, but as I said, I did try almost *EVERYTHING* I could without rebooting. Including this.

Speaking of which, I did compile the kernel with "options DEVICE_POLLING", but enabling polling only made the errors appear more often, and in greater numbers.

> P.S. Change the copper cable, turn off flow control (if it is on)

There are 4 em interfaces on this machine, with new cat6 cables. 2 more em interfaces on another machine that was seeing the same errors (the old router), on different cables. And 2 more em interfaces on another machine that's in production, also with new cables. According to the debug output (sysctl dev.em.0.stats=1, then read dmesg), only 2 of the input errors are due to CRC errors, as opposed to around 2,500,000 from other causes. I tend to feel the cable isn't the problem.

Flow control is off, I just checked. I forgot about that one, thanks for reminding me.


Thank you for your help
Alex
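
The rate arithmetic behind the netstat observation in this message (errors at roughly 16 kpps, none at roughly 24 kpps) can be reproduced with a minimal sketch like the one below. The sample values are copied from the "netstat -nI em0 -w 5" input columns quoted in the message; the names SAMPLES, INTERVAL_SECONDS and summarize are illustrative assumptions, not part of any FreeBSD tool.

```python
# Input-side (packets, errs) pairs copied from the "netstat -nI em0 -w 5"
# output quoted in the message. Each row covers one 5-second interval.
INTERVAL_SECONDS = 5

SAMPLES = [
    (87267, 0),
    (86496, 0),
    (81726, 3056),
    (90425, 0),
    (120292, 0),
    (89646, 0),
    (86031, 0),
    (77760, 3054),
    (87508, 0),
    (89031, 0),
]


def summarize(samples, interval=INTERVAL_SECONDS):
    """Return (packets_per_second, errs, errs_per_100_packets) per interval."""
    rows = []
    for packets, errs in samples:
        pps = packets / interval
        # Errors expressed relative to the packets counted in the same interval.
        ratio = 100.0 * errs / packets if packets else 0.0
        rows.append((pps, errs, ratio))
    return rows


if __name__ == "__main__":
    for pps, errs, ratio in summarize(SAMPLES):
        flag = "  <-- errors" if errs else ""
        print(f"{pps:10.1f} pps  {errs:5d} errs  {ratio:5.2f} per 100 pkts{flag}")
```

Run on these ten rows, the two intervals that report errors are among the lower-rate ones, while the busiest interval reports none, which matches the observation in the message that the errors do not track the packet rate.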
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?24727.68667.qm>