Date: Mon, 3 Sep 2012 11:40:49 -0700
From: YongHyeon PYUN <pyunyh@gmail.com>
To: Eugene Grosbein <egrosbein@rdtc.ru>
Cc: freebsd-net@freebsd.org, Lev Serebryakov <lev@freebsd.org>, Ian Smith <smithi@nimnet.asn.au>
Subject: Re: vr(4) troubles for AMD Geode CS5536 chipset
Message-ID: <20120903184049.GB3730@michelle.cdnetworks.com>
In-Reply-To: <50404F91.8080302@rdtc.ru>
References: <1865271844.20120829131610@serebryakov.spb.ru> <CAHu1Y70MynCMQTrJUMwTZ0%2BLrM1JiZFt_B77028XHfoiRgzmaA@mail.gmail.com> <1807373989.20120829223125@serebryakov.spb.ru> <20120830152726.A33776@sola.nimnet.asn.au> <534292400.20120830131158@serebryakov.spb.ru> <20120831180721.GB3208@michelle.cdnetworks.com> <50404F91.8080302@rdtc.ru>
On Fri, Aug 31, 2012 at 12:45:53PM +0700, Eugene Grosbein wrote:
> In previous letter I've described my attempts to try vr(4) from HEAD.
> Now I'd like to explain why I've tried it.
>
> The problem is that stock vr(4) from 8.3-STABLE/i386 has serious issues for my system.
> I have home router with two vr interfaces, vr0 is for LAN (IPoE) and vr1 is for WAN (PPPoE/mpd).
>
> Presently, every day my WAN vr interface stops running correctly:
> sometimes it stops receiving all packets - tcpdump shows none of them.
> Sometimes, it receives some but with great delay - up to 10 seconds (not milliseconds)
> and even more. tcpdump shows that delay occurs on receive path.
> Sometimes, it even rearranges packets - tcpdump shows that some incoming ICMP echo requests
> with lower sequence numbers come in later than already answered higher-numbered requests.

Hmm, it seems the driver's consumer/producer indices for the RX path
were corrupted.

>
> ifconfig vr1 down/up revives interface completely until next morning.
> sysctl net.inet.ip.fw.enable=0 does not solve the problem.
>
> I have control over WAN switching/routing network and may assure it runs just fine.
> However, I can't guarantee it has no "soft" anomalies like short storms or some silly broadcasts.
>
> I've tried to make incoming flood with ng_source(4) generated UDP flood at 100M rate
> for 60 seconds and failed to reproduce the problem artificially.
>
> I've tried to move WAN from vr1 to vr0 and the problem has moved to vr0 too.
> My LAN has very little traffic and corresponding vr interface exhibits no problems.
>
> This router also routinely runs transmission (torrent client from ports)
> serving torrents from USB-attached HDD making severe CPU load, so I suspect
> the problem may be related to CPU load.
>
> I've also checked mbuf/mbuf clusters usage and they are all right:
>
> # netstat -m
> 1539/2076/3615 mbufs in use (current/cache/total)
> 1200/1278/2478/65536 mbuf clusters in use (current/cache/total/max)
> 1200/306 mbuf+clusters out of packet secondary zone in use (current/cache)
> 318/181/499/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 4056K/3799K/7855K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/4/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> # vmstat -z | egrep -i 'ITEM|mbuf'
> ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
> mbuf_packet: 256, 0, 1429, 77, 112854470, 0
> mbuf: 256, 0, 489, 1620, 369073316, 0
> mbuf_cluster: 2048, 65536, 1506, 604, 5401864, 0
> mbuf_jumbo_page: 4096, 12800, 469, 158, 8306777, 0
> mbuf_jumbo_9k: 9216, 6400, 0, 0, 0, 0
> mbuf_jumbo_16k: 16384, 3200, 0, 0, 0, 0
> mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0
> NetGraph items: 36, 4130, 1, 117, 263123, 0
> NetGraph data items: 36, 531, 0, 295, 106663377, 0
>
> While ifconfig vr1 down/up solves the problem completely (for some long time),
> taking link down/up using switch solves it "in half" - huge packet delays disappear
> and turn to 25% packet loss happening in regular short intervals, once a second or so.
>
> ifconfig down/up clears this mess too.
>
> Please help me to debug this, it's pretty annoying.

By chance, did vr(4) print any diagnostic messages to the console?
If I remember correctly, vr(4) automatically restarts the controller
and shows these errors when it detects an abnormal condition.
Abnormal conditions for vr(4) would be:
 - TX/RX MAC stuck
 - RX MAC stop due to FIFO overflow or no RX buffers
 - PCI bus errors
 - TX abort
 - TX underrun

> I had a hope new vr(4) driver would help but it takes my system down under average load
> and is unusable.
>
> Here is start of dmesg.boot:
>
> Copyright (c) 1992-2012 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.3-STABLE #1: Wed Aug 29 22:49:45 NOVT 2012
>     root@grosbein.pp.ru:/usr/local/obj/nanobsd.gw/i386/usr/local/src/sys/GW i386
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Geode(TM) Integrated Processor by AMD PCS (499.91-MHz 586-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x5a2  Family = 5  Model = a  Stepping = 2
>   Features=0x88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CLFLUSH,MMX>
>   AMD Features=0xc0400000<MMX+,3DNow!+,3DNow!>
> real memory  = 1065025536 (1015 MB)
> avail memory = 1032929280 (985 MB)
> K6-family MTRR support enabled (2 registers)
>
> I must also note that this system runs with ACPI disabled in /boot/loader.conf:
> hint.acpi.0.disabled=1
>
> Otherwise, its timekeeping becomes broken.
>
> Eugene Grosbein
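
The "mbuf usage is all right" check quoted above can be repeated mechanically each time the interface wedges. The following is only a minimal sketch, not part of vr(4) or any FreeBSD tool: it assumes the comma-separated `vmstat -z` row layout shown in this thread (SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES), treats a LIMIT of 0 as "no limit", and the names `parse_vmstat_z`, `zones_with_failures` and `near_limit` are hypothetical helpers introduced for illustration.

```python
# Rows copied from the `vmstat -z | egrep -i 'ITEM|mbuf'` output quoted in this thread.
SAMPLE = """\
ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
mbuf_packet: 256, 0, 1429, 77, 112854470, 0
mbuf: 256, 0, 489, 1620, 369073316, 0
mbuf_cluster: 2048, 65536, 1506, 604, 5401864, 0
mbuf_jumbo_page: 4096, 12800, 469, 158, 8306777, 0
mbuf_jumbo_9k: 9216, 6400, 0, 0, 0, 0
mbuf_jumbo_16k: 16384, 3200, 0, 0, 0, 0
mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0
"""

# Column names, in the order they appear after the zone name.
FIELDS = ("size", "limit", "used", "free", "requests", "failures")


def parse_vmstat_z(text):
    """Return {zone_name: {field: int}} for every well-formed zone row."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header line or blank line: no "name:" prefix
        values = [v.strip() for v in rest.split(",")]
        if len(values) != len(FIELDS) or not all(v.isdigit() for v in values):
            continue  # skip rows that do not match the assumed layout
        zones[name.strip()] = dict(zip(FIELDS, map(int, values)))
    return zones


def zones_with_failures(zones):
    """Names of zones whose FAILURES counter is non-zero."""
    return sorted(n for n, z in zones.items() if z["failures"] > 0)


def near_limit(zones, threshold=0.9):
    """Names of limited zones whose USED count is at or above threshold * LIMIT."""
    return sorted(
        n for n, z in zones.items()
        if z["limit"] > 0 and z["used"] >= threshold * z["limit"]
    )


if __name__ == "__main__":
    zones = parse_vmstat_z(SAMPLE)
    print("zones with allocation failures:", zones_with_failures(zones))
    print("zones near their limit:", near_limit(zones))
```

Feeding it a capture taken while the interface is stuck, next to one taken after `ifconfig vr1 down/up`, would show whether the "no RX buffers" condition from the list above is plausible at the allocator level. It says nothing about the state of the driver's own RX ring.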