From: YongHyeon PYUN
Date: Mon, 3 Sep 2012 11:40:49 -0700
To: Eugene Grosbein
Cc: freebsd-net@freebsd.org, Lev Serebryakov, Ian Smith
Subject: Re: vr(4) troubles for AMD Geode CS5536 chipset
Message-ID: <20120903184049.GB3730@michelle.cdnetworks.com>
In-Reply-To: <50404F91.8080302@rdtc.ru>
References: <1865271844.20120829131610@serebryakov.spb.ru> <1807373989.20120829223125@serebryakov.spb.ru> <20120830152726.A33776@sola.nimnet.asn.au> <534292400.20120830131158@serebryakov.spb.ru> <20120831180721.GB3208@michelle.cdnetworks.com> <50404F91.8080302@rdtc.ru>
Reply-To: pyunyh@gmail.com

On Fri, Aug 31, 2012 at 12:45:53PM +0700, Eugene Grosbein wrote:
> In previous letter I've described my attempts to try vr(4) from HEAD.
> Now I'd like to explain why I've tried it.
>
> The problem is that stock vr(4) from 8.3-STABLE/i386 has serious issues for my system.
> I have home router with two vr interfaces, vr0 is for LAN (IPoE) and vr1 is for WAN (PPPoE/mpd).
>
> Presently, every day my WAN vr interface stops running correctly:
> sometimes it stops receiving all packets - tcpdump shows none of them.
> Sometimes, it receives some but with great delay - up to 10 seconds (not milliseconds)
> and even more. tcpdump shows that delay occurs on receive path.
> Sometimes, it even rearranges packets - tcpdump shows that some incoming ICMP echo requests
> with lower sequence numbers come in later than already answered higher-numbered requests.

Hmm, it seems the driver's consumer/producer indices for the RX path were corrupted.

>
> ifconfig vr1 down/up revives interface completely until next morning.
> sysctl net.inet.ip.fw.enable=0 does not solve the problem.
>
> I have control over WAN switching/routing network and may assure it runs just fine.
> However, I can't guarantee it has no "soft" anomalies like short storms or some silly broadcasts.
>
> I've tried to make incoming flood with ng_source(4) generated UDP flood at 100M rate
> for 60 seconds and failed to reproduce the problem artificially.
>
> I've tried to move WAN from vr1 to vr0 and the problem has moved to vr0 too.
> My LAN has very little traffic and corresponding vr interface exhibits no problems.
>
> This router also routinely runs transmission (torrent client from ports)
> serving torrents from USB-attached HDD making severe CPU load, so I suspect
> the problem may be related with CPU load.
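
To illustrate why out-of-sync RX ring indices could produce both the delays and the reordering described above, here is a toy model in Python. It is only a sketch under stated assumptions: it is not vr(4) code, and `RxRing`, `Slot`, `simulate`, and the ring size of 8 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

RING_SIZE = 8  # hypothetical size, not the real vr(4) RX ring size


@dataclass
class Slot:
    owned_by_nic: bool = True   # True: NIC may fill it; False: packet waits for the driver
    seq: Optional[int] = None   # sequence number of the packet stored in the slot


class RxRing:
    """Toy RX descriptor ring with a NIC (producer) and a driver (consumer) index."""

    def __init__(self, size: int = RING_SIZE) -> None:
        self.slots = [Slot() for _ in range(size)]
        self.prod = 0  # where the NIC writes the next packet
        self.cons = 0  # where the driver looks for the next packet

    def nic_receive(self, seq: int) -> bool:
        slot = self.slots[self.prod]
        if not slot.owned_by_nic:
            return False  # no free RX buffer, packet is dropped
        slot.seq = seq
        slot.owned_by_nic = False
        self.prod = (self.prod + 1) % len(self.slots)
        return True

    def driver_poll(self) -> List[int]:
        """Deliver packets starting at cons until a slot still owned by the NIC is hit."""
        delivered = []
        while not self.slots[self.cons].owned_by_nic:
            slot = self.slots[self.cons]
            delivered.append(slot.seq)
            slot.seq = None
            slot.owned_by_nic = True  # hand the buffer back to the NIC
            self.cons = (self.cons + 1) % len(self.slots)
        return delivered


def simulate(skew: int, packets: int = 12) -> List[int]:
    """Return the order in which the driver sees packets 0..packets-1.

    skew models a corrupted consumer index: the driver starts looking
    skew slots ahead of where the NIC actually writes.
    """
    ring = RxRing()
    ring.cons = skew % RING_SIZE
    order: List[int] = []
    for seq in range(packets):
        ring.nic_receive(seq)
        order.extend(ring.driver_poll())
    return order


if __name__ == "__main__":
    print("in sync:", simulate(0))
    print("skewed: ", simulate(3))
```

In this model, `simulate(0)` delivers packets in arrival order, while `simulate(3)` yields `[3, 4, 5, 6, 7, 0, 1, 2, 11]`: packets 0 to 2 are seen only after the ring wraps, and packets 8 to 10 are still sitting in the ring at the end. Whether the real hardware fails this way is an assumption; the point is only that a skewed index keeps delaying and reordering traffic until both indices are reset, which would fit ifconfig down/up curing it.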
>
> I've also checked mbuf/mbuf clusters usage and they are all right:
>
> # netstat -m
> 1539/2076/3615 mbufs in use (current/cache/total)
> 1200/1278/2478/65536 mbuf clusters in use (current/cache/total/max)
> 1200/306 mbuf+clusters out of packet secondary zone in use (current/cache)
> 318/181/499/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 4056K/3799K/7855K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/4/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> # vmstat -z | egrep -i 'ITEM|mbuf'
> ITEM                    SIZE   LIMIT   USED   FREE    REQUESTS FAILURES
> mbuf_packet:             256,      0,  1429,    77,  112854470,       0
> mbuf:                    256,      0,   489,  1620,  369073316,       0
> mbuf_cluster:           2048,  65536,  1506,   604,    5401864,       0
> mbuf_jumbo_page:        4096,  12800,   469,   158,    8306777,       0
> mbuf_jumbo_9k:          9216,   6400,     0,     0,          0,       0
> mbuf_jumbo_16k:        16384,   3200,     0,     0,          0,       0
> mbuf_ext_refcnt:           4,      0,     0,     0,          0,       0
> NetGraph items:           36,   4130,     1,   117,     263123,       0
> NetGraph data items:      36,    531,     0,   295,  106663377,       0
>
> While ifconfig vr1 down/up solves the problem completely (for some long time),
> taking link down/up using switch solves it "in half" - huge packet delays disappear
> and turn to 25% packet loss happening in regular short intervals, once a second or so.
>
> ifconfig down/up clears this mess too.
>
> Please help me to debug this, it's pretty annoying.

By chance, did vr(4) print any diagnostic messages to the console?
If I remember correctly, vr(4) automatically restarts the controller and
shows these errors when it detects an abnormal condition.
Abnormal conditions for vr(4) would be:
 - TX/RX MAC stuck
 - RX MAC stop due to FIFO overflow or no RX buffers
 - PCI bus errors
 - TX abort
 - TX underrun

> I had a hope new vr(4) driver would help but it takes my system down under average load
> and is unusable.
>
> Here is start of dmesg.boot:
>
> Copyright (c) 1992-2012 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.3-STABLE #1: Wed Aug 29 22:49:45 NOVT 2012
> root@grosbein.pp.ru:/usr/local/obj/nanobsd.gw/i386/usr/local/src/sys/GW i386
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Geode(TM) Integrated Processor by AMD PCS (499.91-MHz 586-class CPU)
> Origin = "AuthenticAMD" Id = 0x5a2 Family = 5 Model = a Stepping = 2
> Features=0x88a93d
> AMD Features=0xc0400000
> real memory = 1065025536 (1015 MB)
> avail memory = 1032929280 (985 MB)
> K6-family MTRR support enabled (2 registers)
>
> I must also note that this system runs with ACPI disabled in /boot/loader.conf:
> hint.acpi.0.disabled=1
>
> Otherwise, its timekeeping becomes broken.
>
> Eugene Grosbein
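
To check whether vr(4) logged any of the conditions listed above, one option is to filter the kernel message buffer for lines emitted by a vr device. The following is a minimal sketch in Python, assuming only that driver messages carry the usual `vrN:` device prefix; the helper name `vr_messages` is hypothetical, and the exact wording of the driver's messages is deliberately not assumed.

```python
import re
import subprocess
from typing import Iterable, List

# Kernel messages from a device normally start with its name and unit, e.g. "vr1:".
VR_LINE = re.compile(r"^vr\d+:")


def vr_messages(lines: Iterable[str]) -> List[str]:
    """Return only the lines that start with a vr device prefix."""
    return [line.rstrip("\n") for line in lines if VR_LINE.match(line)]


if __name__ == "__main__":
    # Read the kernel message buffer; run this after the interface misbehaves.
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    for message in vr_messages(result.stdout.splitlines()):
        print(message)
```

Running it right after the interface stalls, and before ifconfig down/up, would show whether the driver reported a restart or an error around that time. The same filter can be applied to the lines of /var/log/messages if the message buffer has already wrapped.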