From owner-freebsd-net@FreeBSD.ORG Sat Oct 17 21:54:59 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A5CD1065670 for ; Sat, 17 Oct 2009 21:54:59 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail17.syd.optusnet.com.au (mail17.syd.optusnet.com.au [211.29.132.198]) by mx1.freebsd.org (Postfix) with ESMTP id 930808FC22 for ; Sat, 17 Oct 2009 21:54:58 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c122-106-233-13.belrs3.nsw.optusnet.com.au [122.106.233.13]) by mail17.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n9HLstM5002070 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 18 Oct 2009 08:54:56 +1100 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.3/8.14.3) with ESMTP id n9HLss5M018373; Sun, 18 Oct 2009 08:54:54 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.3/8.14.3/Submit) id n9HLssVO018372; Sun, 18 Oct 2009 08:54:54 +1100 (EST) (envelope-from peter) Date: Sun, 18 Oct 2009 08:54:54 +1100 From: Peter Jeremy To: rihad Message-ID: <20091017215454.GG38569@server.vk2pj.dyndns.org> References: <4AC8A76B.3050502@mail.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="PyMzGVE0NRonI6bs" Content-Disposition: inline In-Reply-To: <4AC8A76B.3050502@mail.ru> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-net@freebsd.org Subject: Re: dummynet dropping too many packets X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Oct 2009 21:54:59 -0000 --PyMzGVE0NRonI6bs Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2009-Oct-04 18:47:23 +0500, rihad wrote: >Hi, we have around 500-600 mbit/s traffic flowing through a 7.1R Dell=20 >PowerEdge w/ 2 GigE bce cards. There are currently around 4 thousand ISP= =20 >users online limited by dummynet pipes of various speeds. According to=20 >netstat -s output around 500-1000 packets are being dropped every second= =20 >(this accounts for wasting around 7-12 mbit/s worth of traffic according= =20 >to systat -ifstat): This has been a most interesting thread. A couple of comments: Traffic shaping only works cleanly on TCP flows - UDP has no feedback mechanism and so will not automatically throttle to fit into the available bandwidth, potentially leading to high packet drops within dummynet. Is it possible that some of your customers are heavily using UDP? Have you tried allowing just UDP traffic to bypass the pipes to see if this has any effect on drop rate? The pipe lists you posted showed that virtually all the packet drops are associated with one or two IP addresses. If this is really true, rather than a measurement artifact, you might find it useful to tcpdump those addresses and see if there's anything unusual in the data being passed. Also, if you monitor the pipe lists following a cold start, do those addresses appear early and just not show any packet loss until the total number of users builds up or do they not appear until later and immediately show packet loss? Looking at how 'output packets dropped due to no bufs, etc.' is counted (ipstat.ips_odropped), if you run 'netstat -id', do you see a large number of drops on bce1 (consistent with the "output packets dropped" counts) or not? This will help narrow down the codepath being followed by dropped packets. Since the problem only appears to manifest when table(0) exceeds 2000 entries, have you considered splitting (at least temporarily) that table (and possibly table(2)) into two (eg table(0) and table(4))? This would help rule out an (unlikely) problem with table sizes. Doin so would require the application to split the users across both tables (eg round-robin or based on one of the bits in the IP address) and then duplicating the relevant ipfw rules - eg: 01060 pipe tablearg ip from any to table(0) out recv bce0 xmit bce1 01061 pipe tablearg ip from any to table(4) out recv bce0 xmit bce1 01070 allow ip from any to table(0) out recv bce0 xmit bce1 01071 allow ip from any to table(4) out recv bce0 xmit bce1 (And I agree that re-arranging rules to reduce the number of repeated tests should improve ipfw efficiency). The symptoms keep making me think "lock contention" - but I'm not sure how to measure that cheaply (AFAIK, LOCK_PROFILING is comparatively expensive). Finally, are you running i386 or amd64? --=20 Peter Jeremy --PyMzGVE0NRonI6bs Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.13 (FreeBSD) iEYEARECAAYFAkraPS4ACgkQ/opHv/APuIdPIQCfRqOAHSoTEimaRPAwpLe59072 OxAAn3NdQEeZPIRzV3SWLwyBZ2+KBnFl =MChw -----END PGP SIGNATURE----- --PyMzGVE0NRonI6bs--