From owner-freebsd-net@FreeBSD.ORG Sun Apr 15 01:14:12 2012
From: Jason Leschnik
To: Yuri
Cc: freebsd-net@freebsd.org
Date: Sun, 15 Apr 2012 11:14:11 +1000
Subject: Re: Why host transmit rate on 1Gb ethernet is only ~750Mbps ?

What about the results on the end node during the testing?

On Sun, Apr 15, 2012 at 5:37 AM, Yuri wrote:
> On 04/14/2012 12:32, Jason Leschnik wrote:
>>
>> What kind of load are you looking at when running the test?
>>
>> maybe output `vmstat 1 15` during a test run
>
>
> Here is the log during the test run on the sending side:
>
> # vmstat 1 15
>  procs      memory        page                       disks       faults        cpu
>  r b  w     avm    fre   flt  re  pi  po    fr  sr md0 ad0   in     sy    cs us sy id
>  0 9 17  32481M  1312M  3155   1   6   0  3263 430   0   0 1163  11537  6958  4  1 95
>  1 9 17  32507M  1312M   283   0   0   0    55   0   0   0 2354 163960  8891  5  4 91
>  1 9 17  32507M  1312M     4   0   0   0     0   0   0   0 5427 156936 15139  5  3 92
>  0 9 17  32507M  1312M  3436   0   0   0  2380   0   0   0 5412 211755 16386  7  4 89
>  0 9 17  32507M  1312M     4   0   0   0     0   0   0   0 5418 151530 15159  5  3 92
>  1 9 17  32495M  1321M     5   0   0   0  2380   0   0   0 5411 145787 15047  4  3 93
>  1 9 17  32495M  1321M     4   0   0   0     0   0   0   0 5492 147687 15451  4  3 93
>  0 9 17  32495M  1321M     4   0   0   0     0   0   0   0 5441 145666 15155  4  3 92
>  0 9 17  32495M  1321M   600   0   0   0     0   0   0   0 5418 147173 15361  4  2 93
>  0 9 17  32495M  1321M     7   0   0   0     0   0   0   0 5428 145410 15122  4  3 93
>  0 9 17  32495M  1321M     4   0   0   0     0   0   0   0 5427 146896 15175  4  3 93
>  1 9 17  32481M  1312M  2402   0   0   0   399   0   0  51 2629 128751  9712  4  2 93
>
> Yuri

-- 
Regards,
Jason Leschnik.

From owner-freebsd-net@FreeBSD.ORG Sun Apr 15 02:11:58 2012
From: Juli Mallett
To: Jason Leschnik
Cc: Yuri, freebsd-net@freebsd.org
Date: Sat, 14 Apr 2012 19:11:31 -0700
Subject: Re: Why host transmit rate on 1Gb ethernet is only ~750Mbps ?

On Sat, Apr 14, 2012 at 18:14, Jason Leschnik wrote:
> What about the results on the end node, during the testing?.

This is not a very quick way to isolate the problem. Some NICs and
some configurations just can't do line rate at any packet size, or at
any plausible packet size, and if that's the case then looking at load
isn't very instructive unless interrupt load is the problem. iperf
with UDP is already pretty basic and not very resource-hungry. I've
pushed a gigabit with iperf and UDP on pretty trivial hardware, and
had problems with high-end hardware and a low-end NIC.

"2.5GHz CPU" is not enough information. How is the re device
attached? What kind of 2.5GHz CPU?

If you have netgraph in your kernel and ng_source, you can use it
pretty easily. Dump a real packet or dd some random data into a file
called "packet-source". Make it the size of the frame you want to
test. Put in a real packet header and then use truncate(1) to adjust
its size for a variety of tests.

The examples below assume your NIC is called e0; replace that with
whatever your NIC is really called.

In one window, run netstat -w1 -I e0.

In another, run this shell script:

%%%
#! /bin/sh

sudo ngctl mkpeer e0: source orphans output
sudo nghook e0:orphans input < packet-source
sudo ngctl msg e0:orphans start 1000000000
%%%

When you want to stop it, just do "sudo ngctl destroy e0:orphans" (I
may have that wrong.)

Now, watch your packet output rate and bandwidth in netstat. Is it
managing 1Gbps, or hovering at 750Mbps? If the latter, you're not
going to get better performance without driver tweaks, and maybe not
even then. If you want to put netmap (not netgraph) into your kernel
and try its pkt-gen program (in tools/tools/netmap/pkt-gen.c IIRC) you
may be able to see if that's better. If it is, interrupt overhead may
be your issue, and then you're probably still out of luck unless you
want to hack the driver or make your application use netmap.

Now, on your other host, the one receiving this traffic, put the
interface in promiscuous mode. Run netstat -w1 -I {whatever} to watch
its receive rate. Is it receiving all of the packets that are getting
sent?

If the source is managing to send 1Gbps and the target is receiving
less, there's your problem. If they can both do it, then your
bottleneck is somewhere between the Ethernet stack and iperf, but it's
likely not CPU time unless you're trying to send very small frames,
and even then 2.5GHz can probably swing it. I have a 3GHz (IIRC)
NetBurst Xeon that can do 980Mbps with infinitesimal frames, and it's
rubbish. Builds world in about a light year* and has the disk
performance of a muddy lake full of boating enthusiasts. It just
doesn't take much CPU time to blithely spew traffic forth, even with
iperf.

Good luck!

*: Not an actual unit of measurement of wall, system, user or real time.
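
The "packet-source" file that the ng_source steps above rely on could be prepared roughly as follows. This is only a sketch under stated assumptions: it fills the file with random bytes rather than a real packet header, and the FRAME_SIZE variable with its 1500-byte default is a name and value chosen here for illustration, not something from the thread.

```shell
#!/bin/sh
# Sketch: build a payload file named "packet-source" (the name used in the
# message above) at a chosen size. FRAME_SIZE and its 1500-byte default are
# assumptions for illustration, not values taken from the thread.
set -eu

FRAME_SIZE="${FRAME_SIZE:-1500}"

# Fill the file with random bytes. This carries no valid Ethernet header;
# for a realistic test, start from a captured frame instead, as the
# message suggests.
dd if=/dev/urandom of=packet-source bs="$FRAME_SIZE" count=1 2>/dev/null

# truncate(1) pads or cuts the file to exactly FRAME_SIZE bytes. The same
# command resizes a captured frame when trying other test sizes.
truncate -s "$FRAME_SIZE" packet-source

# Report the resulting size in bytes.
wc -c < packet-source
```

Setting FRAME_SIZE to a smaller value before running it would produce a payload for the small-frame case discussed in the message.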
From owner-freebsd-net@FreeBSD.ORG Sun Apr 15 23:21:27 2012
From: Andrew Thompson
To: Hajimu UMEMOTO
Cc: freebsd-net@freebsd.org, Rainer Bredehorn
Date: Mon, 16 Apr 2012 11:21:26 +1200
Subject: Re: getifaddrs & ipv6 scope

On 14 April 2012 06:03, Hajimu UMEMOTO wrote:
> Hi,
>
>>>>>> On Fri, 13 Apr 2012 20:01:39 +1200
>>>>>> Andrew Thompson said:
>
> thompsa> On 13 April 2012 18:41, Rainer Bredehorn wrote:
>> Hi!
>>
>>> I have noticed that getifaddrs() does not have sin6_scope_id set to
>>> the interface id for link local addresses on AF_INET6 types. Running
>>> the following program gives different results on Linux
>>
>> ifconfig shows the scopeid according to the interface:
>>
>> inet6 fe80::208:9bff:fe13:784e%fxp1 prefixlen 64 scopeid 0x2
>>
>> Are you talking about the scope value of a multicast address or
>> the scopeid for link local addresses?
>
> thompsa> I am talking about the scopeid for link local addresses which (as far
> thompsa> as I understand) is the interface index.
>
> The issue you mentioned comes from an implementation decision of the
> KAME IPv6 stack.
> The attached patch should address it.  However, it may break the
> applications which expect that getifaddrs() returns a link-local
> address with KAME's embedded scopeid representation.  I'm not sure
> there are such applications, for now.

This is now working how I expected it. From my original test app,

dev: bge0 address: scope 2
dev: lo0 address: <::1> scope 0
dev: lo0 address: scope 5
dev: tun5 address: scope 6
dev: tun3 address: scope 7
dev: tun0 address: scope 8

regards,
Andrew

From owner-freebsd-net@FreeBSD.ORG Mon Apr 16 03:59:36 2012
From: linimon@FreeBSD.org
To: linimon@FreeBSD.org, freebsd-i386@FreeBSD.org, freebsd-net@FreeBSD.org
Date: Mon, 16 Apr 2012 03:59:36 GMT
Subject: Re: kern/166894: [rl] Realtek RTL8100 keeps droping link

Old Synopsis: Realtek RTL8100 keeps droping link
New Synopsis: [rl] Realtek RTL8100 keeps droping link

Responsible-Changed-From-To: freebsd-i386->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Mon Apr 16 03:59:18 UTC 2012
Responsible-Changed-Why:
Over to maintainer(s).
http://www.freebsd.org/cgi/query-pr.cgi?pr=166894

From owner-freebsd-net@FreeBSD.ORG Mon Apr 16 11:07:23 2012
From: FreeBSD bugmaster
To: freebsd-net@FreeBSD.org
Date: Mon, 16 Apr 2012 11:07:23 GMT
Subject: Current problem reports assigned to freebsd-net@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/166909  net        [alc] NIC alc(4) does not support 1000baseTX
o kern/166894  net        [rl] Realtek RTL8100 keeps droping link
o kern/166727  net        [msk] msk driver keeps erroring
o kern/166724  net        [re] if_re watchdog timeout
o kern/166550  net        [netinet] [patch] Some log lines about arp do not incl
o kern/166462  net        [gre] gre(4) when using a tunnel source address from c
o kern/166372  net        [patch] ipfilter drops UDP packets with zero checksum
o kern/166285  net        [arp] FreeBSD v8.1 REL p8 arp: unknown hardware addres
o kern/166255  net        [net] [patch] It should be possible to disable "promis
o kern/165963  net        [panic] [ipf] ipfilter/nat NULL pointer deference
o kern/165903  net        mbuf leak
o kern/165863  net        [panic] [netinet] [patch] in_lltable_prefix_free() rac
o kern/165643  net        [net] [patch] Missing vnet restores in net/if_ethersub
o kern/165622  net        [ndis][panic][patch] Unregistered use of FPU in kernel
s kern/165562  net        [request] add support for Intel i350 in FreeBSD 7.4
o kern/165526  net        [bxe] UDP packets checksum calculation whithin if_bxe
o kern/165488  net        [ppp] [panic] Fatal trap 12 jails and ppp , kernel wit
o kern/165305  net        [ip6] [request] Feature parity between IP_TOS and IPV6
o kern/165296  net        [vlan] [patch] Fix EVL_APPLY_VLID, update EVL_APPLY_PR
o kern/165181  net        [igb] igb freezes after about 2 weeks of uptime
o kern/165174  net        [patch] [tap] allow tap(4) to keep its address on clos
o kern/165152  net        [ip6] Does not work through the issue of ipv6 addresse
o kern/164569  net        [msk] [hang] msk network driver cause freeze in FreeBS
o kern/164495  net        [igb] connect double head igb to switch cause system t
o kern/164490  net        [pfil] Incorrect IP checksum on pfil pass from ip_outp
o kern/164475  net        [gre] gre misses RUNNING flag after a reboot
o kern/164400  net        [ipsec] immediate crash after the start of ipsec proce
o kern/164265  net        [netinet] [patch] tcp_lro_rx computes wrong checksum i
o kern/163903  net        [igb] "igb0:tx(0)","bpf interface lock" v2.2.5 9-STABL
o kern/163481  net        freebsd do not add itself to ping route packet
o kern/162927  net        [tun] Modem-PPP error ppp[1538]: tun0: Phase: Clearing
o kern/162926  net        [ipfilter] Infinite loop in ipfilter with fragmented I
o kern/162558  net        [dummynet] [panic] seldom dummynet panics
o kern/162509  net        [re] [panic] Kernel panic may be related to if_re.c (r
o kern/162153  net        [em] intel em driver 7.2.4 don't compile
o kern/162110  net        [igb] [panic] RELENG_9 panics on boot in IGB driver -
o kern/162028  net        [ixgbe] [patch] misplaced #endif in ixgbe.c
o kern/161381  net        [re] RTL8169SC - re0: PHY write failed
o kern/161277  net        [em] [patch] BMC cannot receive IPMI traffic after loa
o kern/160873  net        [igb] igb(4) from HEAD fails to build on 7-STABLE
o kern/160750  net        Intel PRO/1000 connection breaks under load until rebo
o kern/160693  net        [gif] [em] Multicast packet are not passed from GIF0 t
o kern/160420  net        [msk] phy write timeout on HP 5310m
o kern/160293  net        [ieee80211] ppanic] kernel panic during network setup
o kern/160206  net        [gif] gifX stops working after a while (IPv6 tunnel)
o kern/159817  net        [udp] write UDPv4: No buffer space available (code=55)
o kern/159629  net        [ipsec] [panic] kernel panic with IPsec in transport m
o kern/159621  net        [tcp] [panic] panic: soabort: so_count
o kern/159603  net        [netinet] [patch] in_ifscrubprefix() - network route c
o kern/159601  net        [netinet] [patch] in_scrubprefix() - loopback route re
o kern/159294  net        [em] em watchdog timeouts
o kern/159203  net        [wpi] Intel 3945ABG Wireless LAN not support IBSS
o kern/158930  net        [bpf] BPF element leak in ifp->bpf_if->bif_dlist
o kern/158726  net        [ip6] [patch] ICMPv6 Router Announcement flooding limi
o kern/158694  net        [ix] [lagg] ix0 is not working within lagg(4)
o kern/158665  net        [ip6] [panic] kernel pagefault in in6_setscope()
o kern/158635  net        [em] TSO breaks BPF packet captures with em driver
f kern/157802  net        [dummynet] [panic] kernel panic in dummynet
o kern/157785  net        amd64 + jail + ipfw + natd = very slow outbound traffi
o kern/157429  net        [re] Realtek RTL8169 doesn't work with re(4)
o kern/157418  net        [em] em driver lockup during boot on Supermicro X9SCM-
o kern/157410  net        [ip6] IPv6 Router Advertisements Cause Excessive CPU U
o kern/157287  net        [re] [panic] INVARIANTS panic (Memory modified after f
o kern/157209  net        [ip6] [patch] locking error in rip6_input() (sys/netin
o kern/157200  net        [network.subr] [patch] stf(4) can not communicate betw
o kern/157182  net        [lagg] lagg interface not working together with epair
o kern/156877  net        [dummynet] [panic] dummynet move_pkt() null ptr derefe
o kern/156667  net        [em] em0 fails to init on CURRENT after March 17
o kern/156408  net        [vlan] Routing failure when using VLANs vs. Physical e
o kern/156328  net        [icmp]: host can ping other subnet but no have IP from
o kern/156317  net        [ip6] Wrong order of IPv6 NS DAD/MLD Report
o kern/156283  net        [ip6] [patch] nd6_ns_input - rtalloc_mpath does not re
o kern/156279  net        [if_bridge][divert][ipfw] unable to correctly re-injec
o kern/156226  net        [lagg]: failover does not announce the failover to swi
o kern/156030  net        [ip6] [panic] Crash in nd6_dad_start() due to null ptr
o kern/155772  net        ifconfig(8): ioctl (SIOCAIFADDR): File exists on direc
o kern/155680  net        [multicast] problems with multicast
s kern/155642  net        [request] Add driver for Realtek RTL8191SE/RTL8192SE W
o kern/155597  net        [panic] Kernel panics with "sbdrop" message
o kern/155420  net        [vlan] adding vlan break existent vlan
o kern/155177  net        [route] [panic] Panic when inject routes in kernel
o kern/155030  net        [igb] igb(4) DEVICE_POLLING does not work with carp(4)
o kern/155010  net        [msk] ntfs-3g via iscsi using msk driver cause kernel
o kern/154943  net        [gif] ifconfig gifX create on existing gifX clears IP
s kern/154851  net        [request]: Port brcm80211 driver from Linux to FreeBSD
o kern/154850  net        [netgraph] [patch] ng_ether fails to name nodes when t
o kern/154679  net        [em] Fatal trap 12: "em1 taskq" only at startup (8.1-R
o kern/154600  net        [tcp] [panic] Random kernel panics on tcp_output
o kern/154557  net        [tcp] Freeze tcp-session of the clients, if in the gat
o kern/154443  net        [if_bridge] Kernel module bridgestp.ko missing after u
o kern/154286  net        [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/154255  net        [nfs] NFS not responding
o kern/154214  net        [stf] [panic] Panic when creating stf interface
o kern/154185  net        race condition in mb_dupcl
o kern/154169  net        [multicast] [ip6] Node Information Query multicast add
o kern/154134  net        [ip6] stuck kernel state in LISTEN on ipv6 daemon whic
o kern/154091  net        [netgraph] [panic] netgraph, unaligned mbuf?
o conf/154062  net        [vlan] [patch] change to way of auto-generatation of v
o kern/153937  net        [ral] ralink panics the system (amd64 freeBSDD 8.X) wh
o kern/153936  net        [ixgbe] [patch] MPRC workaround incorrectly applied to
o kern/153816  net        [ixgbe] ixgbe doesn't work properly with the Intel 10g
o kern/153772  net        [ixgbe] [patch] sysctls reference wrong XON/XOFF varia
o kern/153497  net        [netgraph] netgraph panic due to race conditions
o kern/153454  net        [patch] [wlan] [urtw] Support ad-hoc and hostap modes
o kern/153308  net        [em] em interface use 100% cpu
o kern/153244  net        [em] em(4) fails to send UDP to port 0xffff
o kern/152893  net        [netgraph] [panic] 8.2-PRERELEASE panic in netgraph
o kern/152853  net        [em] tftpd (and likely other udp traffic) fails over e
o kern/152828  net        [em] poor performance on 8.1, 8.2-PRE
o kern/152569  net        [net]: Multiple ppp connections and routing table prob
o kern/152235  net        [arp] Permanent local ARP entries are not properly upd
o kern/152141  net        [vlan] [patch] encapsulate vlan in ng_ether before out
o kern/152036  net        [libc] getifaddrs(3) returns truncated sockaddrs for n
o kern/151690  net        [ep] network connectivity won't work until dhclient is
o kern/151681  net        [nfs] NFS mount via IPv6 leads to hang on client with
o kern/151593  net        [igb] [panic] Kernel panic when bringing up igb networ
o kern/150920  net        [ixgbe][igb] Panic when packets are dropped with heade
o kern/150557  net        [igb] igb0: Watchdog timeout -- resetting
o kern/150251  net        [patch] [ixgbe] Late cable insertion broken
o kern/150249  net        [ixgbe] Media type detection broken
o bin/150224   net        ppp(8) does not reassign static IP after kill -KILL co
f kern/149969  net        [wlan] [ral] ralink rt2661 fails to maintain connectio
o kern/149937  net        [ipfilter] [patch] kernel panic in ipfilter IP fragmen
o kern/149643  net        [rum] device not sending proper beacon frames in ap mo
o kern/149609  net        [panic] reboot after adding second default route
o kern/149117  net        [inet] [patch] in_pcbbind: redundant test
o kern/149086  net        [multicast] Generic multicast join failure in 8.1
o kern/148018  net        [flowtable] flowtable crashes on ia64
o kern/147912  net        [boot] FreeBSD 8 Beta won't boot on Thinkpad i1300 11
o kern/147894  net        [ipsec] IPv6-in-IPv4 does not work inside an ESP-only
o kern/147155  net        [ip6] setfb not work with ipv6
o kern/146845  net        [libc] close(2) returns error 54 (connection reset by
f kern/146792  net        [flowtable] flowcleaner 100% cpu's core load
o kern/146719  net        [pf] [panic] PF or dumynet kernel panic
o kern/146534  net        [icmp6] wrong source address in echo reply
o kern/146427  net        [mwl] Additional virtual access points don't work on m
o kern/146426  net        [mwl] 802.11n rates not possible on mwl
o kern/146425  net        [mwl] mwl dropping all packets during and after high u
f kern/146394  net        [vlan] IP source address for outgoing connections
o bin/146377   net        [ppp] [tun] Interface doesn't clear addresses when PPP
o kern/146358  net        [vlan] wrong destination MAC address
o kern/146165  net        [wlan] [panic] Setting bssid in adhoc mode causes pani
o kern/146082  net        [ng_l2tp] a false invaliant check was performed in ng_
o kern/146037  net        [panic] mpd + CoA = kernel panic
o kern/145825  net        [panic] panic: soabort: so_count
o kern/145728  net        [lagg] Stops working lagg between two servers.
p kern/145600  net        TCP/ECN behaves different to CE/CWR than ns2 reference
f kern/144917  net        [flowtable] [panic] flowtable crashes system [regressi
o kern/144882  net        MacBookPro =>4.1 does not connect to BSD in hostap wit
o kern/144874  net        [if_bridge] [patch] if_bridge frees mbuf after pfil ho
o conf/144700  net        [rc.d] async dhclient breaks stuff for too many people
o kern/144616  net        [nat] [panic] ip_nat panic FreeBSD 7.2
f kern/144315  net        [ipfw] [panic] freebsd 8-stable reboot after add ipfw
o kern/144231  net        bind/connect/sendto too strict about sockaddr length
o kern/143846  net        [gif] bringing gif3 tunnel down causes gif0 tunnel to
s kern/143673  net        [stf] [request] there should be a way to support multi
s kern/143666  net        [ip6] [request] PMTU black hole detection not implemen
o kern/143622  net        [pfil] [patch] unlock pfil lock while calling firewall
o kern/143593  net        [ipsec] When using IPSec, tcpdump doesn't show outgoin
o kern/143591  net        [ral] RT2561C-based DLink card (DWL-510) fails to work
o kern/143208  net        [ipsec] [gif] IPSec over gif interface not working
o kern/143034  net        [panic] system reboots itself in tcp code [regression]
o kern/142877  net        [hang] network-related repeatable 8.0-STABLE hard hang
o kern/142774  net        Problem with outgoing connections on interface with mu
o kern/142772  net        [libc] lla_lookup: new lle malloc failed
o kern/142018  net        [iwi] [patch] Possibly wrong interpretation of beacon-
o kern/141861  net        [wi] data garbled with WEP and wi(4) with Prism 2.5
f kern/141741  net        Etherlink III NIC won't work after upgrade to FBSD 8,
o kern/140742  net        rum(4) Two asus-WL167G adapters cannot talk to each ot
o kern/140682  net        [netgraph] [panic] random panic in netgraph
o kern/140634  net        [vlan] destroying if_lagg interface with if_vlan membe
o kern/140619  net        [ifnet] [patch] refine obsolete if_var.h comments desc
o kern/140346  net        [wlan] High bandwidth use causes loss of wlan connecti
o kern/140142  net        [ip6] [panic] FreeBSD 7.2-amd64 panic w/IPv6
o kern/140066  net        [bwi] install report for 8.0 RC 2 (multiple problems)
o kern/139565  net        [ipfilter] ipfilter ioctl SIOCDELST broken
o kern/139387  net        [ipsec] Wrong lenth of PF_KEY messages in promiscuous
o bin/139346   net        [patch] arp(8) add option to remove static entries lis
o kern/139268  net        [if_bridge] [patch] allow if_bridge to forward just VL
p kern/139204  net        [arp] DHCP server replies rejected, ARP entry lost bef
o kern/139117  net        [lagg] + wlan boot timing (EBUSY)
o kern/139058  net        [ipfilter] mbuf cluster leak on FreeBSD 7.2
o kern/138850  net        [dummynet] dummynet doesn't work correctly on a bridge
o kern/138782  net        [panic] sbflush_internal: cc 0 || mb 0xffffff004127b00
o kern/138688  net        [rum] possibly broken on 8 Beta 4 amd64: able to wpa a
o kern/138678  net        [lo] FreeBSD does not assign linklocal address to loop
o kern/138620  net        [lagg] [patch] lagg port bpf-writes blocked
o kern/138407  net        [gre] gre(4) interface does not come up after reboot
o kern/138332  net        [tun] [lor] ifconfig tun0 destroy causes LOR if_adata/
o kern/138266  net        [panic] kernel panic when udp benchmark test used as r
o kern/138177  net        [ipfilter] FreeBSD crashing repeatedly in ip_nat.c:257
f kern/138029  net        [bpf] [panic] periodically kernel panic and reboot
o kern/137881  net        [netgraph] [panic] ng_pppoe fatal trap 12
p bin/137841   net        [patch] wpa_supplicant(8) cannot verify SHA256 signed
p kern/137776  net        [rum] panic in rum(4) driver on 8.0-BETA2
o bin/137641   net        ifconfig(8): various problems with "vlan_device.vlan_i
o kern/137392  net        [ip] [panic] crash in ip_nat.c line 2577
o kern/137372  net        [ral] FreeBSD doesn't support wireless interface from
o kern/137089  net        [lagg] lagg falsely triggers IPv6 duplicate address de
o bin/136994   net        [patch] ifconfig(8) print carp mac address
o kern/136911  net        [netgraph] [panic] system panic on kldload ng_bpf.ko t
o kern/136618  net        [pf][stf] panic on cloning interface without unit numb
o kern/135502  net        [periodic] Warning message raised by rtfree function i
o kern/134583  net        [hang] Machine with jail freezes after random amount o
o kern/134531  net        [route] [panic] kernel crash related to routes/zebra
o kern/134157  net        [dummynet] dummynet loads cpu for 100% and make a syst
o kern/133969  net        [dummynet] [panic] Fatal trap 12: page fault while in
o kern/133968  net        [dummynet] [panic] dummynet kernel panic
o kern/133736  net        [udp] ip_id not protected ...
o kern/133595  net        [panic] Kernel Panic at pcpu.h:195
o kern/133572  net        [ppp] [hang] incoming PPTP connection hangs the system
o kern/133490  net        [bpf] [panic] 'kmem_map too small' panic on Dell r900
o kern/133235  net        [netinet] [patch] Process SIOCDLIFADDR command incorre
f kern/133213  net        arp and sshd errors on 7.1-PRERELEASE
o kern/133060  net        [ipsec] [pfsync] [panic] Kernel panic with ipsec + pfs
o kern/132889  net        [ndis] [panic] NDIS kernel crash on load BCM4321 AGN d
o conf/132851  net        [patch] rc.conf(5): allow to setfib(1) for service run
o kern/132734  net        [ifmib] [panic] panic in net/if_mib.c
o kern/132705  net        [libwrap] [patch] libwrap - infinite loop if hosts.all
o kern/132672  net        [ndis] [panic] ndis with rt2860.sys causes kernel pani
o kern/132554  net        [ipl] There is no ippool start script/ipfilter magic t
o kern/132354  net        [nat] Getting some packages to ipnat(8) causes crash
o kern/132277  net        [crypto] [ipsec] poor performance using cryptodevice f
o kern/131781  net        [ndis] ndis keeps dropping the link
o kern/131776  net        [wi] driver fails to init
o kern/131753  net        [altq] [panic] kernel panic in hfsc_dequeue
o kern/131601  net        [ipfilter] [panic] 7-STABLE panic in nat_finalise (tcp
o bin/131567   net        [socket] [patch] Update for regression/sockets/unix_cm
o bin/131365   net        route(8): route add changes interpretation of network
f kern/130820  net        [ndis] wpa_supplicant(8) returns 'no space on device'
o kern/130628  net        [nfs] NFS / rpc.lockd deadlock on 7.1-R
o conf/130555  net        [rc.d] [patch] No good way to set ipfilter variables a
o kern/130525  net        [ndis] [panic] 64 bit ar5008 ndisgen-erated driver cau
o kern/130311  net        [wlan_xauth] [panic] hostapd restart causing kernel pa
o kern/130109  net        [ipfw] Can not set fib for packets originated from loc
f kern/130059  net        [panic] Leaking 50k mbufs/hour
f kern/129719  net        [nfs] [panic] Panic during shutdown, tcp_ctloutput: in
o kern/129517  net        [ipsec] [panic] double fault / stack overflow
f kern/129508  net        [carp] [panic] Kernel panic with EtherIP (may be relat
o kern/129219  net        [ppp] Kernel panic when using kernel mode ppp
o kern/129197  net        [panic] 7.0 IP stack related panic
o bin/128954   net        ifconfig(8) deletes valid routes
o bin/128602   net        [an] wpa_supplicant(8) crashes with an(4)
o kern/128448  net        [nfs] 6.4-RC1 Boot Fails if NFS Hostname cannot be res
o bin/128295   net        [patch] ifconfig(8) does not print TOE4 or TOE6 capabi
o bin/128001   net        wpa_supplicant(8), wlan(4), and wi(4) issues
o kern/127826  net        [iwi] iwi0 driver has reduced performance and connecti
o kern/127815  net        [gif] [patch] if_gif does not set vlan attributes from
o kern/127724  net        [rtalloc] rtfree: 0xc5a8f870 has 1 refs
f bin/127719   net        [arp] arp: Segmentation fault (core dumped)
f kern/127528  net        [icmp]: icmp socket receives icmp replies not owned by
p kern/127360  net        [socket] TOE socket options missing from sosetopt()
o bin/127192   net        routed(8) removes the secondary alias IP of interface
f kern/127145  net        [wi]: prism (wi) driver crash at bigger traffic
o kern/126895  net        [patch] [ral] Add antenna selection (marked as TBD)
o kern/126874  net        [vlan]: Zebra problem if ifconfig vlanX destroy
o kern/126695  net        rtfree messages and network disruption upon use of if_
o kern/126339  net        [ipw] ipw driver drops the connection
o kern/126075  net        [inet] [patch] internet control accesses beyond end of
o bin/125922   net        [patch] Deadlock in arp(8)
o kern/125920  net        [arp] Kernel Routing Table loses Ethernet Link status
o kern/125845  net        [netinet] [patch] tcp_lro_rx() should make use of hard
o kern/125258  net        [socket] socket's SO_REUSEADDR option does not work
o kern/125239  net        [gre] kernel crash when using gre
o kern/124341  net        [ral] promiscuous mode for wireless device ral0 looses
o kern/124225  net        [ndis] [patch] ndis network driver sometimes loses net
o kern/124160  net        [libc] connect(2) function loops indefinitely
o kern/124021  net        [ip6] [panic] page fault in nd6_output()
o kern/123968  net        [rum] [panic] rum driver causes kernel panic with WPA.
o kern/123892  net        [tap] [patch] No buffer space available
o kern/123890  net        [ppp] [panic] crash & reboot on work with PPP low-spee
o kern/123858  net        [stf] [patch] stf not usable behind a NAT
o kern/123796  net        [ipf] FreeBSD 6.1+VPN+ipnat+ipf: port mapping does not
o kern/123758  net        [panic] panic while restarting net/freenet6
o bin/123633   net        ifconfig(8) doesn't set inet and ether address in one
o kern/123559  net        [iwi] iwi periodically disassociates/associates [regre
o bin/123465   net        [ip6] route(8): route add -inet6 -interfac
o kern/123463  net        [ipsec] [panic] repeatable crash related to ipsec-tool
o conf/123330  net        [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123160  net        [ip] Panic and reboot at sysctl kern.polling.enable=0
o kern/122989  net        [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122954  net        [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
f kern/122780  net        [lagg] tcpdump on lagg interface during high pps wedge
o kern/122685  net        It is not visible passing packets in tcpdump(1)
o kern/122319  net        [wi] imposible to enable ad-hoc demo mode with Orinoco
o kern/122290  net        [netgraph] [panic] Netgraph related "kmem_map too smal
o kern/122033  net        [ral] [lor] Lock order reversal in ral0 at bootup ieee
o bin/121895   net        [patch] rtsol(8)/rtsold(8) doesn't handle managed netw
s kern/121774  net        [swi] [panic] 6.3 kernel panic in swi1: net
o kern/121555  net        [panic] Fatal trap 12: current process = 12 (swi1: net
o kern/121443  net        [gif] [lor] icmp6_input/nd6_lookup
o kern/121437  net        [vlan] Routing to layer-2 address does not work on VLA
o bin/121359   net        [patch] [security] ppp(8): fix
local stack overflow in o kern/121257 net [tcp] TSO + natd -> slow outgoing tcp traffic o kern/121181 net [panic] Fatal trap 3: breakpoint instruction fault whi o kern/120966 net [rum] kernel panic with if_rum and WPA encryption o kern/120566 net [request]: ifconfig(8) make order of arguments more fr o kern/120304 net [netgraph] [patch] netgraph source assumes 32-bit time o kern/120266 net [udp] [panic] gnugk causes kernel panic when closing U o bin/120060 net routed(8) deletes link-level routes in the presence of o kern/119945 net [rum] [panic] rum device in hostap mode, cause kernel o kern/119791 net [nfs] UDP NFS mount of aliased IP addresses from a Sol o kern/119617 net [nfs] nfs error on wpa network when reseting/shutdown f kern/119516 net [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi o kern/119432 net [arp] route add -host -iface causes arp e o kern/119225 net [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr o kern/118727 net [netgraph] [patch] [request] add new ng_pf module o kern/117423 net [vlan] Duplicate IP on different interfaces o bin/117339 net [patch] route(8): loading routing management commands o kern/117271 net [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap o bin/116643 net [patch] [request] fstat(1): add INET/INET6 socket deta o kern/116185 net [iwi] if_iwi driver leads system to reboot o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat o kern/115019 net [netgraph] ng_ether upper hook packet flow stops on ad o kern/115002 net [wi] if_wi timeout. failed allocation (busy bit). 
ifco o kern/114915 net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f o kern/113432 net [ucom] WARNING: attempt to net_add_domain(netgraph) af o kern/112722 net [ipsec] [udp] IP v4 udp fragmented packet reject o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38 o bin/112557 net [patch] ppp(8) lock file should not use symlink name o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p o kern/111537 net [inet6] [patch] ip6_input() treats mbuf cluster wrong o kern/111457 net [ral] ral(4) freeze o kern/110284 net [if_ethersubr] Invalid Assumption in SIOCSIFADDR in et o kern/110249 net [kernel] [regression] [patch] setsockopt() error regre o kern/109470 net [wi] Orinoco Classic Gold PC Card Can't Channel Hop o bin/108895 net pppd(8): PPPoE dead connections on 6.2 [regression] o kern/107944 net [wi] [patch] Forget to unlock mutex-locks o conf/107035 net [patch] bridge(8): bridge interface given in rc.conf n o kern/106444 net [netgraph] [panic] Kernel Panic on Binding to an ip to o kern/106438 net [ipf] ipfilter: keep state does not seem to allow repl o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets o kern/105945 net Address can disappear from network interface s kern/105943 net Network stack may modify read-only mbuf chain copies o bin/105925 net problems with ifconfig(8) and vlan(4) [regression] o kern/104851 net [inet6] [patch] On link routes not configured when usi o kern/104751 net [netgraph] kernel panic, when getting info about my tr o kern/103191 net Unpredictable reboot o kern/103135 net [ipsec] ipsec with ipfw divert (not NAT) encodes a pac o kern/102540 net [netgraph] [patch] supporting vlan(4) by ng_fec(4) o conf/102502 net [netgraph] [patch] ifconfig name does't rename netgrap o kern/102035 net [plip] plip networking disables parallel port printing o kern/101948 net [ipf] [panic] Kernel Panic Trap No 12 Page Fault - cau o kern/100709 net [libc] getaddrinfo(3) should return TTL info o 
kern/100519 net [netisr] suggestion to fix suboptimal network polling o kern/98978 net [ipf] [patch] ipfilter drops OOW packets under 6.1-Rel o kern/98597 net [inet6] Bug in FreeBSD 6.1 IPv6 link-local DAD procedu o bin/98218 net wpa_supplicant(8) blacklist not working o kern/97306 net [netgraph] NG_L2TP locks after connection with failed o conf/97014 net [gif] gifconfig_gif? in rc.conf does not recognize IPv f kern/96268 net [socket] TCP socket performance drops by 3000% if pack o kern/95519 net [ral] ral0 could not map mbuf o kern/95288 net [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr o kern/95277 net [netinet] [patch] IP Encapsulation mask_match() return o kern/95267 net packet drops periodically appear f kern/93378 net [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo o kern/93019 net [ppp] ppp and tunX problems: no traffic after restarti o kern/92880 net [libc] [patch] almost rewritten inet_network(3) functi s kern/92279 net [dc] Core faults everytime I reboot, possible NIC issu o kern/91859 net [ndis] if_ndis does not work with Asus WL-138 s kern/91777 net [ipf] [patch] wrong behaviour with skip rule inside an o kern/91364 net [ral] [wep] WF-511 RT2500 Card PCI and WEP o kern/91311 net [aue] aue interface hanging s kern/90086 net [hang] 5.4p8 on supermicro P8SCT hangs during boot if o kern/87521 net [ipf] [panic] using ipfilter "auth" keyword leads to k o kern/87421 net [netgraph] [panic]: ng_ether + ng_eiface + if_bridge s kern/86920 net [ndis] ifconfig: SIOCS80211: Invalid argument [regress o kern/86871 net [tcp] [patch] allocation logic for PCBs in TIME_WAIT s o kern/86427 net [lor] Deadlock with FASTIPSEC and nat o kern/86103 net [ipf] Illegal NAT Traversal in IPFilter o kern/85780 net 'panic: bogus refcnt 0' in routing/ipv6 o bin/85445 net ifconfig(8): deprecated keyword to ifconfig inoperativ p kern/85320 net [gre] [patch] possible depletion of kernel stack in ip o bin/82975 net route change does not parse classfull network as given o 
kern/82881 net [netgraph] [panic] ng_fec(4) causes kernel panic after o kern/82468 net Using 64MB tcp send/recv buffers, trafficflow stops, i o bin/82185 net [patch] ndp(8) can delete the incorrect entry o kern/81095 net IPsec connection stops working if associated network i o kern/78968 net FreeBSD freezes on mbufs exhaustion (network interface o kern/78090 net [ipf] ipf filtering on bridged packets doesn't work if o kern/77341 net [ip6] problems with IPV6 implementation s kern/77195 net [ipf] [patch] ipfilter ioctl SIOCGNATL does not match o kern/75873 net Usability problem with non-RFC-compliant IP spoof prot s kern/75407 net [an] an(4): no carrier after short time a kern/71474 net [route] route lookup does not skip interfaces marked d o kern/71469 net default route to internet magically disappears with mu o kern/70904 net [ipf] ipfilter ipnat problem with h323 proxy support o kern/68889 net [panic] m_copym, length > size of mbuf chain o kern/66225 net [netgraph] [patch] extend ng_eiface(4) control message o kern/65616 net IPSEC can't detunnel GRE packets after real ESP encryp s kern/60293 net [patch] FreeBSD arp poison patch a kern/56233 net IPsec tunnel (ESP) over IPv6: MTU computation is wrong s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr s kern/39937 net ipstealth issue a kern/38554 net [patch] changing interface ipaddress doesn't seem to w o kern/34665 net [ipf] [hang] ipfilter rcmd proxy "hangs". 
o kern/31940 net ip queue length too short for >500kpps
o kern/31647 net [libc] socket calls can return undocumented EINVAL
o kern/30186 net [libc] getaddrinfo(3) does not handle incorrect servna
o kern/27474 net [ipf] [ppp] Interactive use of user PPP and ipfilter c
f kern/24959 net [patch] proper TCP_NOPUSH/TCP_CORK compatibility
o conf/23063 net [arp] [patch] for static ARP tables in rc.network
o kern/21998 net [socket] [patch] ident only for outgoing connections
o kern/5877 net [socket] sb_cc counts control data as well as data dat

404 problems total.

From owner-freebsd-net@FreeBSD.ORG Mon Apr 16 19:26:50 2012
Return-Path:
Delivered-To: freebsd-net@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 844AA106567D; Mon, 16 Apr 2012 19:26:50 +0000 (UTC) (envelope-from jinmei@isc.org)
Received: from mx.pao1.isc.org (mx.pao1.isc.org [IPv6:2001:4f8:0:2::2b]) by mx1.freebsd.org (Postfix) with ESMTP id 62FB68FC18; Mon, 16 Apr 2012 19:26:50 +0000 (UTC)
Received: from bikeshed.isc.org (bikeshed.isc.org [IPv6:2001:4f8:3:d::19]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mail.isc.org", Issuer "RapidSSL CA" (not verified)) by mx.pao1.isc.org (Postfix) with ESMTPS id 826A0C9463; Mon, 16 Apr 2012 19:26:42 +0000 (UTC) (envelope-from jinmei@isc.org)
Received: from jmb.jinmei.org (unknown [IPv6:2001:4f8:3:64:a5cb:705b:2abc:7132]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by bikeshed.isc.org (Postfix) with ESMTPSA id 45BDB216C33; Mon, 16 Apr 2012 19:26:42 +0000 (UTC) (envelope-from jinmei@isc.org)
Date: Mon, 16 Apr 2012 12:26:41 -0700
Message-ID:
From: JINMEI Tatuya / =?ISO-2022-JP?B?GyRCP0BMQEMjOkgbKEI=?=
To: "Bjoern A. Zeeb"
In-Reply-To: <7153D609-0E71-435A-B076-27BD6C3AEA04@lists.zabbadoz.net>
References: <20120413064142.10640@gmx.net> <7153D609-0E71-435A-B076-27BD6C3AEA04@lists.zabbadoz.net>
User-Agent: Wanderlust/2.14.0 (Africa) Emacs/22.1 Mule/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mx.pao1.isc.org
Cc: freebsd-net@FreeBSD.org, Andrew Thompson, Hajimu UMEMOTO, Rainer Bredehorn
Subject: Re: getifaddrs & ipv6 scope
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 Apr 2012 19:26:50 -0000

At Sat, 14 Apr 2012 16:41:52 +0000,
"Bjoern A. Zeeb" wrote:

> > The issue you mentioned comes from an implementation decision of the
> > KAME IPv6 stack.
> > The attached patch should address it. However, it may break the
> > applications which expect that getifaddrs() returns a link-local
> > address with KAME's embedded scopeid representation. I'm not sure
> > there are such applications, for now.
>
> There should be none. If we have some we should fix them. I wonder if
> this should actually be done in the kernel to limit the scope of the
> embedded scopeid to our kernel for now? Do we have other interfaces
> (ignoring kvm) that export the embedded scopeid?

I suspect most routing-related implementations (most notably, route6d
and quagga) still rely on the embedded scope (zone) ID for the data
exchanged over a routing socket.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
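
[Editor's illustration, not part of the thread.] The "embedded scopeid" discussed above refers to the KAME convention of storing the zone (interface) index inside a link-local address itself, in the second 16-bit word, rather than in a separate sin6_scope_id field. The following is a minimal Python sketch of the conversion an application would have to perform, assuming that layout; the function name split_embedded_scope is hypothetical and the sketch is not the patch mentioned in the message.

```python
import ipaddress


def split_embedded_scope(addr_text):
    """Return (clean_address, scope_id) for an IPv6 address string.

    Assumption: for link-local addresses (fe80::/10), bytes 2-3 carry a
    KAME-style embedded zone (interface) index. Any other address is
    returned unchanged with a scope_id of 0.
    """
    addr = ipaddress.IPv6Address(addr_text)
    if not addr.is_link_local:
        return str(addr), 0
    raw = bytearray(addr.packed)
    scope_id = (raw[2] << 8) | raw[3]  # second 16-bit word holds the index
    raw[2] = raw[3] = 0                # clear it to recover the plain address
    return str(ipaddress.IPv6Address(bytes(raw))), scope_id


if __name__ == "__main__":
    # fe80:2::1 would be the embedded form of fe80::1 on interface index 2.
    print(split_embedded_scope("fe80:2::1"))
```

An application that relies on the embedded form would do the reverse when handing addresses back to a routing socket, which is why changing the kernel interface could affect routing daemons as noted above.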
From owner-freebsd-net@FreeBSD.ORG Tue Apr 17 04:17:19 2012
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DC125106564A for ; Tue, 17 Apr 2012 04:17:19 +0000 (UTC) (envelope-from achilov-rn@askd.ru)
Received: from master.askd.ru (master.askd.ru [80.242.75.6]) by mx1.freebsd.org (Postfix) with ESMTP id 26F9A8FC0A for ; Tue, 17 Apr 2012 04:17:18 +0000 (UTC)
Received: from sentry (sentry [192.168.1.94]) by master.askd.ru (8.14.5/8.14.5) with ESMTP id q3H4FdXs009969 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Tue, 17 Apr 2012 11:15:43 +0700 (NOVT) (envelope-from achilov-rn@askd.ru)
From: "Rashid N. Achilov"
Organization: =?koi8-r?b?7+/v?= "=?koi8-r?b?4fMt88nT1MXNwQ==?= =?koi8-r?b?IOvPzdDMxcvT?="
To: freebsd-net@freebsd.org
Date: Tue, 17 Apr 2012 11:15:38 +0700
User-Agent: KMail/1.9.10
MIME-Version: 1.0
Content-Type: text/plain; charset="koi8-r"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <201204171115.38725.achilov-rn@askd.ru>
Subject: Support for Intel I350T2 server adapter
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: "Achilov, Rashid"
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 Apr 2012 04:17:19 -0000

Does FreeBSD 8.2 support the Intel I350T2 dual-port server adapter? Intel offers drivers for 7.x, the hardware compatibility notes for 8.2 do not mention it at all, and Google reports support in 9.0. I'm totally confused...

--
With Best Regards.
Rashid N.
Achilov (RNA1-RIPE), JID: citycat4@jabber.infos.ru OOO "ACK" telecommunications administrator, e-mail: achilov-rn [at] askd.ru PGP: 83 CD E2 A7 37 4A D5 81 D6 D6 52 BF C9 2F 85 AF 97 BE CB 0A From owner-freebsd-net@FreeBSD.ORG Tue Apr 17 11:06:03 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 94BB8106564A for ; Tue, 17 Apr 2012 11:06:03 +0000 (UTC) (envelope-from dmk.sbor@gmail.com) Received: from mail-yw0-f54.google.com (mail-yw0-f54.google.com [209.85.213.54]) by mx1.freebsd.org (Postfix) with ESMTP id 4E1D18FC17 for ; Tue, 17 Apr 2012 11:06:03 +0000 (UTC) Received: by yhgm50 with SMTP id m50so3516844yhg.13 for ; Tue, 17 Apr 2012 04:05:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=JEWUB2eODJBGgets7DQaJKwIufhvkmZLWrgwjCCjYTg=; b=WQ0/MWNlc/egk6qpHIGuA+uOa84M+UkNOxyNO0VqQGIPEQLWX8aq6tZP8bk7/UH+ou owu/2QK+bDgfHxBQ3lAP9Es9zP8wSoktkhJU33U1vaO9JETusaJORGQGb+RwCMJNoZLA efOcoMWbi2h1VHv3KsAQKkqZuwd2PujT/w+Nv3rwgy6/9v1mebHCT2wGMqMCqVy+NXnt yqZKEvtUqNHmj9xc/nrcTzKN+6wWePRmRqpxhaPSLekL+50HPY/MSUqUVbtulccCFaGk ynuEkPDbFosgpYPMDCkTJ5Gh85JajulNUDSXqQzXnABlvN7TBwOLQByDpFbfZxumO+bo tCAA== MIME-Version: 1.0 Received: by 10.236.185.138 with SMTP id u10mr14629893yhm.106.1334660757326; Tue, 17 Apr 2012 04:05:57 -0700 (PDT) Received: by 10.146.168.1 with HTTP; Tue, 17 Apr 2012 04:05:57 -0700 (PDT) In-Reply-To: References: Date: Tue, 17 Apr 2012 15:05:57 +0400 Message-ID: From: "Dmitry S. 
Kasterin"
To: freebsd-net@freebsd.org
Content-Type: text/plain; charset=UTF-8
Subject: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 Apr 2012 11:06:03 -0000

(Cross-posting this to net@ since there was no reply on ipfw@.)

Hello!

I have a rather simple ipfw ruleset like this:

00001 allow all from any to any via lo0

00010 check-state
00101 allow tcp from me to any out setup keep-state

65533 deny log ip from any to any
65534 deny ip6 from any to any

Actually, there are a few rules for udp, icmp and so on, but the main idea here is to allow only outgoing (tcp) connections and handle them using dynamic rules.

The first thing I found was an enormously high counter value on the "deny log ip from any to any" rule. At that time my workstation was in a small private (and "clean") network, so this value looked suspicious. Later I discovered that many tcp connections were in the FIN_WAIT_2 or LAST_ACK state.

To determine what was going on I carried out some experiments (shown below). Briefly: from my point of view, ipfw sometimes handles the last phase of a connection improperly. I was unable to reproduce this behaviour reliably - sometimes it happens, but in most cases it does not. But when it happens, it leads to "frozen" connections.

Of course, this may just be a symptom of software misconfiguration or my own mistake. So I need an opinion from people with deep knowledge of ipfw and the network stack. Any information or comments are much appreciated!

PS I'm running 9.0-STABLE with a custom kernel.

I.
1) Flush ipfw, reset counters, load fresh ruleset from file.
2) Run tcpdump on network interface (e.g.
re0) and ipfw log interface (ipfw0)
# tcpdump -i re0 -p -w
# tcpdump -i ipfw0 -p -w
3) Disable proxy and make a query to a webserver, e.g.
$ lynx www.freebsd.org
4) Check ipfw counter and connections
# ipfw -deS show ; netstat -n -p tcp

In most cases this test gives a "normal" result. But under some circumstances the result may look like this (w.x.y.z is the IP address of my workstation):

# ipfw -deS show ; netstat -n -p tcp
00001 0 0 set 0 allow ip from any to any via lo0
...
00010 0 0 set 0 check-state
00101 47 28622 set 0 allow tcp from me to any out setup keep-state
...
65533 6 312 set 0 deny log logamount 16 ip from any to any
## Dynamic rules (1):
00101 13 5620 (0s) STATE tcp w.x.y.z 26051 <-> 69.147.83.34 80
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 0 0 w.x.y.z.13414 69.147.83.34.80 LAST_ACK

So, the page (www.freebsd.org) was loaded and packets were counted. But the dynamic rule entry is invalid - it has port 26051 instead of 13414.

The analysis of the dump files has shown:
a) Dump from re0 has only one TCP stream: w.x.y.z:13414 <-> 69.147.83.34:80
b) Dump from ipfw0:

N Time Source Destination Protocol Length Info
1 0.000000 69.147.83.34 w.x.y.z TCP 66 http > 13414 [ACK] Seq=1 Ack=1 Win=8325 Len=0 ...
2 0.557615 w.x.y.z 69.147.83.34 TCP 66 13414 > http [FIN, ACK] Seq=0 Ack=1 Win=1040 Len=0 ...
3 1.947625 w.x.y.z 69.147.83.34 TCP 66 13414 > http [FIN, ACK] Seq=0 Ack=1 Win=1040 Len=0 ...
4 4.527624 w.x.y.z 69.147.83.34 TCP 66 13414 > http [FIN, ACK] Seq=0 Ack=1 Win=1040 Len=0 ...
5 9.487633 w.x.y.z 69.147.83.34 TCP 66 13414 > http [FIN, ACK] Seq=0 Ack=1 Win=1040 Len=0 ...
6 19.207616 w.x.y.z 69.147.83.34 TCP 66 13414 > http [FIN, ACK] Seq=0 Ack=1 Win=1040 Len=0 ...

It looks like the network stack tries to finalize the connection but the corresponding packets are dropped.

II.
1) Slightly change the ruleset:

00001 allow ip from any to any via lo0
00010 check-state
00101 allow tcp from me to any out setup keep-state
11001 allow log tcp from any 80 to me in
11002 allow log tcp from me to any dst-port 80 out
65533 deny ip from any to any
65534 deny ip6 from any to any

The idea is to see the packets which were blocked in the previous test.

2) Flush ipfw, see I.
2) Run tcpdump, see I.
3) $ lynx www.freebsd.org
4) "ipfw -deS show" and "netstat -n -p tcp"

# ipfw -deS show
00001 0 0 set 0 allow ip from any to any via lo0
00010 0 0 set 0 check-state
00101 33 22942 set 0 allow tcp from me to any out setup keep-state
11001 2 96 set 1 allow log logamount 16 tcp from any 80 to me in
11002 0 0 set 1 allow log logamount 16 tcp from me to any dst-port 80 out
65533 0 0 set 0 deny ip from any to any

And there are no waiting connections. The dump from ipfw0 contains:

No. Time Source Destination Protocol Length Info
1 0.000000 69.147.83.34 w.x.y.z TCP 66 http > 29470 [ACK] Seq=1 Ack=1 Win=8325 Len=0 ...

III.
1) Change ruleset back to:

00001 allow ip from any to any via lo0
00010 check-state
00101 allow tcp from me to any out setup keep-state
65534 deny ip6 from any to any
65535 deny ip from any to any

2) Browse the Internet.
3) I've visited some pages and now netstat output looks like:

# netstat -an -f inet | grep FIN_WAIT
tcp4 0 0 w.x.y.z.47536 173.194.32.21.443 FIN_WAIT_2
tcp4 0 0 w.x.y.z.47533 173.194.32.21.443 FIN_WAIT_2
tcp4 0 0 w.x.y.z.47532 173.194.32.21.443 FIN_WAIT_2
tcp4 0 0 w.x.y.z.47531 173.194.32.21.443 FIN_WAIT_2
tcp4 0 0 w.x.y.z.24851 199.7.55.72.80 FIN_WAIT_2
tcp4 0 0 w.x.y.z.24731 74.125.224.79.80 FIN_WAIT_2
tcp4 0 0 w.x.y.z.11578 213.180.204.69.80 FIN_WAIT_2
tcp4 0 0 w.x.y.z.11577 213.180.204.143.80 FIN_WAIT_2

From owner-freebsd-net@FreeBSD.ORG Tue Apr 17 19:48:22 2012
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 56C861065670 for ; Tue, 17 Apr 2012 19:48:22 +0000 (UTC) (envelope-from kob6558@gmail.com)
Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id D970D8FC21 for ; Tue, 17 Apr 2012 19:48:21 +0000 (UTC)
Received: by wgbds12 with SMTP id ds12so6562716wgb.31 for ; Tue, 17 Apr 2012 12:48:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=U1imaYM2g+6oQjm6n2ikZuNMCNeyYEM4QZQEVuilhys=; b=gfhoJS59jTE1sP6IT4gvNEynJ7Vr3/R1n1rWUR7xXKAmJmOJAhb4FVp8pe8guklzi+ fJlzzcn17bS35DOUT2N1/5jCIA9lNvuNcRZmC6/BwPTyf2JKqSnVq0lyZKkahRHa0oNc wCEpKTM8/3upVSOzrL7GNNftOD8lHTyZ2ASg67RyD98hNgpTlwssroryx2ruTbFiaMaD 0APMfiNDPDcYaSDMfPjm4o6CpBNPA3pwxzE1bthu8aS/3ZIDnBygAHn63J2hneu/gMdB 8/h3dan799YdajfRSPmeQr4xRh8VxLEs4SPmk0ultdjdaj1bgw5/BBw9F4aZSrF4ETVf 8Udw==
MIME-Version: 1.0
Received: by 10.180.105.194 with SMTP id go2mr8262117wib.22.1334692101097; Tue, 17 Apr 2012 12:48:21 -0700 (PDT)
Received: by 10.223.54.207 with HTTP; Tue, 17 Apr 2012 12:48:21 -0700 (PDT)
In-Reply-To:
References:
Date: Tue, 17 Apr 2012 12:48:21 -0700
Message-ID:
From: Kevin Oberman
To: "Dmitry S.
Kasterin"
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-net@freebsd.org
Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 Apr 2012 19:48:22 -0000

On Tue, Apr 17, 2012 at 4:05 AM, Dmitry S. Kasterin wrote:
> (Cross-posting this to net@ since there was no reply on ipfw@.)
>
> Hello!
>
> I have a rather simple ipfw ruleset like this:
>
> 00001 allow all from any to any via lo0
>
> 00010 check-state
> 00101 allow tcp from me to any out setup keep-state
>
> 65533 deny log ip from any to any
> 65534 deny ip6 from any to any
>
> Actually, there are a few rules for udp, icmp and so on,
> but the main idea here is to allow only outgoing (tcp) connections
> and handle them using dynamic rules.

I feel hesitant about sending this as it looks like you may have found a real problem with IPFW.

But I do have to ask why you find stateful rules for outgoing TCP connections desirable? Why not:
00101 allow tcp from me to any established

It appears to do the same thing for TCP and is much faster to process, plus it does not leave you open to a trivial DoS (often of yourself) by filling the dynamic rule tables.

Generally, for client systems, stateful UDP makes sense, but I generally don't understand why people choose the more complex, slower, and potentially disruptive stateful rules for TCP.
--
R.
Kevin Oberman, Network Engineer E-mail: kob6558@gmail.com From owner-freebsd-net@FreeBSD.ORG Tue Apr 17 19:58:55 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B94F8106568D for ; Tue, 17 Apr 2012 19:58:55 +0000 (UTC) (envelope-from kudzu@tenebras.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 6D5A48FC22 for ; Tue, 17 Apr 2012 19:58:55 +0000 (UTC) Received: by yenl9 with SMTP id l9so4013162yen.13 for ; Tue, 17 Apr 2012 12:58:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=LlqzvhUaLWayyQVXd6DlFzLcLPLXEmPvbKRaSHXYMh4=; b=WC7GEi9JUWzNsaJyk5rpKvXPtrWspzHo+GCvcaCzE7R37Mg1SiJLMnz+a3iiS1PRyJ IPfKdeNxlR/aa3UpBUI9PEa352mPr1R2V6Ob0ERJKhk8athP7vUQ7sypO1xJQffTKV0P hVtA8rDu96ZghmtNmDrcLCmyYfoTcxUL0QyOHa7V2bJhGM6Vp0yBBQnBLdQ9vdadIR3Q zqlhw6GJ+bhLNNAoRSCr20q9PmC/BoM6pMgrouOaaim768bSM28PhsKbjdT87Hg/IMid CDY++CTnPJdwLcnwM4zDHmOSGG/FvpnkTFZMx4YNc3tAXhmzEd0uOiMht1WBijzA9Vbb ZVTg== MIME-Version: 1.0 Received: by 10.236.175.41 with SMTP id y29mr17004665yhl.60.1334692734998; Tue, 17 Apr 2012 12:58:54 -0700 (PDT) Received: by 10.236.18.135 with HTTP; Tue, 17 Apr 2012 12:58:54 -0700 (PDT) In-Reply-To: References: Date: Tue, 17 Apr 2012 12:58:54 -0700 Message-ID: From: Michael Sierchio To: Kevin Oberman X-Gm-Message-State: ALoCoQk5amvmsIqN8BnHosAAb1WqksieGbgnL0+OOJgRn+0UDSZql7SpQV+kcHLMC396uYkrio2y Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org, "Dmitry S. 
Kasterin"
Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 Apr 2012 19:58:55 -0000

On Tue, Apr 17, 2012 at 12:48 PM, Kevin Oberman wrote:

> But I do have to ask why you find stateful rules for outgoing TCP
> connections desirable? Why not:
> 00101 allow tcp from me to any established

It's useful and appropriate to have outbound connections be stateful. It's not a good idea to have inbound connections stateful, as it makes it easy to fill up the state table.

To the OP: Look at the kernel tunables:

net.inet.ip.fw.dyn_rst_lifetime
net.inet.ip.fw.dyn_fin_lifetime
net.inet.ip.fw.dyn_syn_lifetime
net.inet.ip.fw.dyn_ack_lifetime

From owner-freebsd-net@FreeBSD.ORG Tue Apr 17 20:18:01 2012
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 866A2106566B for ; Tue, 17 Apr 2012 20:18:01 +0000 (UTC) (envelope-from kob6558@gmail.com)
Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 0D69B8FC15 for ; Tue, 17 Apr 2012 20:18:00 +0000 (UTC)
Received: by wern13 with SMTP id n13so5636876wer.13 for ; Tue, 17 Apr 2012 13:18:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=BxqmtaBPhOczWT8OE7z3b3txCZ/H3nmXpOzcNSJKv20=; b=dkjHniHtoxNGfpCp5hEX6YA600QKu+89E11Z4sjf0X1/DmvcH7ss+PJd3G7iVGdZVK P+hcXF+DDPt/Fo2Bu4PF3KBboDx5s1AMxZ8vhwHfGzyGKc33+VK4Z6hKQyiQK0yNcPna hHtPhne3ifcMmrlbAATvzz+ilViy5mnKH5vSh5ZafyOVrPS7WSfTLdoiRKGHS57QWV1H
e5FPK7kuS/KX0jO4upaJa1JAADNApXv9JasnZjs7kCKfghKzaVdScTfYYItQyTd7YZqD RufBtyFUgINbv9MD5MhNkjXj64AEE4WXxLiIg0pHv/n5XRbmWAQOCBliuE7kTxABBghU 34jQ==
MIME-Version: 1.0
Received: by 10.216.132.98 with SMTP id n76mr10024494wei.101.1334693879857; Tue, 17 Apr 2012 13:17:59 -0700 (PDT)
Received: by 10.223.54.207 with HTTP; Tue, 17 Apr 2012 13:17:59 -0700 (PDT)
In-Reply-To:
References:
Date: Tue, 17 Apr 2012 13:17:59 -0700
Message-ID:
From: Kevin Oberman
To: Michael Sierchio
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-net@freebsd.org, "Dmitry S. Kasterin"
Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 Apr 2012 20:18:01 -0000

On Tue, Apr 17, 2012 at 12:58 PM, Michael Sierchio wrote:
> On Tue, Apr 17, 2012 at 12:48 PM, Kevin Oberman wrote:
>>
>> But I do have to ask why you find stateful rules for outgoing TCP
>> connections desirable? Why not:
>> 00101 allow tcp from me to any established
>>
> It's useful and appropriate to have outbound connections be stateful. It's
> not a good idea to have inbound connections stateful, as it makes it easy to
> fill up the state table.

It is occasionally useful and appropriate to have outbound connections be stateful. I agree that inbound ones are dangerous, but I have managed to DoS myself on an outbound entry. (Yes, it was dumb and involved some horribly written software that kept opening and closing sockets instead of continuing to re-use them.)

There can also be no question that they are more complex and, in most cases, offer exactly zero advantage over 'established'. It is often simply an automatic action that involves no thought of which is more appropriate.
--
R.
Kevin Oberman, Network Engineer E-mail: kob6558@gmail.com From owner-freebsd-net@FreeBSD.ORG Tue Apr 17 23:24:39 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 791E8106564A for ; Tue, 17 Apr 2012 23:24:39 +0000 (UTC) (envelope-from seanbru@yahoo-inc.com) Received: from mrout2-b.corp.bf1.yahoo.com (mrout2-b.corp.bf1.yahoo.com [98.139.253.105]) by mx1.freebsd.org (Postfix) with ESMTP id 3F7F38FC16 for ; Tue, 17 Apr 2012 23:24:39 +0000 (UTC) Received: from [IPv6:::1] (rideseveral.corp.yahoo.com [10.73.160.231]) by mrout2-b.corp.bf1.yahoo.com (8.14.4/8.14.4/y.out) with ESMTP id q3HNOOXH022380 for ; Tue, 17 Apr 2012 16:24:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yahoo-inc.com; s=cobra; t=1334705065; bh=nACyvlnxJ7uDD/XMXoCqZxCo80CjuzDNqjcITxVwjYA=; h=Subject:From:Reply-To:To:Content-Type:Date:Message-ID: Mime-Version:Content-Transfer-Encoding; b=EQlEGOwDKscY72a0MuwR/nuiArNXSZ72QuVfyIEvsFL1fhzXrkgKlk2Zas+MTHFUC 9un9FPro70hJi2YKeJdE54PifwJQE2e4ZhrPKORctlDrMMeFFk34aPeWBT2RXIlO/f uKIIF2uUYf1RkO93ETvJ1QZyR8FiVA2upZfMgwJ8= From: Sean Bruno To: "freebsd-net@freebsd.org" Content-Type: text/plain; charset="UTF-8" Date: Tue, 17 Apr 2012 16:24:24 -0700 Message-ID: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Subject: igb(4) Raising IGB_MAX_TXD ?? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: sbruno@freebsd.org List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 Apr 2012 23:24:39 -0000 We're running a service with an 82576 configured with 4 queues and a maxed rxd/txd configuration: http://people.freebsd.org/~sbruno/igb_stats.txt We still see, under higher load spikes, a tendency to drop packets (I suspect an application issue at this point, but want to attempt to alleviate some congestion). I note that IGB_MAX_RXD is set to 4k. Reviewing the Intel spec sheet on the 82576, I see that the maximum value for the Descriptor Ring Length (8.10.8) is 32k descriptors. Does that mean I should be able to go higher than 4k? I suspect that even if I can, that would merely make the traffic fill more buffer space, but maybe it's enough to make it work a bit better. Sean From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 00:39:49 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 95D9B1065674 for ; Wed, 18 Apr 2012 00:39:49 +0000 (UTC) (envelope-from emaste@freebsd.org) Received: from mail1.sandvine.com (Mail1.sandvine.com [64.7.137.134]) by mx1.freebsd.org (Postfix) with ESMTP id 5A2B08FC17 for ; Wed, 18 Apr 2012 00:39:49 +0000 (UTC) Received: from labgw2.phaedrus.sandvine.com (192.168.222.22) by WTL-EXCH-1.sandvine.com (192.168.196.31) with Microsoft SMTP Server id 14.1.339.1; Tue, 17 Apr 2012 20:39:42 -0400 Received: by labgw2.phaedrus.sandvine.com (Postfix, from userid 10332) id 486BD33C02; Tue, 17 Apr 2012 20:39:40 -0400 (EDT) Date: Tue, 17 Apr 2012 20:39:40 -0400 From: Ed Maste To: Message-ID: <20120418003939.GA32603@sandvine.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline User-Agent: Mutt/1.4.2.1i Subject: lagg(4) MAC
address selection proposal X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 00:39:49 -0000

When a new lagg(4) interface is created the link layer address from the first port in the group is assigned to the lagg and to all other lagg port members. This means the address assigned to the lagg is different if specified as, for example, "laggport em0 laggport em1" vs "laggport em1 laggport em0".

The code in lagg_port_create(), in if_lagg.c that chooses the first l2 address:

    if (SLIST_EMPTY(&sc->sc_ports)) {
            sc->sc_primary = lp;
            lagg_lladdr(sc, IF_LLADDR(ifp));
    } else {
            /* Update link layer address for this port */
            lagg_port_lladdr(lp, IF_LLADDR(sc->sc_ifp));
    }

For the current modes lagg supports this probably doesn't matter much, but we have some improvements in the pipeline for which this behaviour is undesirable. (The first of which is an interface for choosing a different master; this allows a failover lagg to be set to transmit on a new port, without changing link states. With the current behaviour this causes all ports in the lagg to then change their l2 address.)

In looking into potential solutions I found that the bridgestp code in bridge(4) searches the list of associated MAC addresses and uses the lowest one when it needs to select one from a group. I'd like to propose using the same logic for lagg's MAC address selection. Can anyone foresee an issue with this change? (I'm not aware of any lagg use cases that rely on the current behaviour.)
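
As a rough illustration of the "lowest address wins" idea, here is a minimal standalone sketch. It is not the bridgestp or lagg code: the helper name lagg_lowest_lladdr() and the plain array of addresses are assumptions made up for this example, and a real change would walk the lagg port list instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef ETHER_ADDR_LEN
#define ETHER_ADDR_LEN 6	/* bytes in an Ethernet address */
#endif

/*
 * Hypothetical helper (not a kernel function): return the index of the
 * numerically lowest link layer address among n candidates.  Addresses
 * are compared byte by byte, most significant byte first, which is what
 * memcmp() does on the raw 6-byte arrays.
 */
static size_t
lagg_lowest_lladdr(const uint8_t addrs[][ETHER_ADDR_LEN], size_t n)
{
	size_t best = 0;
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++) {
		if (memcmp(addrs[i], addrs[best], ETHER_ADDR_LEN) < 0)
			best = i;
	}
	return (best);
}
```

The point of the sketch is only that the selected address depends on the set of ports, not on the order in which they were added, so "laggport em0 laggport em1" and "laggport em1 laggport em0" would end up with the same lagg address.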
-Ed From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 00:53:43 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E4C27106564A for ; Wed, 18 Apr 2012 00:53:43 +0000 (UTC) (envelope-from andy@fud.org.nz) Received: from mail-pz0-f44.google.com (mail-pz0-f44.google.com [209.85.210.44]) by mx1.freebsd.org (Postfix) with ESMTP id B5BC48FC22 for ; Wed, 18 Apr 2012 00:53:43 +0000 (UTC) Received: by dadz14 with SMTP id z14so29904598dad.17 for ; Tue, 17 Apr 2012 17:53:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding:x-gm-message-state; bh=L+RqwNdLxKvQYOaI1XVu7jescaCmeeaRtuNkysZxdMc=; b=PR71RJ08PpIT1ZlEVQ8H3KDSCxEa8RlCQwaxkFuU6Hl6xEy4Fd+rRabWJjkiJeHBjd jVE1CWW1+J0U2GeOdcEAvsuw8VyWmRz1y6aYM99Qk6Zk6jMYpiiL3hpLqPdc7ZDACsWB iHz0eFPbzsxGQJ+QAuHAQA3SIRHByyPpvECKrw1z868YyNVnY+O/cODEAu4tP43BEAdD Il2S91uNVkcvSJfPwRUQX5P6X0ipt/U3iXlz1/XwklW3cyepWR3Jx46axw+C310eUx0j Odwz4dZPTG3ADTtVzq7rWEXzpPS6/F1gTdmZ8V+73osrfZtqEP2+/cwHCXD/gQWN+UL4 W3Hw== MIME-Version: 1.0 Received: by 10.68.236.132 with SMTP id uu4mr1815403pbc.11.1334710423473; Tue, 17 Apr 2012 17:53:43 -0700 (PDT) Sender: andy@fud.org.nz Received: by 10.68.239.164 with HTTP; Tue, 17 Apr 2012 17:53:43 -0700 (PDT) In-Reply-To: <20120418003939.GA32603@sandvine.com> References: <20120418003939.GA32603@sandvine.com> Date: Wed, 18 Apr 2012 12:53:43 +1200 X-Google-Sender-Auth: k5KccRGZXENtycyVzXV8e0vw8IE Message-ID: From: Andrew Thompson To: Ed Maste , freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Gm-Message-State: ALoCoQlxo+O7zm4RmbUfTbhTm6Y6w/HkdEcOyX/yDi4+BlS79SxGadodPsYCeDT6i+Mjeg6xUkC6 Cc: Subject: Re: lagg(4) MAC address selection proposal X-BeenThere: 
freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 00:53:44 -0000

On 18 April 2012 12:39, Ed Maste wrote:
> When a new lagg(4) interface is created the link layer address from the
> first port in the group is assigned to the lagg and to all other lagg
> port members. This means the address assigned to the lagg is different
> if specified as, for example, "laggport em0 laggport em1" vs
> "laggport em1 laggport em0".
>
> The code in lagg_port_create(), in if_lagg.c that chooses the first
> l2 address:
>
>     if (SLIST_EMPTY(&sc->sc_ports)) {
>             sc->sc_primary = lp;
>             lagg_lladdr(sc, IF_LLADDR(ifp));
>     } else {
>             /* Update link layer address for this port */
>             lagg_port_lladdr(lp, IF_LLADDR(sc->sc_ifp));
>     }
>
> For the current modes lagg supports this probably doesn't matter much,
> but we have some improvements in the pipeline for which this behaviour
> is undesirable. (The first of which is an interface for choosing a
> different master; this allows a failover lagg to be set to transmit on a
> new port, without changing link states. With the current behaviour this
> causes all ports in the lagg to then change their l2 address.)
>
> In looking into potential solutions I found that the bridgestp code in
> bridge(4) searches the list of associated MAC addresses and uses the
> lowest one when it needs to select one from a group. I'd like to
> propose using the same logic for lagg's MAC address selection. Can
> anyone foresee an issue with this change? (I'm not aware of any lagg
> use cases that rely on the current behaviour.)

I do not foresee any issues.
What we also need is an event trigger for various pseudo interfaces when the MAC or primary interface changes; this would allow arp/nd6 to rebroadcast. Andrew From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 01:00:14 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6DCDA106566C; Wed, 18 Apr 2012 01:00:09 +0000 (UTC) (envelope-from emaste@freebsd.org) Received: from mail1.sandvine.com (Mail1.sandvine.com [64.7.137.134]) by mx1.freebsd.org (Postfix) with ESMTP id 17EEC8FC0C; Wed, 18 Apr 2012 01:00:09 +0000 (UTC) Received: from labgw2.phaedrus.sandvine.com (192.168.222.22) by WTL-EXCH-1.sandvine.com (192.168.196.31) with Microsoft SMTP Server id 14.1.339.1; Tue, 17 Apr 2012 21:00:08 -0400 Received: by labgw2.phaedrus.sandvine.com (Postfix, from userid 10332) id 29AAA33C02; Tue, 17 Apr 2012 21:00:06 -0400 (EDT) Date: Tue, 17 Apr 2012 21:00:05 -0400 From: Ed Maste To: Andrew Thompson Message-ID: <20120418010005.GA89815@sandvine.com> References: <20120418003939.GA32603@sandvine.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i Cc: freebsd-net@freebsd.org Subject: Re: lagg(4) MAC address selection proposal X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 01:00:14 -0000 On Wed, Apr 18, 2012 at 12:53:43PM +1200, Andrew Thompson wrote: > What we also need is an event trigger for > various pseudo interfaces when the MAC or primary interface changes; > this would allow arp/nd6 to rebroadcast. Yes, we're working on something like this specifically for lagg, but it should be made more general for all of these cases.
In the lagg failover case we need to have a gratuitous arp/nd6 when the link states change. Did you see glebius's message: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=126040+0+archive/2012/freebsd-net/20120212.freebsd-net -Ed From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 07:08:57 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 33168106564A; Wed, 18 Apr 2012 07:08:57 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id E1E338FC12; Wed, 18 Apr 2012 07:08:56 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 916837300A; Wed, 18 Apr 2012 09:28:18 +0200 (CEST) Date: Wed, 18 Apr 2012 09:28:18 +0200 From: Luigi Rizzo To: sbruno@freebsd.org Message-ID: <20120418072818.GA58850@onelab2.iet.unipi.it> References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> User-Agent: Mutt/1.4.2.3i Cc: "freebsd-net@freebsd.org" Subject: Re: igb(4) Raising IGB_MAX_TXD ?? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 07:08:57 -0000 On Tue, Apr 17, 2012 at 04:24:24PM -0700, Sean Bruno wrote: > We're running a service with a 82576 configured with 4 queues and a > maxed rxd/txd configuration: > > http://people.freebsd.org/~sbruno/igb_stats.txt these stats show that over half of your incoming traffic is made of small packets (65..127 bytes) but especially, that the "missed packets" count is very small (18k out of 40G packets) none of them is reported as "no_desc_avail", and only 76 are "recv_no_buffer". Are you dropping packets in the ip interrupt handler by chance ? what are your settings there ? BTW it seems that there is only one global setting for the dispatch policy, but for instance there are two netisr_dispatch() calls in the incoming path, one for layer2 and one for layer3. The former has relatively little work to do and so it might make sense to have direct dispatch, the other can be expensive so i wonder if it wouldn't be better to use deferred dispatch. If not, perhaps you might try to reduce the rx_processing_limit to bring down the load on the intr thread. > sysctl net | grep intr net.inet.ip.intr_queue_maxlen: 256 net.inet.ip.intr_queue_drops: 253 > sysctl net.isr net.isr.numthreads: 1 net.isr.maxprot: 16 net.isr.defaultqlimit: 256 net.isr.maxqlimit: 10240 net.isr.bindthreads: 0 net.isr.maxthreads: 1 net.isr.direct: 0 net.isr.direct_force: 0 net.isr.dispatch: direct > We still see, under higher load spikes, a tendency to drop packets (I > suspect an application issue at this point, but want to attempt to > alleviate some congestion). > > I note that IGB_MAX_RXD is set to 4k. Reviewing the Intel spec shell on > the 82576 I see that the maximum value for the Descriptor Ring Length > (8.10.8) is 32k descriptors. 
> > Does that mean I should be able to go higher that 4k? I suspect that > even if I can, that would merely make the traffic fill more buffer > space, but maybe its enough to make it work a bit better. With your numbers i doubt that raising the queue size helps. cheers luigi > Sean > > > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 15:26:46 2012 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8BDD1065672; Wed, 18 Apr 2012 15:26:46 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A98A28FC15; Wed, 18 Apr 2012 15:26:46 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q3IFQkFS055488; Wed, 18 Apr 2012 15:26:46 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q3IFQkQm055484; Wed, 18 Apr 2012 15:26:46 GMT (envelope-from linimon) Date: Wed, 18 Apr 2012 15:26:46 GMT Message-Id: <201204181526.q3IFQkQm055484@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/167059: [tcp] [panic] System does panic in in_pcbbind() and hangs X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 15:26:46 -0000 Old Synopsis: System does panic in in_pcbbind() and hangs New Synopsis: [tcp] [panic] System 
does panic in in_pcbbind() and hangs Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Wed Apr 18 15:26:23 UTC 2012 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=167059 From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 16:38:08 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB1BB106566B for ; Wed, 18 Apr 2012 16:38:08 +0000 (UTC) (envelope-from seanbru@yahoo-inc.com) Received: from mrout1-b.corp.bf1.yahoo.com (mrout1-b.corp.bf1.yahoo.com [98.139.253.104]) by mx1.freebsd.org (Postfix) with ESMTP id 714638FC0C for ; Wed, 18 Apr 2012 16:38:08 +0000 (UTC) Received: from [IPv6:::1] (rideseveral.corp.yahoo.com [10.73.160.231]) by mrout1-b.corp.bf1.yahoo.com (8.14.4/8.14.4/y.out) with ESMTP id q3IGRJVJ024114; Wed, 18 Apr 2012 09:27:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yahoo-inc.com; s=cobra; t=1334766440; bh=g/IgYGod9Cwy0RuUlia7PRGxd9uh2Ij8iOwKfrYjjmI=; h=Subject:From:Reply-To:To:Cc:In-Reply-To:References:Content-Type: Date:Message-ID:Mime-Version; b=fJEdmM7z0kTc0zoQh+0Bb/kHuDEOFnSwWdj9ZHC5MhRksva6tN//INLlUzWjoYvlt tnjg6cIMmkKYBT9IojpiIK2wIyJeJj28AYGJhfNeBGDbUKv1i4aIgmKj7gYxHjgkk1 ubFPSoPyT4GPYqF9eJZxOG0NuPbf7dvk8f+/9GZg= From: Sean Bruno To: Luigi Rizzo In-Reply-To: <20120418072818.GA58850@onelab2.iet.unipi.it> References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <20120418072818.GA58850@onelab2.iet.unipi.it> Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="=-fZwHBsYnQWqPnYlo+ahK" Date: Wed, 18 Apr 2012 09:27:18 -0700 Message-ID: <1334766438.3466.4.camel@powernoodle-l7.corp.yahoo.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port X-Milter-Version: master.31+4-gbc07cd5+ X-CLX-ID: 766439004 Cc: "freebsd-net@freebsd.org" , Jack 
Vogel Subject: Re: igb(4) Raising IGB_MAX_TXD ?? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: "sbruno@freebsd.org" List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 16:38:08 -0000 --=-fZwHBsYnQWqPnYlo+ahK Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 2012-04-18 at 00:28 -0700, Luigi Rizzo wrote: > On Tue, Apr 17, 2012 at 04:24:24PM -0700, Sean Bruno wrote: > > We're running a service with a 82576 configured with 4 queues and a > > maxed rxd/txd configuration: > > > > http://people.freebsd.org/~sbruno/igb_stats.txt > > these stats show that over half of your incoming traffic is > made of small packets (65..127 bytes) but especially, that > the "missed packets" count is very small (18k out of 40G packets) > none of them is reported as "no_desc_avail", and only 76 are > "recv_no_buffer". > > Are you dropping packets in the ip interrupt handler by chance ? > what are your settings there ? > nope, doesn't look like it. http://people.freebsd.org/~sbruno/igb_ip_stats.txt > BTW it seems that there is only one global setting for the dispatch > policy, but for instance there are two netisr_dispatch() calls > in the incoming path, one for layer2 and one for layer3. > The former has relatively little work to do and so it might > make sense to have direct dispatch, the other can be expensive > so i wonder if it wouldn't be better to use deferred dispatch. > If not, perhaps you might try to reduce the rx_processing_limit > to bring down the load on the intr thread. I don't really see any issue with horsepower on this host at the moment with 4 queues.
I mean it looks a little something like this under high load: http://people.freebsd.org/~sbruno/igb_top.txt I guess my question still stands though, since the ethernet controller is reporting that it doesn't have any more descriptors available, is the hardcoded 4k max descriptors a limit that can be raised? > With your numbers i doubt that raising the queue size helps. Indeed, you're probably right and this is more than likely an application problem that will have to be resolved. However, I'm still curious if the MAX_RXD/TXD is really 4k or if the documentation is correct and we can raise it to 32k for testing? Sean --=-fZwHBsYnQWqPnYlo+ahK Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iQEcBAABAgAGBQJPjutmAAoJEL2UHwafTLtOqVAH/Re6EjwbReJdEXq2O9ZYM3LE kgWQH4XIYfG+UvlIYRjaOxL/8vydd8iFYQIdCUh2AZseZ19uJJveUSD0bp42V/9H mFR7Z6OOYUxWwI15WjoVh2I/q2J+qP3dX0VGHt5kvJLUcBm+Ys9asqHu54RBZeNB UtacUALDgKOO4nasTBZCyEYOl2R3czmN9BVpI8TNuhbekksXcJIJxERsfHJ71KD4 EnUi24/TVJAjUcp4lU/ECEhmXJmB0XRJ45AigyiM1Iii+N09WLss+yfQLuq2c8bA QMA6SKEORK2ymyua7tN/n/BsVOdpnYscyItzXB6DvvH3MWrVRMij6bqpf7Bhop4= =G5jY -----END PGP SIGNATURE----- --=-fZwHBsYnQWqPnYlo+ahK-- From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 16:46:49 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB930106566C; Wed, 18 Apr 2012 16:46:49 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42]) by mx1.freebsd.org (Postfix) with ESMTP id 3698B8FC20; Wed, 18 Apr 2012 16:46:49 +0000 (UTC) Received: by wgbds11 with SMTP id ds11so773690wgb.1 for ; Wed, 18 Apr 2012 09:46:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=NSJeRcagKOd+eJX+EzmjISmnEvVbyoM+IwkMkMrml94=; b=pwEB/I88++fD8gQaGXLsvCJAyHyaXxUvsj9mOIyQPG2WGadMTGL9PXyne2Vw2YDnKh W5qt6pS1WQgquWZ6iOFUnuypyeufRDBPEc219ujPJbb8cNFcPjcpSZsqq8nMSdqZsJba KNhne8yvxujsNy24t5yutpKVEQCt72J6MUCn/Hzx7MetHJMgaZ0pPP51xVeciGksUGww xccAlqHQE+aCgcLrckB+nhAS5AfScHhakmo+gRq8q4C8KJgvTQASFKZJbUxdxZVQFFJP aXf36PE1o1a0lrED5NTsf0UWamgQt+OlVu4wHer7o96gwbsdb4S7u+E4W6BC8HawgV8C COHQ== MIME-Version: 1.0 Received: by 10.216.134.155 with SMTP id s27mr2101770wei.80.1334767607936; Wed, 18 Apr 2012 09:46:47 -0700 (PDT) Received: by 10.180.3.170 with HTTP; Wed, 18 Apr 2012 09:46:47 -0700 (PDT) In-Reply-To: <1334766438.3466.4.camel@powernoodle-l7.corp.yahoo.com> References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <20120418072818.GA58850@onelab2.iet.unipi.it> <1334766438.3466.4.camel@powernoodle-l7.corp.yahoo.com> Date: Wed, 18 Apr 2012 09:46:47 -0700 Message-ID: From: Jack Vogel To: "sbruno@freebsd.org" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-net@freebsd.org" , Luigi Rizzo Subject: Re: igb(4) Raising IGB_MAX_TXD ?? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 16:46:49 -0000 The MAX value is something I set, not a hardware thing; it was based on reports I had from the various driver engineers in our org. If you increase the ring size you might run into other performance issues; however, there's nothing stopping you from trying. Just be aware that it's not something that's been tested.
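
To make the "driver limit, not hardware limit" distinction concrete, here is a minimal standalone sketch of the kind of range check a driver could apply to a requested descriptor count. It is not the igb(4) source: the function name and all SKETCH_* constants (minimum ring size, alignment, descriptor size) are assumptions for illustration only, and the real values should be checked against if_igb.h in your tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for this sketch only; verify against the real driver. */
#define SKETCH_MIN_TXD		256	/* assumed smallest ring */
#define SKETCH_MAX_TXD		4096	/* the driver-chosen 4k cap from the thread */
#define SKETCH_HW_MAX_TXD	32768	/* 32k ring length, as read from the 82576 spec */
#define SKETCH_DBA_ALIGN	128	/* assumed ring size alignment, in bytes */
#define SKETCH_DESC_SIZE	16	/* assumed bytes per descriptor */

/*
 * Hypothetical check: accept a requested descriptor count only if it is
 * within [SKETCH_MIN_TXD, max_txd] and the ring's byte size is a
 * multiple of the alignment.  max_txd is a parameter so that a raised
 * cap (for example 8192) can be tried without touching the rest.
 */
static bool
sketch_txd_valid(unsigned int txd, unsigned int max_txd)
{
	assert(max_txd <= SKETCH_HW_MAX_TXD);
	if (txd < SKETCH_MIN_TXD || txd > max_txd)
		return (false);
	return (((size_t)txd * SKETCH_DESC_SIZE) % SKETCH_DBA_ALIGN == 0);
}
```

Under these assumptions, a request for 8192 descriptors is rejected with the 4k cap and accepted once the cap is raised to 8192, which is the experiment being discussed; whether the larger ring actually performs better is untested.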
Let me know how it goes please :) Jack On Wed, Apr 18, 2012 at 9:27 AM, Sean Bruno wrote: > On Wed, 2012-04-18 at 00:28 -0700, Luigi Rizzo wrote: > > On Tue, Apr 17, 2012 at 04:24:24PM -0700, Sean Bruno wrote: > > > We're running a service with a 82576 configured with 4 queues and a > > > maxed rxd/txd configuration: > > > > > > http://people.freebsd.org/~sbruno/igb_stats.txt > > > > these stats show that over half of your incoming traffic is > > made of small packets (65..127 bytes) but especially, that > > the "missed packets" count is very small (18k out of 40G packets) > > none of them is reported as "no_desc_avail", and only 76 are > > "recv_no_buffer". > > > > Are you dropping packets in the ip interrupt handler by chance ? > > what are your settings there ? > > > nope, doesn't look like it. > > http://people.freebsd.org/~sbruno/igb_ip_stats.txt > > > BTW it seems that there is only one global setting for the dispatch > > policy, but for instance there are two netisr_dispatch() calls > > in the incoming path, one for layer2 and one for layer3. > > The former has relatively little work to do and so it might > > make sense to have direct dispatch, the other can be expensive > > so i wonder if it wouldn't be better to use deferred dispatch. > > If not, perhaps you might try to reduce the rx_processing_limit > > to bring down the load on the intr thread. > > I don't really see any issue with horsepower on this host at the moment > with 4 queues. I mean it looks a little something like this under high > load: > > http://people.freebsd.org/~sbruno/igb_top.txt > > I guess my question still stands though, since the ethernet controller > is reporting that it doesn't have any more descriptors available is the > hardcoded 4k max descriptors a limit that an be raised? > > > With your numbers i doubt that raising the queue size helps. > > Indeed, you're probably right and this is more than likely an > application problem that will have to be resolved. 
However, I'm still > curious if the MAX_RXD/TXD is really 4k or if the documentation is > correct and we can raise it to 32k for testing? > > Sean > From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 16:49:53 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1EAA5106566B; Wed, 18 Apr 2012 16:49:53 +0000 (UTC) (envelope-from seanbru@yahoo-inc.com) Received: from mrout1-b.corp.bf1.yahoo.com (mrout1-b.corp.bf1.yahoo.com [98.139.253.104]) by mx1.freebsd.org (Postfix) with ESMTP id DFEA18FC16; Wed, 18 Apr 2012 16:49:52 +0000 (UTC) Received: from [IPv6:::1] (rideseveral.corp.yahoo.com [10.73.160.231]) by mrout1-b.corp.bf1.yahoo.com (8.14.4/8.14.4/y.out) with ESMTP id q3IGn6J0035093; Wed, 18 Apr 2012 09:49:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yahoo-inc.com; s=cobra; t=1334767747; bh=VdEEdvwR02rvWhH/wBm5l4JFDoHAjS/m9ePYMzFlCOM=; h=Subject:From:To:Cc:In-Reply-To:References:Content-Type:Date: Message-ID:Mime-Version:Content-Transfer-Encoding; b=epFwLr9t37OoTRXOwBTVHRlhcpYvImfNQEoy0M66PnrnoyZLMId0wo/OHWT1KrAhl iVrW2KRB1yMi/a7Gdxd+oC9ECcwPDdGhs9m1+LqGabu/3LP2GSIcSsJ+ICSUnoFc/l ZVYFDlcwbA6om/1AXAuNaoSWER7DFhkDK2tRJahQ= From: Sean Bruno To: Jack Vogel In-Reply-To: References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <20120418072818.GA58850@onelab2.iet.unipi.it> <1334766438.3466.4.camel@powernoodle-l7.corp.yahoo.com> Content-Type: text/plain; charset="UTF-8" Date: Wed, 18 Apr 2012 09:49:06 -0700 Message-ID: <1334767746.3466.6.camel@powernoodle-l7.corp.yahoo.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-Milter-Version: master.31+4-gbc07cd5+ X-CLX-ID: 767746000 Cc: "freebsd-net@freebsd.org" , Luigi Rizzo Subject: Re: igb(4) Raising IGB_MAX_TXD ?? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 16:49:53 -0000 ok, good. that at least confirms that I correctly translated between the driver code and documented specification. I will try 8k as a test for now and see how that runs. sean On Wed, 2012-04-18 at 09:46 -0700, Jack Vogel wrote: > The MAX value is something I set, not a hardware thing, it was based > on reports > I had from the various driver engineers in our org. If you increase > the ring size > you might run into other performance issues, however there's nothing > stopping > you from trying. Just be aware that its not something that's been > tested. > > Let me know how it goes please :) > > Jack > > > On Wed, Apr 18, 2012 at 9:27 AM, Sean Bruno > wrote: > On Wed, 2012-04-18 at 00:28 -0700, Luigi Rizzo wrote: > > On Tue, Apr 17, 2012 at 04:24:24PM -0700, Sean Bruno wrote: > > > We're running a service with a 82576 configured with 4 > queues and a > > > maxed rxd/txd configuration: > > > > > > http://people.freebsd.org/~sbruno/igb_stats.txt > > > > these stats show that over half of your incoming traffic is > > made of small packets (65..127 bytes) but especially, that > > the "missed packets" count is very small (18k out of 40G > packets) > > none of them is reported as "no_desc_avail", and only 76 are > > "recv_no_buffer". > > > > Are you dropping packets in the ip interrupt handler by > chance ? > > what are your settings there ? > > > > nope, doesn't look like it. > > http://people.freebsd.org/~sbruno/igb_ip_stats.txt > > > BTW it seems that there is only one global setting for the > dispatch > > policy, but for instance there are two netisr_dispatch() > calls > > in the incoming path, one for layer2 and one for layer3. 
> > The former has relatively little work to do and so it might > > make sense to have direct dispatch, the other can be > expensive > > so i wonder if it wouldn't be better to use deferred > dispatch. > > If not, perhaps you might try to reduce the > rx_processing_limit > > to bring down the load on the intr thread. > > > I don't really see any issue with horsepower on this host at > the moment > with 4 queues. I mean it looks a little something like this > under high > load: > > http://people.freebsd.org/~sbruno/igb_top.txt > > I guess my question still stands though, since the ethernet > controller > is reporting that it doesn't have any more descriptors > available is the > hardcoded 4k max descriptors a limit that an be raised? > > > With your numbers i doubt that raising the queue size helps. > > > Indeed, you're probably right and this is more than likely an > application problem that will have to be resolved. However, > I'm still > curious if the MAX_RXD/TXD is really 4k or if the > documentation is > correct and we can raise it to 32k for testing? 
> > Sean > From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 18:15:14 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DD5E4106566C for ; Wed, 18 Apr 2012 18:15:13 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) by mx1.freebsd.org (Postfix) with ESMTP id 9A1A58FC0C for ; Wed, 18 Apr 2012 18:15:13 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1SKZPD-00044h-Fe for freebsd-net@freebsd.org; Wed, 18 Apr 2012 20:15:03 +0200 Received: from www01.lwilke.de ([78.47.159.91]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 18 Apr 2012 20:15:03 +0200 Received: from lw by www01.lwilke.de with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 18 Apr 2012 20:15:03 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Lars Wilke Date: Wed, 18 Apr 2012 14:01:48 +0000 Lines: 90 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: www01.lwilke.de User-Agent: slrn/0.9.9p1 (Linux) Subject: Watchdog timeout em driver 8.2-R X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 18:15:14 -0000 Hi, i first posted the following to the -stable list but got no reply. Maybe someone here has some advice for me. Switch: HP ProCurve 2910al The switch does passive LACP Motherboard: Supermicro X8DTN+-F NIC: Quad Port Card, i.e. 
em1: em1@pci0:6:0:1: class=0x020000 card=0x125e15d9 chip=0x105e8086 rev=0x06 hdr=0x00 vendor = 'Intel Corporation' device = 'HP NC360T PCIe DP Gigabit Server Adapter (n1e5132)' class = network subclass = ethernet bar [10] = type Memory, range 32, base 0xfb9e0000, size 131072, enabled bar [14] = type Memory, range 32, base 0xfb9c0000, size 131072, enabled bar [18] = type I/O Port, range 32, base 0xcc00, size 32, enabled cap 01[c8] = powerspec 2 supports D0 D3 current D0 cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x4(x4) ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected ecap 0003[140] = Serial 1 002590ffff0484d8 I use CAT 6 cables and the switch and server are in the same cabinet. OS: FBSD is 8.2-Release rc.conf: ifconfig_em0="up" ifconfig_em1="up" ifconfig_em2="up" ifconfig_em3="up" cloned_interfaces="lagg0" ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 laggport em2 laggport em3" ipv4_addrs_lagg0="192.168.80.20/24" Hm, what sysctls might be interesting? I use: net.inet.tcp.sendbuf_max=16777216 net.inet.tcp.recvbuf_max=16777216 net.inet.tcp.sendspace=65536 net.inet.tcp.recvspace=131072 kern.ipc.nmbclusters=230400 kern.maxvnodes=250000 kern.maxfiles=65536 kern.maxfilesperproc=32768 vfs.read_max=32 loader.conf: does only contain stuff concerning zfs Except for swap the whole system uses zfs, swap is on a geom mirror. Once in a while i see this messages in /var/log/messages Apr 13 08:53:07 san02 kernel: em1: Watchdog timeout -- resetting Apr 13 08:53:07 san02 kernel: em1: Queue(0) tdh = 232, hw tdt = 190 Apr 13 08:53:07 san02 kernel: em1: TX(0) desc avail = 31,Next TX to Clean = 221 Apr 13 08:53:07 san02 kernel: em1: Link is Down Apr 13 08:53:07 san02 kernel: em1: link state changed to DOWN Sometimes nothing for days, sometimes under high Network load (NFSv3), sometimes multiple times a day. 
I see this message/behaviour on always the same two of the four interfaces (em1 and em3). Then the NIC does not have the ACTIVE flag anymore, an ifconfig em1 up solves the issue. But why does it loose the ACTIVE state and why does the NIC reset itself in the first place? On the switch i see that the port matching em1 on the server has left the trunk, so the missing ACTIVE flag is not lying 8-/ Googling found many postings with the same problem and one site suggested that this might be an ACPI problem but nothing concrete and the postings i found were mostly FBSD7 and older. Any pointers would be appreciated. Thank you --lars _______________________________________________ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 19:12:44 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 22AE61065743 for ; Wed, 18 Apr 2012 19:12:44 +0000 (UTC) (envelope-from dmk.sbor@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id CD3BB8FC17 for ; Wed, 18 Apr 2012 19:12:43 +0000 (UTC) Received: by yenl9 with SMTP id l9so4727505yen.13 for ; Wed, 18 Apr 2012 12:12:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=7SYZLAafb/+XMEmOV69wYB5xziQNkox/w6SxQNpmTu0=; b=ivMmyesQW4+I4/mwjd4Hx9hcUvveQT30kclhdhxw1IK6+zmy9nMxy+fh2wjP09XLi4 kMAohRIohIwpcKVtJ6YYvX8OGu+FqC3HRVQlwTtL9Yuiv+21ZHLdVcOLyvLWu1ypESV1 EPOYPV8VHzV1K8mBhJ44xlaQ75vk6A4+mRDbNMbt31x6ZUXCbkWIwiz30gCd/id3caIL xexJO9HtWd/PwZCh0F6o9JVuZXHUoXqsNQFyrurAvji9h+ZDD59uBWrji9J1y4WDmnpz a6RMBoEVN/XeOF0lP6PNfsnzlRX1oKbMdA264kfFVJL3lFQkxKCNe5sK3qRJniknhSpk 
Qm8w== MIME-Version: 1.0 Received: by 10.236.193.39 with SMTP id j27mr3483535yhn.111.1334776363111; Wed, 18 Apr 2012 12:12:43 -0700 (PDT) Received: by 10.146.168.1 with HTTP; Wed, 18 Apr 2012 12:12:42 -0700 (PDT) In-Reply-To: References: Date: Wed, 18 Apr 2012 23:12:42 +0400 Message-ID: From: "Dmitry S. Kasterin" To: Kevin Oberman Content-Type: text/plain; charset=UTF-8 Cc: freebsd-net@freebsd.org, Michael Sierchio Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 19:12:44 -0000 Kevin, Michael, hi > a real problem with IPFW. Well, someone who can confirm or disprove my guesswork is much desirable ) > But I do have to ask why you find statefull rules for outgoing TCP > connections desirable? Why not: > 00101 allow tcp from me to any established > It appears to do the same thing for TCP and is much faster to process > plus it does not leave you open to trivial DOS (often of yourself) by > filling the dynamic rule tables. The host in question is my workstation running FreeBSD. I have no reason to distrust its users. Workstation doesn't host services. So I've decided to keep ruleset short and clean: 00001 allow ip from any to any via lo0 00002 deny ip from any to 127.0.0.0/8 00003 deny ip from 127.0.0.0/8 to any 00004 deny ip6 from any to any 00010 check-state 00101 allow tcp from me to any out setup keep-state 00201 allow udp from me to any out keep-state 00301 allow icmp from me to any out keep-state 00302 allow icmp from any to me in icmptypes 3,4,8,11,12 65534 deny ip from any to any 65535 deny ip from any to any Yes, I'm aware of possible DOS. But I have direct access to the workstation; if something goes wrong, I always can examine it. 
Thank you for the "allow tcp from me to any established" rule; I'll give it a try later.

> Generally, for client systems, stateful UDP makes sense, but I
> generally don't understand why people choose the more complex, slower,
> and potentially disruptive stateful rules for TCP.

Hmm, http://undeadly.org/cgi?action=article&sid=20060927091645 says:

"For specific connections like DNS lookups, where each connection only consists of two packets (one request and one reply), the overhead of state creation might be worse than two ruleset evaluations. Connections that consist of more than a handful of packets, like most TCP connections, will benefit from the created state entry."

But it doesn't matter - both stateless and stateful rules for UDP will work in my case.

> Look at the kernel tunables:
> ...

# sysctl net.inet.ip.fw | grep _lifetime
net.inet.ip.fw.dyn_short_lifetime: 5
net.inet.ip.fw.dyn_udp_lifetime: 10
net.inet.ip.fw.dyn_rst_lifetime: 1
net.inet.ip.fw.dyn_fin_lifetime: 1
net.inet.ip.fw.dyn_syn_lifetime: 20
net.inet.ip.fw.dyn_ack_lifetime: 300

I didn't change anything. Quite possibly dyn_fin_lifetime is too small. I'll try to raise it.
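
A minimal sketch of the check described in this message, assuming the sysctl output quoted above. It parses the dynamic-rule lifetimes and prints the command that would raise dyn_fin_lifetime. The helper names and the target of 5 seconds are illustrative assumptions, not values recommended anywhere in the thread.

```python
# Output copied from the message above; the poster says these are unchanged defaults.
SYSCTL_OUTPUT = """\
net.inet.ip.fw.dyn_short_lifetime: 5
net.inet.ip.fw.dyn_udp_lifetime: 10
net.inet.ip.fw.dyn_rst_lifetime: 1
net.inet.ip.fw.dyn_fin_lifetime: 1
net.inet.ip.fw.dyn_syn_lifetime: 20
net.inet.ip.fw.dyn_ack_lifetime: 300
"""


def parse_lifetimes(text):
    """Map each 'name: value' line ending in _lifetime to an int (seconds)."""
    lifetimes = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.endswith("_lifetime"):
            lifetimes[name] = int(value)
    return lifetimes


def raise_command(lifetimes, name, new_value):
    """Return a 'sysctl name=value' command, or None if the current
    value is already at least new_value."""
    if lifetimes[name] >= new_value:
        return None
    return f"sysctl {name}={new_value}"


lifetimes = parse_lifetimes(SYSCTL_OUTPUT)

# Hypothetical target, chosen only for illustration.
HYPOTHETICAL_FIN_LIFETIME = 5

print(raise_command(lifetimes,
                    "net.inet.ip.fw.dyn_fin_lifetime",
                    HYPOTHETICAL_FIN_LIFETIME))
```

With a 1-second FIN lifetime, the dynamic rule can expire before the final FIN/ACK exchange completes, so the remaining packets would fall through to the deny rule. That would be consistent with connections lingering in FIN_WAIT_2 or LAST_ACK, though the thread does not confirm it.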
From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 20:17:33 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8BD6C106564A for ; Wed, 18 Apr 2012 20:17:33 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 1250B8FC08 for ; Wed, 18 Apr 2012 20:17:32 +0000 (UTC) Received: by wern13 with SMTP id n13so6549993wer.13 for ; Wed, 18 Apr 2012 13:17:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=hiCBOsEi2i52MkKKBYN3KJ7Q5n9aGgODhBfuu3nhqEg=; b=ufAMw9Gqf3fQYH5vKzTqOkWCwJtIqhcICURylPe10IqbbS7ndlLTvdbvd1bsTuZHv8 qqnJcqD6axZy/fAACTMY5fhijdkeBLbZMA5qY6RT7SImgpBFRu4QDm/GpdWdElyeKpZQ k1PEipdB1nyZsef9IMRWIfcw+9GkCvarrM28Q9tHGTk8u0mH/SP8cQIlsP3p59wiNyRZ /n8+DLslT0UX2W3mIkMpobVuGUfVnUwblCbnosgLTgUsPa2M/M4xJ8jdjiXJx7inBt1c iDoISnA2Ur3mxmtOOlk+s+EJyGDOmvne/6l/y+d9bNcWkUMlSzUSjDQdBIQoCUem6lPH Y2QA== MIME-Version: 1.0 Received: by 10.180.107.104 with SMTP id hb8mr9309108wib.8.1334780251951; Wed, 18 Apr 2012 13:17:31 -0700 (PDT) Received: by 10.180.3.170 with HTTP; Wed, 18 Apr 2012 13:17:31 -0700 (PDT) In-Reply-To: References: Date: Wed, 18 Apr 2012 13:17:31 -0700 Message-ID: From: Jack Vogel To: Lars Wilke Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org Subject: Re: Watchdog timeout em driver 8.2-R X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 20:17:33 -0000 On Wed, Apr 18, 2012 at 7:01 AM, Lars Wilke wrote: > Hi, > > i first posted the following to the -stable list but got 
no > reply. Maybe someone here has some advice for me. > > > Switch: HP ProCurve 2910al > The switch does passive LACP > > Motherboard: Supermicro X8DTN+-F > > NIC: Quad Port Card, i.e. em1: > em1@pci0:6:0:1: class=0x020000 card=0x125e15d9 chip=0x105e8086 > rev=0x06 hdr=0x00 > vendor = 'Intel Corporation' > device = 'HP NC360T PCIe DP Gigabit Server Adapter (n1e5132)' > class = network > subclass = ethernet > bar [10] = type Memory, range 32, base 0xfb9e0000, size 131072, > enabled > bar [14] = type Memory, range 32, base 0xfb9c0000, size 131072, > enabled > bar [18] = type I/O Port, range 32, base 0xcc00, size 32, enabled > cap 01[c8] = powerspec 2 supports D0 D3 current D0 > cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message > cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x4(x4) > ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected > ecap 0003[140] = Serial 1 002590ffff0484d8 > > I use CAT 6 cables and the switch and server are in the same cabinet. > > OS: FBSD is 8.2-Release > > rc.conf: > ifconfig_em0="up" > ifconfig_em1="up" > ifconfig_em2="up" > ifconfig_em3="up" > cloned_interfaces="lagg0" > ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 laggport em2 > laggport em3" > ipv4_addrs_lagg0="192.168.80.20/24" > > > Hm, what sysctls might be interesting? > I use: > net.inet.tcp.sendbuf_max=16777216 > net.inet.tcp.recvbuf_max=16777216 > net.inet.tcp.sendspace=65536 > net.inet.tcp.recvspace=131072 > kern.ipc.nmbclusters=230400 > kern.maxvnodes=250000 > kern.maxfiles=65536 > kern.maxfilesperproc=32768 > vfs.read_max=32 > > loader.conf: does only contain stuff concerning zfs > > Except for swap the whole system uses zfs, swap is on a geom mirror. 
> > Once in a while i see this messages in /var/log/messages > > Apr 13 08:53:07 san02 kernel: em1: Watchdog timeout -- resetting > Apr 13 08:53:07 san02 kernel: em1: Queue(0) tdh = 232, hw tdt = 190 > Apr 13 08:53:07 san02 kernel: em1: TX(0) desc avail = 31,Next TX to > Clean = 221 > Apr 13 08:53:07 san02 kernel: em1: Link is Down > Apr 13 08:53:07 san02 kernel: em1: link state changed to DOWN > > Sometimes nothing for days, sometimes under high Network load (NFSv3), > sometimes > multiple times a day. I see this message/behaviour on always the same two > of the > four interfaces (em1 and em3). > > Then the NIC does not have the ACTIVE flag anymore, an ifconfig em1 up > solves the issue. But why does it loose the ACTIVE state and why does the > NIC reset itself in the first place? > Because a watchdog reset is just that, a reset, so it causes the hardware to reinitialize. It should come back up, I do not know why it did not, maybe the renegotiation with the switch fails for some reason? One thought is to get the latest em driver and see if the behavior changes, if that driver is the distributed 8.2 its pretty old. > On the switch i see that the port matching em1 on the server has left > the trunk, so the missing ACTIVE flag is not lying 8-/ > > Googling found many postings with the same problem and one site suggested > that this might be an ACPI problem but nothing concrete and the postings > i found were mostly FBSD7 and older. > > Any pointers would be appreciated. 
> Thank you > > --lars > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > > > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 20:45:58 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84332106566C for ; Wed, 18 Apr 2012 20:45:58 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) by mx1.freebsd.org (Postfix) with ESMTP id 3F6D08FC0A for ; Wed, 18 Apr 2012 20:45:58 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1SKblB-0001zN-E2 for freebsd-net@freebsd.org; Wed, 18 Apr 2012 22:45:53 +0200 Received: from www01.lwilke.de ([78.47.159.91]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 18 Apr 2012 22:45:53 +0200 Received: from lw by www01.lwilke.de with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 18 Apr 2012 22:45:53 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Lars Wilke Date: Wed, 18 Apr 2012 20:45:28 +0000 Lines: 36 Message-ID: <8ol369-4nf.ln1@lwilke.de> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: www01.lwilke.de User-Agent: slrn/0.9.9p1 (Linux) Subject: Re: Watchdog timeout em driver 8.2-R X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 20:45:58 -0000 Hi Jack, thanks for your response. * Jack Vogel wrote: > On Wed, Apr 18, 2012 at 7:01 AM, Lars Wilke wrote: > > Apr 13 08:53:07 san02 kernel: em1: Watchdog timeout -- resetting > > Apr 13 08:53:07 san02 kernel: em1: Queue(0) tdh = 232, hw tdt = 190 > > Apr 13 08:53:07 san02 kernel: em1: TX(0) desc avail = 31,Next TX to > > Clean = 221 > > Apr 13 08:53:07 san02 kernel: em1: Link is Down > > Apr 13 08:53:07 san02 kernel: em1: link state changed to DOWN > > > > Sometimes nothing for days, sometimes under high Network load (NFSv3), > > sometimes > > multiple times a day. I see this message/behaviour on always the same two > > of the > > four interfaces (em1 and em3). > > > > Then the NIC does not have the ACTIVE flag anymore, an ifconfig em1 up > > solves the issue. But why does it loose the ACTIVE state and why does the > > NIC reset itself in the first place? > > > > Because a watchdog reset is just that, a reset, so it causes the hardware to > reinitialize. It should come back up, I do not know why it did not, maybe > the renegotiation with the switch fails for some reason? Hm, my main problem is that it did a reset in the first place. > One thought is to get the latest em driver and see if the behavior changes, > if that driver is the distributed 8.2 its pretty old. ok then i guess i will upgrade to 8.3-R, is the driver there reasonably new? 
--lars From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 21:56:57 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7E0E106564A for ; Wed, 18 Apr 2012 21:56:57 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 606618FC14 for ; Wed, 18 Apr 2012 21:56:57 +0000 (UTC) Received: by wern13 with SMTP id n13so6611471wer.13 for ; Wed, 18 Apr 2012 14:56:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=+0hcAoxrg0rdOECnnjHYCDjdMveZ7gZ4HVdg4LKW0ZI=; b=WPElrhwgsuJPKeJ0PnRiB0RJ5vu0kqoaJ8ipNciNYlasUrn+jfT6LBm6CtlexXFtwq Y2meqkQ942vxxPTE4hUEj/8IuC6Wdu1OAQX8cVPBdjV8oEInZ8RepIfC1nAl+siTCTcR O1Q0HY+JFKeHQNzy1tliBjf7Vhc/2+OkuNTcmlsXkSu77v0Fg8mQ2Fy9qRwplpxe/PeX 2bls5FZwU1P0DsRckAFDl9G6ozuUL6Rm7OWF0A6VvJLHNWbenC5efK0D7udepHMR8kBt V9kD0QsbSxyMfQHH8vLlZN9Sb+3lrHlynpAbwXC39QbWwY56XH7hqGncwcLcsFvLfri0 O+kA== MIME-Version: 1.0 Received: by 10.216.134.155 with SMTP id s27mr2705253wei.80.1334786215519; Wed, 18 Apr 2012 14:56:55 -0700 (PDT) Received: by 10.180.3.170 with HTTP; Wed, 18 Apr 2012 14:56:55 -0700 (PDT) In-Reply-To: <8ol369-4nf.ln1@lwilke.de> References: <8ol369-4nf.ln1@lwilke.de> Date: Wed, 18 Apr 2012 14:56:55 -0700 Message-ID: From: Jack Vogel To: Lars Wilke Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org Subject: Re: Watchdog timeout em driver 8.2-R X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 21:56:57 -0000 On Wed, Apr 18, 2012 at 1:45 PM, Lars Wilke 
wrote: > Hi Jack, > > thanks for your response. > > * Jack Vogel wrote: > > On Wed, Apr 18, 2012 at 7:01 AM, Lars Wilke wrote: > > > Apr 13 08:53:07 san02 kernel: em1: Watchdog timeout -- resetting > > > Apr 13 08:53:07 san02 kernel: em1: Queue(0) tdh = 232, hw tdt = 190 > > > Apr 13 08:53:07 san02 kernel: em1: TX(0) desc avail = 31,Next TX to > > > Clean = 221 > > > Apr 13 08:53:07 san02 kernel: em1: Link is Down > > > Apr 13 08:53:07 san02 kernel: em1: link state changed to DOWN > > > > > > Sometimes nothing for days, sometimes under high Network load (NFSv3), > > > sometimes > > > multiple times a day. I see this message/behaviour on always the same > two > > > of the > > > four interfaces (em1 and em3). > > > > > > Then the NIC does not have the ACTIVE flag anymore, an ifconfig em1 up > > > solves the issue. But why does it loose the ACTIVE state and why does > the > > > NIC reset itself in the first place? > > > > > > > Because a watchdog reset is just that, a reset, so it causes the > hardware to > > reinitialize. It should come back up, I do not know why it did not, > maybe > > the renegotiation with the switch fails for some reason? > > Hm, my main problem is that it did a reset in the first place. > > > One thought is to get the latest em driver and see if the behavior > changes, > > if that driver is the distributed 8.2 its pretty old. > > ok then i guess i will upgrade to 8.3-R, is the driver there reasonably > new? > > Yes, that should be fine. 
Jack From owner-freebsd-net@FreeBSD.ORG Wed Apr 18 23:40:45 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 69B62106566C; Wed, 18 Apr 2012 23:40:45 +0000 (UTC) (envelope-from seanbru@yahoo-inc.com) Received: from mrout1-b.corp.bf1.yahoo.com (mrout1-b.corp.bf1.yahoo.com [98.139.253.104]) by mx1.freebsd.org (Postfix) with ESMTP id 1D8CD8FC1B; Wed, 18 Apr 2012 23:40:45 +0000 (UTC) Received: from [IPv6:::1] (rideseveral.corp.yahoo.com [10.73.160.231]) by mrout1-b.corp.bf1.yahoo.com (8.14.4/8.14.4/y.out) with ESMTP id q3INeHEe011376; Wed, 18 Apr 2012 16:40:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yahoo-inc.com; s=cobra; t=1334792419; bh=jCiBGwJmVwTo0B3fnU1SDLZ/yJpW6J9YrN/ii158Z/s=; h=Subject:From:To:Cc:In-Reply-To:References:Content-Type:Date: Message-ID:Mime-Version:Content-Transfer-Encoding; b=E9LbhP6hBL1nbuLZQ2TxxEh7AaC2S2M3V8t0p/T/tf9PU6Bw4LA8heSg8MvUdsgbg YbFMrKVed7QAvoO5pyB20NnhKh8EeYj9/5kXZkvNz6QLX8Q7To2KeNK5sH4tqF4VXw Wq99zKpfr/k6riQ1h6RUnPfICMJuI0dAG5W9do3c= From: Sean Bruno To: Jack Vogel In-Reply-To: <1334767746.3466.6.camel@powernoodle-l7.corp.yahoo.com> References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <20120418072818.GA58850@onelab2.iet.unipi.it> <1334766438.3466.4.camel@powernoodle-l7.corp.yahoo.com> <1334767746.3466.6.camel@powernoodle-l7.corp.yahoo.com> Content-Type: text/plain; charset="UTF-8" Date: Wed, 18 Apr 2012 16:40:17 -0700 Message-ID: <1334792417.19343.11.camel@powernoodle-l7.corp.yahoo.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-Milter-Version: master.31+4-gbc07cd5+ X-CLX-ID: 792417002 Cc: "freebsd-net@freebsd.org" , Luigi Rizzo Subject: Re: igb(4) Raising IGB_MAX_TXD ?? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Apr 2012 23:40:45 -0000 On Wed, 2012-04-18 at 09:49 -0700, Sean Bruno wrote: > ok, good. that at least confirms that I correctly translated between > the driver code and documented specification. > > I will try 8k as a test for now and see how that runs. > > sean For now, I've patched one front end server with: /usr/src/sys/dev/e1000/if_igb.h:#define IGB_MAX_RXD 4096 * 4 And adjusted hw.igb.rxd: 8192 So far so good, been running in production for a couple of hours so the "smoke test" for this setting seems to be happy. We'll continue to adjust and test tomorrow during higher load conditions. Sean From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 03:30:46 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D7E2F106566B; Thu, 19 Apr 2012 03:30:46 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-pz0-f44.google.com (mail-pz0-f44.google.com [209.85.210.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9EE158FC08; Thu, 19 Apr 2012 03:30:46 +0000 (UTC) Received: by dadz14 with SMTP id z14so35006904dad.17 for ; Wed, 18 Apr 2012 20:30:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=VfsYh5xQ3QZRV0tSJBQCF8T4wfspDUHvUAEE2wKvW2E=; b=waycEs/RGVrK4Oj54Vms/JyB+34kNlnYAjJ05EtjLjZGGpUOfMJZAoU5IRCpEngsok Iqq4IplfWXnqboFgjjjR+Sq0g5wQTMTHbPugVn6L25PjbrN3InEYWBbSzGDTIKc+JcXE U72m1kic3Z6LE0NndOuoKkxXx9vGeq+HDcGYXUUcxUBGrdhjvGcXYxFX44TVtugSmV+t HqZknKsj2MaQGxsta0S+NVC+bfUvPwWEHN5W8mge3x9mVHLduIBgu8JjR7AZ/84vzPm6 BSiMJOkjLx8lL6ajpwh2n73eRIB2ngLUI5Nb2cmPINnpeeh/hlGjRoJa0xUoKhfe46ys 
IUxA== MIME-Version: 1.0 Received: by 10.68.134.133 with SMTP id pk5mr1608043pbb.17.1334806246219; Wed, 18 Apr 2012 20:30:46 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.142.101.9 with HTTP; Wed, 18 Apr 2012 20:30:46 -0700 (PDT) In-Reply-To: References: <201204060653.q366rwLa096182@svn.freebsd.org> <4F7E9413.20602@FreeBSD.org> <4F8BBD4E.1040106@FreeBSD.org> Date: Wed, 18 Apr 2012 20:30:46 -0700 X-Google-Sender-Auth: cGs2XcjUmg6t-r4yp0xREUBavUQ Message-ID: From: Adrian Chadd To: "Alexander V. Chernikov" Content-Type: multipart/mixed; boundary=047d7b11190b26616304bdffcc68 Cc: freebsd-net@freebsd.org Subject: Re: svn commit: r233937 - in head/sys: kern net security/mac X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 03:30:46 -0000 --047d7b11190b26616304bdffcc68 Content-Type: text/plain; charset=ISO-8859-1 Hi, Here's what I have thus far. 
Adrian --047d7b11190b26616304bdffcc68 Content-Type: application/octet-stream; name="bpf.diff" Content-Disposition: attachment; filename="bpf.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_h17986zw0 SW5kZXg6IHN5cy9uZXQvYnBmLmMKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gc3lzL25ldC9icGYuYwkocmV2aXNp b24gMjM0MzY5KQorKysgc3lzL25ldC9icGYuYwkod29ya2luZyBjb3B5KQpAQCAtMTc0NywxMiAr MTc0NywxMyBAQAogCQlwYW5pYygiYnBmX3NldGlmOiBidWZtb2RlICVkIiwgZC0+YmRfYnVmbW9k ZSk7CiAJfQogCWlmIChicCAhPSBkLT5iZF9iaWYpIHsKKwkJQlBGX0xPQ0soKTsKIAkJaWYgKGQt PmJkX2JpZikKIAkJCS8qCiAJCQkgKiBEZXRhY2ggaWYgYXR0YWNoZWQgdG8gc29tZXRoaW5nIGVs c2UuCiAJCQkgKi8KIAkJCWJwZl9kZXRhY2hkKGQpOwotCisJCUJQRl9VTkxPQ0soKTsKIAkJYnBm X2F0dGFjaGQoZCwgYnApOwogCX0KIAlCUEZEX1dMT0NLKGQpOwpAQCAtMjM5Myw3ICsyMzk0LDkg QEAKIAkJCW5kZXRhY2hlZCsrOwogI2VuZGlmCiAJCQl3aGlsZSAoKGQgPSBMSVNUX0ZJUlNUKCZi cC0+YmlmX2RsaXN0KSkgIT0gTlVMTCkgeworCQkJCUJQRl9MT0NLKCk7CiAJCQkJYnBmX2RldGFj aGQoZCk7CisJCQkJQlBGX1VOTE9DSygpOwogCQkJCUJQRkRfV0xPQ0soZCk7CiAJCQkJYnBmX3dh a2V1cChkKTsKIAkJCQlCUEZEX1dVTkxPQ0soZCk7CkBAIC0yNDYyLDcgKzI0NjUsOSBAQAogCUJQ Rl9VTkxPQ0soKTsKIAlpZiAoYnAgIT0gTlVMTCkgewogCQlvcHJvbWlzYyA9IGQtPmJkX3Byb21p c2M7CisJCUJQRl9MT0NLKCk7CiAJCWJwZl9kZXRhY2hkKGQpOworCQlCUEZfVU5MT0NLKCk7CiAJ CWJwZl9hdHRhY2hkKGQsIGJwKTsKIAkJQlBGRF9XTE9DSyhkKTsKIAkJcmVzZXRfZChkKTsK --047d7b11190b26616304bdffcc68-- From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 05:47:50 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 105BC1065670 for ; Thu, 19 Apr 2012 05:47:50 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id B19628FC0C for ; Thu, 19 Apr 2012 05:47:49 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 98611CBD98E; Thu, 19 Apr 2012 07:41:38 +0200 (CEST) X-Bogosity: Ham, 
tests=bogofilter, spamicity=0.000000, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 11.0999] X-CRM114-CacheID: sfid-20120419_07413_EBD187D1 X-CRM114-Status: Good ( pR: 11.0999 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Thu Apr 19 07:41:38 2012 X-DSPAM-Confidence: 0.6874 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4f8fa592633041250412649 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, User-Agent*i686, 0.00894, User-Agent*Linux+i686, 0.00998, Date*19+Apr, 0.99000, Date*41+37, 0.99000, Date*07+41, 0.99000, User-Agent*i686+en, 0.01301, User-Agent*Mozilla/5.0+(X11, 0.01377, User-Agent*Linux, 0.01986, User-Agent*U+Linux, 0.02451, User-Agent*rv+1.8.1.23), 0.02776, User-Agent*1.8.1.23), 0.02776, User-Agent*Thunderbird/2.0.0.23, 0.02776, From*Nagy, 0.02776, Received*for+; Thu, 19 Apr 2012 07:41:37 +0200 (CEST) Message-ID: <4F8FA591.4010503@fsn.hu> Date: Thu, 19 Apr 2012 07:41:37 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.23) Gecko/20090817 Thunderbird/2.0.0.23 Mnenhy/0.7.6.0 To: freebsd-net@freebsd.org Content-Transfer-Encoding: 7bit MIME-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1" X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: SO_BINDTODEVICE or equivalent? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 05:47:50 -0000 Hi, I want to solve the classic problem of a DHCP server: listening for broadcast UDP packets and figuring out what interface a packet has come in. The Linux solution is SO_BINDTODEVICE, which according to socket(7): SO_BINDTODEVICE Bind this socket to a particular device like "eth0", as specified in the passed interface name. If the name is an empty string or the option length is zero, the socket device binding is removed. 
The passed option is a variable-length null-terminated interface name string with the maximum size of IFNAMSIZ. If a socket is bound to an interface, only packets received from that particular interface are processed by the socket. Note that this only works for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal [1]bind(2) there). This makes it possible to listen on selected interfaces for (broadcast) packets. FreeBSD currently doesn't implement this feature. Any chances that somebody will do this? What alternatives would you recommend? Raw packet access (like BPF and RAW sockets) finally make the application to do more -mainly useless- work. Are there any other solutions, which doesn't require additional packet parsing? Thanks, References 1. http://linux.die.net/man/2/bind From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 09:32:20 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E27FE106564A for ; Thu, 19 Apr 2012 09:32:20 +0000 (UTC) (envelope-from onwahe@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 6665A8FC1C for ; Thu, 19 Apr 2012 09:32:20 +0000 (UTC) Received: by lbbgm6 with SMTP id gm6so2491972lbb.13 for ; Thu, 19 Apr 2012 02:32:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=rGNMdTCYPcWaFhcA5mjQssD771sF3CosDxEt/eGOZjA=; b=e++brnlMw93JZdOO6Pubgfq2kb8GbK8kUn37A7ttGMHNRcxoQJXAt3f8IHk9pLxzod riWW1MJ6QSLF74oSqZE0fOTNkCV/t1qSzkvDX5/bDRFrgfktVYne+wjksXOtZwQjz0AW OUzHtYAzkna6wsl+cV2mnCnwboqN77cubZoMdgijtoalJ5Uv3hSjiOuPOACHbomSvP4R DGDDI/eVmAGtJ5pxrZ9zbhEx7SmS7djb1AExmPCP3TSqUCQYZaZVa7FO1GVxmLinlBTq 
CA0H2Gkx4PnJw3jhFeJaWTFTZjFTN9MCa1ZnS1rxeDbDSO81LVIJOq7/KF6V8hXBgidG +wUQ== MIME-Version: 1.0 Received: by 10.112.26.10 with SMTP id h10mr636926lbg.79.1334827938929; Thu, 19 Apr 2012 02:32:18 -0700 (PDT) Received: by 10.112.56.179 with HTTP; Thu, 19 Apr 2012 02:32:18 -0700 (PDT) In-Reply-To: <4F8FA591.4010503@fsn.hu> References: <4F8FA591.4010503@fsn.hu> Date: Thu, 19 Apr 2012 11:32:18 +0200 Message-ID: From: Svatopluk Kraus To: Attila Nagy Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-net@freebsd.org Subject: Re: SO_BINDTODEVICE or equivalent? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 09:32:21 -0000 Hi, Use IP_RECVIF option. For IP_SENDIF look at http://lists.freebsd.org/pipermail/freebsd-net/2007-March/013510.html I used the patch on my embedded FreeBSD 9.0 boxes and it works fine. I modificated it slightly to match 9.0. Svata On Thu, Apr 19, 2012 at 7:41 AM, Attila Nagy wrote: > > =A0 Hi, > =A0 I want to solve the classic problem of a DHCP server: listening for > =A0 broadcast UDP packets and figuring out what interface a packet has > =A0 come in. > =A0 The Linux solution is SO_BINDTODEVICE, which according to socket(7): > =A0 SO_BINDTODEVICE > =A0 =A0 =A0 =A0 =A0Bind this socket to a particular device like "eth0", a= s > =A0 =A0 =A0 =A0 =A0specified in the passed interface name. If the name is= an empty > =A0 =A0 =A0 =A0 =A0string or the option length is zero, the socket device= binding > =A0 =A0 =A0 =A0 =A0is removed. The passed option is a variable-length > =A0 =A0 =A0 =A0 =A0null-terminated interface name string with the maximum= size of > =A0 =A0 =A0 =A0 =A0IFNAMSIZ. 
If a socket is bound to an interface, only p= ackets > =A0 =A0 =A0 =A0 =A0received from that particular interface are processed = by the > =A0 =A0 =A0 =A0 =A0socket. Note that this only works for some socket type= s, > =A0 =A0 =A0 =A0 =A0particularly AF_INET sockets. It is not supported for = packet > =A0 =A0 =A0 =A0 =A0sockets (use normal [1]bind(2) there). > > =A0 This makes it possible to listen on selected interfaces for > =A0 (broadcast) packets. FreeBSD currently doesn't implement this feature= . > =A0 Any chances that somebody will do this? > =A0 What alternatives would you recommend? Raw packet access (like BPF an= d > =A0 RAW sockets) finally make the application to do more -mainly useless- > =A0 work. > =A0 Are there any other solutions, which doesn't require additional packe= t > =A0 parsing? > =A0 Thanks, > > References > > =A0 1. http://linux.die.net/man/2/bind > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 11:44:21 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3B7CC106564A for ; Thu, 19 Apr 2012 11:44:21 +0000 (UTC) (envelope-from lw@lwilke.de) Received: from mx02.it-betrieb.de (mx02.it-betrieb.de [78.47.77.217]) by mx1.freebsd.org (Postfix) with ESMTP id EDACE8FC08 for ; Thu, 19 Apr 2012 11:44:20 +0000 (UTC) Received: from localhost (g231173211.adsl.alicedsl.de [92.231.173.211]) by mx02.it-betrieb.de (mta-mx02) with ESMTP id 1C9CBAB203D; Thu, 19 Apr 2012 13:44:19 +0200 (CEST) Date: Thu, 19 Apr 2012 13:44:18 +0200 From: Lars Wilke To: Jack Vogel Message-ID: <20120419114418.GE15520@cklennard.localdomain> References: <8ol369-4nf.ln1@lwilke.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: 
inline In-Reply-To: Cc: freebsd-net@freebsd.org Subject: Re: Watchdog timeout em driver 8.2-R X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 11:44:21 -0000

* Jack Vogel wrote:
> ok then i guess i will upgrade to 8.3-R, is the driver there reasonably
> new?
>
> Yes, that should be fine. Jack

thanks,

btw. i can quite reliably reproduce this issue. So if you or anybody else
is interested in some data i might be able to get it.

--lars

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 13:10:53 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 51A6F1065675; Thu, 19 Apr 2012 13:10:53 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id E785C8FC1F; Thu, 19 Apr 2012 13:10:52 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 14C837300A; Thu, 19 Apr 2012 15:30:18 +0200 (CEST) Date: Thu, 19 Apr 2012 15:30:18 +0200 From: Luigi Rizzo To: net@freebsd.org, current@freebsd.org Message-ID: <20120419133018.GA91364@onelab2.iet.unipi.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i Cc: Subject: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 13:10:53 -0000

I have been running some performance tests on UDP sockets,
using the netsend program in tools/tools/netrate/netsend
and instrumenting the source code and the kernel to return at
various points of the path. Here are some results which
I hope you find interesting.

Test conditions:
- intel i7-870 CPU running at 2.93 GHz + TurboBoost,
  all 4 cores enabled, no hyperthreading
- FreeBSD HEAD as of 15 april 2012, no ipfw, no other
  pfilter clients, no ipv6 or ipsec.
- userspace running 'netsend 10.0.0.2 5555 18 0 5'
  (output to a physical interface, udp port 5555, small
  frame, no rate limitations, 5sec experiments)
- the 'ns' column reports the total time divided by the
  number of successful transmissions; we report the min
  and max in 5 tests
- 1 to 4 parallel tasks, variable packet sizes
- there are variations in the numbers which become
  larger as we reach the bottom of the stack

Caveats:
- in the table below, clock and pktlen are constant.
  I am including the info here so it is easier to compare
  the results with future experiments
- i have a small number of samples, so i am only reporting
  the min and the max in a handful of experiments.
- i am only measuring average values over millions of
  cycles. I have no info on what is the variance between
  the various executions.
- from what i have seen, numbers vary significantly on
  different systems, depending on memory speed, caches
  and other things. The big jumps are significant and present
  on all systems, but the small deltas (say < 5%) are
  not even statistically significant.
- if someone is interested in replicating the experiments
  email me and i will post a link to a suitable picobsd image.
- i have not yet instrumented the bottom layers (if_output
  and below).

The results show a few interesting things:

- the packet-sending application is reasonably fast
  and certainly not a bottleneck (over 100Mpps before
  calling the system call);

- the system call is somewhat expensive, about 100ns.
  I am not sure where the time is spent (the amd64 code
  does a few pushes on the stack and then runs "syscall",
  followed by a sysret). I am not sure how much
  room for improvement there is in this area.
  The relevant code is in lib/libc/i386/SYS.h and
  lib/libc/i386/sys/syscall.S (KERNCALL translates
  to "syscall" on amd64, and "int 0x80" on the i386)

- the next expensive operation, consuming another 100ns,
  is the mbuf allocation in m_uiotombuf(). Nevertheless, the allocator
  seems to scale decently at least with 4 cores. The copyin() is
  relatively inexpensive (not reported in the data below, but
  disabling it saves only 15-20ns for a short packet).

  I have not followed the details, but the allocator calls the zone
  allocator and there is at least one critical_enter()/critical_exit()
  pair, and the highly modular architecture invokes long chains of
  indirect function calls both on allocation and release.

  It might make sense to keep a small pool of mbufs attached to the
  socket buffer instead of going to the zone allocator.
  Or defer the actual encapsulation to the
  (*so->so_proto->pr_usrreqs->pru_send)() which is called inline, anyways.

- another big bottleneck is the route lookup in ip_output()
  (between entries 51 and 56). Not only does it eat another
  100ns+ on an empty routing table, but it also
  causes huge contention when multiple cores
  are involved.

There is other bad stuff occurring in if_output() and
below (on this system it takes about 1300ns to send one
packet even with one core, and only 500-550 are consumed
before the call to if_output()) but i don't have
detailed information yet.

POS  CPU  clock  pktlen    ns/pkt     --- EXIT POINT ----
                          min   max
----------------------------------------------------------------------
U      1   2934      18     8     8   userspace, before the send() call
  [ syscall ]
20     1   2934      18   103   107   sys_sendto(): begin
20     4   2934      18   104   107

21     1   2934      18   110   113   sendit(): begin
21     4   2934      18   111   116

22     1   2934      18   110   114   sendit() after getsockaddr(&to, ...)
22     4   2934      18   111   124

23     1   2934      18   111   115   sendit() before kern_sendit
23     4   2934      18   112   120

24     1   2934      18   117   120   kern_sendit() after AUDIT_ARG_FD
24     4   2934      18   117   121

25     1   2934      18   134   140   kern_sendit() before sosend()
25     4   2934      18   134   146

40     1   2934      18   144   149   sosend_dgram(): start
40     4   2934      18   144   151

41     1   2934      18   157   166   sosend_dgram() before m_uiotombuf()
41     4   2934      18   157   168
  [ mbuf allocation and copy. The copy is relatively cheap ]
42     1   2934      18   264   268   sosend_dgram() after m_uiotombuf()
42     4   2934      18   265   269

30     1   2934      18   273   276   udp_send() begin
30     4   2934      18   274   278
  [ here we start seeing some contention with multiple threads ]
31     1   2934      18   323   324   udp_output() before ip_output()
31     4   2934      18   344   348

50     1   2934      18   326   331   ip_output() beginning
50     4   2934      18   356   367

51     1   2934      18   343   349   ip_output() before "if (opt) { ..."
51     4   2934      18   366   373
  [ rtalloc() is sequential so multiple clients contend heavily ]
56     1   2934      18   470   480   ip_output() after rtalloc*()
56     4   2934      18  1310  1378

52     1   2934      18   472   488   ip_output() at sendit:
52     4   2934      18  1252  1286

53     1   2934      18               ip_output() before pfil_run_hooks()
53     4   2934      18

54     1   2934      18   476   477   ip_output() at passout:
54     4   2934      18  1249  1286

55     1   2934      18   509   526   ip_output() before if_output
55     4   2934      18  1268  1278
----------------------------------------------------------------------

cheers
luigi

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 13:58:16 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EF2711065686 for ; Thu, 19 Apr 2012 13:58:16 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id C376F8FC17 for ; Thu, 19 Apr 2012 13:58:16 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 36623B958; Thu, 19 Apr 2012 09:58:16
-0400 (EDT) From: John Baldwin To: freebsd-net@freebsd.org Date: Thu, 19 Apr 2012 08:22:30 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; ) References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <1334767746.3466.6.camel@powernoodle-l7.corp.yahoo.com> <1334792417.19343.11.camel@powernoodle-l7.corp.yahoo.com> In-Reply-To: <1334792417.19343.11.camel@powernoodle-l7.corp.yahoo.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201204190822.31010.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 19 Apr 2012 09:58:16 -0400 (EDT) Cc: Luigi Rizzo , Sean Bruno , Jack Vogel Subject: Re: igb(4) Raising IGB_MAX_TXD ?? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 13:58:17 -0000 On Wednesday, April 18, 2012 7:40:17 pm Sean Bruno wrote: > > On Wed, 2012-04-18 at 09:49 -0700, Sean Bruno wrote: > > ok, good. that at least confirms that I correctly translated between > > the driver code and documented specification. > > > > I will try 8k as a test for now and see how that runs. > > > > sean > > For now, I've patched one front end server with: > /usr/src/sys/dev/e1000/if_igb.h:#define IGB_MAX_RXD 4096 * 4 > > And adjusted hw.igb.rxd: 8192 > > So far so good, been running in production for a couple of hours so the > "smoke test" for this setting seems to be happy. > > We'll continue to adjust and test tomorrow during higher load > conditions. FWIW, at my current employer we run with both rxd and txd cranked up to 32k (we had to patch the driver as you suggested) and have not had any problems doing that for a couple of years now. 
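
A side note on the patch quoted above, offered as a sketch rather than a claim about the driver: `#define IGB_MAX_RXD 4096 * 4` has no parentheses around its value. That is harmless in a plain range check such as `n > IGB_MAX_RXD`, but it would change the result if the macro were ever used as a divisor or modulus. The DEMO_* names and the helper functions below are hypothetical and are not taken from if_igb.h; how the driver actually uses the macro is not shown in this thread.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the patched limit; not copied from if_igb.h. */
#define DEMO_MAX_RXD_BARE   4096 * 4    /* unparenthesized, as in the quoted patch */
#define DEMO_MAX_RXD_PAREN  (4096 * 4)  /* parenthesized form */

/* Expands to n / 4096 * 4, which the compiler reads as (n / 4096) * 4. */
int demo_ratio_bare(int n)
{
	return n / DEMO_MAX_RXD_BARE;
}

/* Expands to n / (4096 * 4), i.e. n / 16384. */
int demo_ratio_paren(int n)
{
	return n / DEMO_MAX_RXD_PAREN;
}

/* A plain range check gives the same answer with either form,
 * because '*' binds tighter than '>'. */
int demo_too_large(int n)
{
	return n > DEMO_MAX_RXD_BARE;
}

/* Print both division results side by side for one input. */
void demo_report(int n)
{
	printf("n=%d bare=%d paren=%d\n", n,
	    demo_ratio_bare(n), demo_ratio_paren(n));
}
```

Calling demo_report(32768) from a small main() shows the two division results diverging, while demo_too_large() behaves identically for both spellings. Wrapping the value in parentheses removes the difference at no cost.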
-- John Baldwin From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 14:09:56 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 62818106566B; Thu, 19 Apr 2012 14:09:56 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id BE3288FC1D; Thu, 19 Apr 2012 14:09:55 +0000 (UTC) Received: by wgbds12 with SMTP id ds12so8364240wgb.31 for ; Thu, 19 Apr 2012 07:09:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=XDGwb28BWl9tgjzZFKikmh8ASwBytf1lFhIcqHMF9iE=; b=xlBUjkGTlpPh4sQh2LJzBURQqu/Y+3JZt7uHdZTTixkEsSudqnKx9jGjKKtznVR+av uHmIwxhnyqLxtV5wf2tlGXhSSCjSx/ugdERHY6XVHdvCdk4c1ATSR2nkSRZ0NYG3uZZ6 7DGN8lh7QY0vebaTboP5oIDTswUJdeZdBdLVERs50tRmIznEv72DGAKtnBlD8/Z8+b+w WCqPVOYmWOJvkRamlEUR2rLRs9HjL0xEb9cDCqaZl9LXRr1L3/ffmGbUqJ3LDTOilesK X4O1M1oQhPT6zTotELpBoLFDAF0FYCDFxSa+rfaowrFkBuF4a44Rg3x/ArCXnkAuuSh8 tgHg== MIME-Version: 1.0 Received: by 10.180.107.104 with SMTP id hb8mr5694440wib.8.1334844589152; Thu, 19 Apr 2012 07:09:49 -0700 (PDT) Received: by 10.180.3.170 with HTTP; Thu, 19 Apr 2012 07:09:49 -0700 (PDT) In-Reply-To: <201204190822.31010.jhb@freebsd.org> References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <1334767746.3466.6.camel@powernoodle-l7.corp.yahoo.com> <1334792417.19343.11.camel@powernoodle-l7.corp.yahoo.com> <201204190822.31010.jhb@freebsd.org> Date: Thu, 19 Apr 2012 07:09:49 -0700 Message-ID: From: Jack Vogel To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org, Luigi Rizzo , Sean Bruno Subject: Re: igb(4) Raising IGB_MAX_TXD ?? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 14:09:56 -0000 OH, well that's interesting to know, thanks John. Jack On Thu, Apr 19, 2012 at 5:22 AM, John Baldwin wrote: > On Wednesday, April 18, 2012 7:40:17 pm Sean Bruno wrote: > > > > On Wed, 2012-04-18 at 09:49 -0700, Sean Bruno wrote: > > > ok, good. that at least confirms that I correctly translated between > > > the driver code and documented specification. > > > > > > I will try 8k as a test for now and see how that runs. > > > > > > sean > > > > For now, I've patched one front end server with: > > /usr/src/sys/dev/e1000/if_igb.h:#define IGB_MAX_RXD 4096 * 4 > > > > And adjusted hw.igb.rxd: 8192 > > > > So far so good, been running in production for a couple of hours so the > > "smoke test" for this setting seems to be happy. > > > > We'll continue to adjust and test tomorrow during higher load > > conditions. > > FWIW, at my current employer we run with both rxd and txd cranked up to 32k > (we had to patch the driver as you suggested) and have not had any problems > doing that for a couple of years now. 
>
> --
> John Baldwin
>

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 18:53:32 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2E166106564A; Thu, 19 Apr 2012 18:53:32 +0000 (UTC) (envelope-from slw@zxy.spb.ru) Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98]) by mx1.freebsd.org (Postfix) with ESMTP id D74F58FC0A; Thu, 19 Apr 2012 18:53:31 +0000 (UTC) Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1SKwUI-0007eY-C7; Thu, 19 Apr 2012 22:53:50 +0400 Date: Thu, 19 Apr 2012 22:53:50 +0400 From: Slawa Olhovchenkov To: Luigi Rizzo Message-ID: <20120419185350.GC76983@zxy.spb.ru> References: <20120419133018.GA91364@onelab2.iet.unipi.it> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120419133018.GA91364@onelab2.iet.unipi.it> User-Agent: Mutt/1.5.21 (2010-09-15) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: slw@zxy.spb.ru X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false Cc: current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 18:53:32 -0000

On Thu, Apr 19, 2012 at 03:30:18PM +0200, Luigi Rizzo wrote:

> I have been running some performance tests on UDP sockets,
> using the netsend program in tools/tools/netrate/netsend
> and instrumenting the source code and the kernel do return in
> various points of the path. Here are some results which
> I hope you find interesting.

I did some tests in 2011. They may be out of date by now, or they may
still be relevant.
Initial message http://lists.freebsd.org/pipermail/freebsd-performance/2011-January/004156.html UDP socket in FreeBSD http://lists.freebsd.org/pipermail/freebsd-performance/2011-February/004176.html About 4BSD/ULE http://lists.freebsd.org/pipermail/freebsd-performance/2011-February/004181.html From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 19:27:05 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1AF4F106566B; Thu, 19 Apr 2012 19:27:05 +0000 (UTC) (envelope-from seanbru@yahoo-inc.com) Received: from mrout1-b.corp.bf1.yahoo.com (mrout1-b.corp.bf1.yahoo.com [98.139.253.104]) by mx1.freebsd.org (Postfix) with ESMTP id AFA2B8FC12; Thu, 19 Apr 2012 19:27:04 +0000 (UTC) Received: from [IPv6:::1] (rideseveral.corp.yahoo.com [10.73.160.231]) by mrout1-b.corp.bf1.yahoo.com (8.14.4/8.14.4/y.out) with ESMTP id q3JJQgk1090349; Thu, 19 Apr 2012 12:26:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yahoo-inc.com; s=cobra; t=1334863604; bh=CgezNIxFrgin2tMRRpsfPTaW6RUEp393NHbdHPoSqQE=; h=Subject:From:To:Cc:In-Reply-To:References:Content-Type:Date: Message-ID:Mime-Version:Content-Transfer-Encoding; b=Phi0UC6vDlgaAu9pnng8tbRZAbYMatcLf6MZXioRzpVsICgEPV8H4Q+0V9i4IYPV/ fDJBhz3ovoE6pCzVvT9zB6zJkY9rGrVSC6BhjgSwgqpeSr6kuW6ueAaNYcY50zJnGO nLIUlXLgc8WuU1RMPvKMGsKnkEvomQ4hL0YELl9I= From: Sean Bruno To: Jack Vogel In-Reply-To: References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <1334767746.3466.6.camel@powernoodle-l7.corp.yahoo.com> <1334792417.19343.11.camel@powernoodle-l7.corp.yahoo.com> <201204190822.31010.jhb@freebsd.org> Content-Type: text/plain; charset="UTF-8" Date: Thu, 19 Apr 2012 12:26:42 -0700 Message-ID: <1334863602.4126.9.camel@powernoodle-l7.corp.yahoo.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-Milter-Version: master.31+4-gbc07cd5+ X-CLX-ID: 863603002 Cc: 
"freebsd-net@freebsd.org" , Luigi Rizzo , John Baldwin Subject: Re: igb(4) Raising IGB_MAX_RXD ?? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 19:27:05 -0000

On Thu, 2012-04-19 at 07:09 -0700, Jack Vogel wrote:
> OH, well that's interesting to know, thanks John.
>
> Jack
>

Front end box looks pretty happy today at 8k descriptors.

http://people.freebsd.org/~sbruno/igb_8k_stats.txt

Under peak, we're approaching 20MBytes/sec in and out of the
interface. :-) Nifty.

-bash-4.2$ netstat 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     59542     0      0   18189602      59131     0   19884085     0
     58941     0      0   18036651      58673     0   19702671     0
     58790     0      0   18069235      58422     0   19897858     0
     58226     0      0   17948175      57969     0   19648810     0
     58689     0      0   18167855      58479     0   19909843     0
     58633     0      0   17952951      58437     0   19760197     0
     61019     0      0   18779030      60592     0   20394481     0
     56696     0      0   17647407      56552     0   19261155     0
     58853     0      0   18186019      58530     0   19886197     0
     58739     0      0   18314790      58768     0   20165654     0
     58748     0      0   18267243      58539     0   20016668     0
     58672     0      0   17914657      58378     0   19558833     0
     59885     0      0   18332641      59780     0   20239241     0

We're going to crank one server up to 8 igb queues and
hw.igb.max_rxd/txd to 32k and see what blows up.
Sean From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 19:59:40 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D30B106564A; Thu, 19 Apr 2012 19:59:40 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by mx1.freebsd.org (Postfix) with ESMTP id C5CE88FC16; Thu, 19 Apr 2012 19:59:39 +0000 (UTC) Received: by wibhj6 with SMTP id hj6so1714251wib.13 for ; Thu, 19 Apr 2012 12:59:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=AVPhqRJhEhp4bzxjpG7YIiUPupeP50EjvazklSe6GDc=; b=TMoMgyFjUEOP4HC9noQyPcSJJYZfmTiJwQn9OdTXB4HDHTls6dh2K55eSKXh1NH8Ig Wk+TCJk2psggBOb78QHoIyBEEvwl2p6T01/PynsmDiZvH1OUUWxNyGh5gcUITpCVovCg vCpvnYousTeRw5NM/QdJIJZdwB+B2L0VHv8hFLZRCX4Z1jRIcqZeA0uz4WdkkzjttG19 6cBatmXsUXZSvoGQ9m7TiZThFwGjicxw5CHOzIpaRLb+rQqLR2z0vphY+O4eYIXi8QXz 7b46ecxK7YQAjXN0fWLPVZcuVt4J4gVXWj7pkGbaHjyILV6gSHmETCRHCH7cDQyogGgE 0+/g== MIME-Version: 1.0 Received: by 10.180.89.9 with SMTP id bk9mr8334810wib.11.1334865578693; Thu, 19 Apr 2012 12:59:38 -0700 (PDT) Received: by 10.180.3.170 with HTTP; Thu, 19 Apr 2012 12:59:38 -0700 (PDT) In-Reply-To: <1334863602.4126.9.camel@powernoodle-l7.corp.yahoo.com> References: <1334705064.4486.23.camel@powernoodle-l7.corp.yahoo.com> <1334767746.3466.6.camel@powernoodle-l7.corp.yahoo.com> <1334792417.19343.11.camel@powernoodle-l7.corp.yahoo.com> <201204190822.31010.jhb@freebsd.org> <1334863602.4126.9.camel@powernoodle-l7.corp.yahoo.com> Date: Thu, 19 Apr 2012 12:59:38 -0700 Message-ID: From: Jack Vogel To: Sean Bruno Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-net@freebsd.org" , Luigi Rizzo , John Baldwin Subject: Re: igb(4) Raising IGB_MAX_RXD ?? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 19:59:40 -0000

On Thu, Apr 19, 2012 at 12:26 PM, Sean Bruno wrote:

> On Thu, 2012-04-19 at 07:09 -0700, Jack Vogel wrote:
> > OH, well that's interesting to know, thanks John.
> >
> > Jack
> >
>
>
> Front end box looks pretty happy today at 8k descriptors.
>
> http://people.freebsd.org/~sbruno/igb_8k_stats.txt
>
> Under peak, we're approaching 20MBytes/sec in and out of the
> interface. :-) Nifty.
>
> -bash-4.2$ netstat 1
>             input        (Total)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls
>      59542     0      0   18189602      59131     0   19884085     0
>      58941     0      0   18036651      58673     0   19702671     0
>      58790     0      0   18069235      58422     0   19897858     0
>      58226     0      0   17948175      57969     0   19648810     0
>      58689     0      0   18167855      58479     0   19909843     0
>      58633     0      0   17952951      58437     0   19760197     0
>      61019     0      0   18779030      60592     0   20394481     0
>      56696     0      0   17647407      56552     0   19261155     0
>      58853     0      0   18186019      58530     0   19886197     0
>      58739     0      0   18314790      58768     0   20165654     0
>      58748     0      0   18267243      58539     0   20016668     0
>      58672     0      0   17914657      58378     0   19558833     0
>      59885     0      0   18332641      59780     0   20239241     0
>
>
> We're going to crank one server up to 8 igb queues and
> hw.igb.max_rxd/txd to 32k and see what blows up.
>
> Sean
>
>
>

Great, look forward to the results. Thanks Sean.
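
For readers who want to put the quoted netstat sample into link-rate terms, here is a minimal sketch of the arithmetic: average the per-second output byte counts and convert to megabits per second. The byte counts are copied from the sample above; the demo_* names are hypothetical helpers written only for this illustration. By hand the average works out to a little under 20 MBytes/sec, which is just under 160 Mbit/s.

```c
#include <stddef.h>

/* Output bytes per one-second interval, copied from the netstat sample. */
static const unsigned long demo_out_bytes[] = {
	19884085, 19702671, 19897858, 19648810, 19909843, 19760197, 20394481,
	19261155, 19886197, 20165654, 20016668, 19558833, 20239241
};

/* Arithmetic mean of n samples; returns 0 for an empty set. */
double demo_mean(const unsigned long *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += (double)v[i];
	return n ? sum / (double)n : 0.0;
}

/* Convert bytes per second to megabits per second (10^6 bits). */
double demo_mbit_per_sec(double bytes_per_sec)
{
	return bytes_per_sec * 8.0 / 1e6;
}

/* Mean output rate of the sample above, in Mbit/s. */
double demo_sample_mean_mbit(void)
{
	size_t n = sizeof(demo_out_bytes) / sizeof(demo_out_bytes[0]);

	return demo_mbit_per_sec(demo_mean(demo_out_bytes, n));
}
```

The same two helpers work for the input column or for packet counts; only the sample array changes.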
Jack From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 20:11:30 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 025881065672 for ; Thu, 19 Apr 2012 20:11:30 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 659818FC0A for ; Thu, 19 Apr 2012 20:11:29 +0000 (UTC) Received: (qmail 15206 invoked from network); 19 Apr 2012 20:00:01 -0000 Received: from unknown (HELO [62.48.0.94]) ([62.48.0.94]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 19 Apr 2012 20:00:01 -0000 Message-ID: <4F907011.9080602@freebsd.org> Date: Thu, 19 Apr 2012 22:05:37 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1 MIME-Version: 1.0 To: Luigi Rizzo References: <20120419133018.GA91364@onelab2.iet.unipi.it> In-Reply-To: <20120419133018.GA91364@onelab2.iet.unipi.it> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 20:11:30 -0000 On 19.04.2012 15:30, Luigi Rizzo wrote: > I have been running some performance tests on UDP sockets, > using the netsend program in tools/tools/netrate/netsend > and instrumenting the source code and the kernel do return in > various points of the path. Here are some results which > I hope you find interesting. Jumping over very interesting analysis... > - the next expensive operation, consuming another 100ns, > is the mbuf allocation in m_uiotombuf(). 
Nevertheless, the allocator
> seems to scale decently at least with 4 cores. The copyin() is
> relatively inexpensive (not reported in the data below, but
> disabling it saves only 15-20ns for a short packet).
>
> I have not followed the details, but the allocator calls the zone
> allocator and there is at least one critical_enter()/critical_exit()
> pair, and the highly modular architecture invokes long chains of
> indirect function calls both on allocation and release.
>
> It might make sense to keep a small pool of mbufs attached to the
> socket buffer instead of going to the zone allocator.
> Or defer the actual encapsulation to the
> (*so->so_proto->pr_usrreqs->pru_send)() which is called inline, anyways.

The UMA mbuf allocator is certainly not perfect but rather good.
It has a per-CPU cache of mbufs that are very fast to allocate
from. Once it has used them it needs to refill from the global
pool, which may happen from time to time and show up in the averages.

> - another big bottleneck is the route lookup in ip_output()
> (between entries 51 and 56). Not only it eats another
> 100ns+ on an empty routing table, but it also
> causes huge contentions when multiple cores
> are involved.

This is indeed a big problem. I'm working (rough edges remain) on
changing the routing table locking to an rmlock (read-mostly) which
doesn't produce any lock contention or cache pollution. Also skipping
the per-route lock while the table read-lock is held should help some
more. All in all this should give a massive gain in high pps situations
at the expense of costlier routing table changes. However, such changes
are rare to essentially nonexistent with a single default route.

After that the ARP table will get the same treatment and the low stack
lock contention points should be gone for good.
-- Andre From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 20:26:52 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8D13F106566C; Thu, 19 Apr 2012 20:26:52 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 47CF28FC14; Thu, 19 Apr 2012 20:26:52 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 4F4C273029; Thu, 19 Apr 2012 22:46:22 +0200 (CEST) Date: Thu, 19 Apr 2012 22:46:22 +0200 From: Luigi Rizzo To: Andre Oppermann Message-ID: <20120419204622.GA94904@onelab2.iet.unipi.it> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F907011.9080602@freebsd.org> User-Agent: Mutt/1.4.2.3i Cc: current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 20:26:52 -0000 On Thu, Apr 19, 2012 at 10:05:37PM +0200, Andre Oppermann wrote: > On 19.04.2012 15:30, Luigi Rizzo wrote: > >I have been running some performance tests on UDP sockets, > >using the netsend program in tools/tools/netrate/netsend > >and instrumenting the source code and the kernel do return in > >various points of the path. Here are some results which > >I hope you find interesting. > > Jumping over very interesting analysis... > > >- the next expensive operation, consuming another 100ns, > > is the mbuf allocation in m_uiotombuf(). Nevertheless, the allocator > > seems to scale decently at least with 4 cores. 
The copyin() is > > relatively inexpensive (not reported in the data below, but > > disabling it saves only 15-20ns for a short packet). > > > > I have not followed the details, but the allocator calls the zone > > allocator and there is at least one critical_enter()/critical_exit() > > pair, and the highly modular architecture invokes long chains of > > indirect function calls both on allocation and release. > > > > It might make sense to keep a small pool of mbufs attached to the > > socket buffer instead of going to the zone allocator. > > Or defer the actual encapsulation to the > > (*so->so_proto->pr_usrreqs->pru_send)() which is called inline, anyways. > > The UMA mbuf allocator is certainly not perfect but rather good. > It has a per-CPU cache of mbuf's that are very fast to allocate > from. Once it has used them it needs to refill from the global > pool which may happen from time to time and show up in the averages. indeed i was pleased to see no difference between 1 and 4 threads. This also suggests that the global pool is accessed very seldom, and for short times, otherwise you'd see the effect with 4 threads. What might be moderately expensive are the critical_enter()/critical_exit() calls around individual allocations. The allocation happens while the code has already an exclusive lock on so->snd_buf so a pool of fresh buffers could be attached there. But the other consideration is that one could defer the mbuf allocation to a later time when the packet is actually built (or anyways right before the thread returns). 
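
The "pool of fresh buffers attached to the socket buffer" idea can be sketched in userland as a small free list that is tried before the general allocator. This is only an illustration under stated assumptions: malloc()/free() stand in for the zone allocator, every demo_* name is hypothetical, and there is no locking because the caller is assumed to already hold the exclusive lock mentioned above. It is not kernel code and not a proposed patch.

```c
#include <stdlib.h>

#define DEMO_POOL_MAX 8			/* hypothetical cap on cached buffers */

/* Stand-in for an mbuf: a link pointer plus some payload space. */
struct demo_buf {
	struct demo_buf *next;
	char data[256];
};

/* Stand-in for the per-socket send buffer that owns the pool. */
struct demo_sockbuf {
	struct demo_buf *free_list;	/* cached buffers, LIFO order */
	int nfree;			/* number of buffers on free_list */
};

/* Take a buffer from the per-socket pool, falling back to malloc(). */
struct demo_buf *demo_buf_get(struct demo_sockbuf *sb)
{
	struct demo_buf *b = sb->free_list;

	if (b != NULL) {
		sb->free_list = b->next;
		sb->nfree--;
		b->next = NULL;
		return b;
	}
	return calloc(1, sizeof(*b));	/* slow path: general allocator */
}

/* Return a buffer to the pool, or free it if the pool is full. */
void demo_buf_put(struct demo_sockbuf *sb, struct demo_buf *b)
{
	if (sb->nfree < DEMO_POOL_MAX) {
		b->next = sb->free_list;
		sb->free_list = b;
		sb->nfree++;
	} else {
		free(b);
	}
}
```

The cap keeps an idle socket from pinning memory, and the LIFO order hands back the most recently used buffer first, which is the one most likely to still be in cache.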
What i envision (and this would fit nicely with netmap) is the following: - have a (possibly readonly) template for the headers (MAC+IP+UDP) attached to the socket, built on demand, and cached and managed with similar invalidation rules as used by fastforward; - possibly extend the pru_send interface so one can pass down the uio instead of the mbuf; - make an opportunistic buffer allocation in some place downstream, where the code already has an x-lock on some resource (could be the snd_buf, the interface, ...) so the allocation comes for free. > >- another big bottleneck is the route lookup in ip_output() > > (between entries 51 and 56). Not only it eats another > > 100ns+ on an empty routing table, but it also > > causes huge contentions when multiple cores > > are involved. > > This is indeed a big problem. I'm working (rough edges remain) on > changing the routing table locking to an rmlock (read-mostly) which i was wondering, is there a way (and/or any advantage) to use the fastforward code to look up the route for locally sourced packets ? 
cheers luigi From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 20:34:47 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 46B9F106564A; Thu, 19 Apr 2012 20:34:47 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id E4EAE8FC12; Thu, 19 Apr 2012 20:34:46 +0000 (UTC) Received: by iahk25 with SMTP id k25so16162210iah.13 for ; Thu, 19 Apr 2012 13:34:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=buv4RqLaamZ9gTkWjFB/bxxdamfK4foqvoXtm3kgcJU=; b=JUJQSYXJ14ujWukrqAypqDfUyHL1hTA3z9ggWHHoyAruNH1TE6mq9TFr0D0iLOxPzX HP02eO/dyzR4QKc5Wvdd+bpWNDFkg1cFZYo1gxwijITl22i7PGEiyWYnIuvB07py6hyj OPJdqjzgoPJ7we+iVixv1skqj8DgMVmPm9jaa5jFJOIM5V+iNua1jx2yu5oybGlnaAxO GswWa0P2IWvvt8RlcdaWgz6Bh9rDXOIe5twrpqQ1XRczyS954jDeR7MQP58B13kEbtz4 fawQDxs1cqnUMi/iO1d0a3EwOdDkL9AuPpzcMQeScvvrW4yPTMVIrYZZUAALB1rfTNtn CyOQ== MIME-Version: 1.0 Received: by 10.42.215.68 with SMTP id hd4mr3039567icb.30.1334867685664; Thu, 19 Apr 2012 13:34:45 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.50.129.39 with HTTP; Thu, 19 Apr 2012 13:34:45 -0700 (PDT) In-Reply-To: <20120419204622.GA94904@onelab2.iet.unipi.it> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> Date: Thu, 19 Apr 2012 22:34:45 +0200 X-Google-Sender-Auth: bUKbyZeO5HKtET2VBglA8ZX2Ja4 Message-ID: From: "K. 
Macy" To: Luigi Rizzo Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Andre Oppermann , current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 20:34:47 -0000

>> This is indeed a big problem.  I'm working (rough edges remain) on
>> changing the routing table locking to an rmlock (read-mostly) which
>

This only helps if your flows aren't hitting the same rtentry.
Otherwise you still convoy on the lock for the rtentry itself to
increment and decrement the rtentry's reference count.

> i was wondering, is there a way (and/or any advantage) to use the
> fastforward code to look up the route for locally sourced packets ?
>

If the number of peers is bounded then you can use the flowtable. Max
PPS is much higher bypassing routing lookup. However, it doesn't scale
to arbitrary flow numbers.
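
To illustrate why a bounded cache helps only when the number of peers is bounded, here is a minimal userland sketch of a direct-mapped per-destination cache placed in front of a slow lookup. It is not the kernel flowtable code: the demo_* names, the 64-slot size, the hash constant and the stand-in slow lookup are all assumptions made for this example.

```c
#include <stdint.h>

#define DEMO_FLOW_SLOTS 64		/* hypothetical fixed table size */

struct demo_flow {
	uint32_t dst;			/* destination address (key) */
	uint32_t nexthop;		/* cached lookup result */
	int valid;
};

struct demo_flowtable {
	struct demo_flow slot[DEMO_FLOW_SLOTS];
	unsigned long hits;
	unsigned long misses;
};

/* Stand-in for the expensive routing lookup: pretend the gateway is .1
 * on the destination's /24. Purely illustrative. */
static uint32_t demo_slow_lookup(uint32_t dst)
{
	return (dst & 0xffffff00u) | 1u;
}

/* Look up dst, consulting the cache first and filling it on a miss. */
uint32_t demo_flow_lookup(struct demo_flowtable *ft, uint32_t dst)
{
	/* Multiplicative hash; the top 6 bits select one of 64 slots. */
	uint32_t idx = (uint32_t)(dst * 2654435761u) >> 26;
	struct demo_flow *f = &ft->slot[idx];

	if (f->valid && f->dst == dst) {
		ft->hits++;
		return f->nexthop;
	}
	ft->misses++;
	f->dst = dst;
	f->nexthop = demo_slow_lookup(dst);
	f->valid = 1;
	return f->nexthop;
}
```

With a handful of peers nearly every packet is a hit and the slow lookup is skipped. Once the number of active destinations exceeds what the table can hold, entries keep evicting each other and the miss path dominates again, which is the scaling limit described above.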
-Kip

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 20:48:37 2012
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A4424106564A for ; Thu, 19 Apr 2012 20:48:37 +0000 (UTC) (envelope-from seanbru@yahoo-inc.com)
Received: from mrout1-b.corp.bf1.yahoo.com (mrout1-b.corp.bf1.yahoo.com [98.139.253.104]) by mx1.freebsd.org (Postfix) with ESMTP id 695918FC08 for ; Thu, 19 Apr 2012 20:48:37 +0000 (UTC)
Received: from [IPv6:::1] (rideseveral.corp.yahoo.com [10.73.160.231]) by mrout1-b.corp.bf1.yahoo.com (8.14.4/8.14.4/y.out) with ESMTP id q3JKmOrj021650; Thu, 19 Apr 2012 13:48:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yahoo-inc.com; s=cobra; t=1334868505; bh=jAszQqt9E9jSla2Tyulf3QNdxhDiswnlszOVxd6bvNA=; h=Subject:From:To:Cc:Content-Type:Date:Message-ID:Mime-Version: Content-Transfer-Encoding; b=Qg8QuXPAeNu62MVOMKqhDG0kRQ6/jfidzHghqyAxHHkp/Z2fYLsVbEo5BBhf9Ez6t zqXwjWxyMwoUqexCkb5Bwkzrp0YpAD7F90mtWlJDmuCVRMCowIYLh6/wNnjkmkw7cW 5jx2tR6p0+bsCXyu+1Tf4ugS4WredFRyTvqT0GZk=
From: Sean Bruno
To: "freebsd-net@freebsd.org"
Content-Type: text/plain; charset="UTF-8"
Date: Thu, 19 Apr 2012 13:48:24 -0700
Message-ID: <1334868504.4126.10.camel@powernoodle-l7.corp.yahoo.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port
Content-Transfer-Encoding: 7bit
X-Milter-Version: master.31+4-gbc07cd5+
X-CLX-ID: 868505000
Cc: Jack Vogel
Subject: Comment nit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Apr 2012 20:48:37 -0000

I noted a small nit in the comments of sys/dev/e1000/if_igb.h

Index: if_igb.h
===================================================================
--- if_igb.h	(revision 234466)
+++ if_igb.h	(working copy)
@@ -52,7 +52,7 @@
 #define IGB_MAX_TXD		4096
 
 /*
- * IGB_RXD: Maximum number of Transmit Descriptors
+ * IGB_RXD: Maximum number of Receive Descriptors
  *
  * This value is the number of receive descriptors allocated by the driver.
  * Increasing this value allows the driver to buffer more incoming packets.

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:02:53 2012
Return-Path:
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7177A106566B; Thu, 19 Apr 2012 21:02:53 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 298998FC15; Thu, 19 Apr 2012 21:02:52 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 2D8A473029; Thu, 19 Apr 2012 23:22:24 +0200 (CEST)
Date: Thu, 19 Apr 2012 23:22:24 +0200
From: Luigi Rizzo
To: "K. Macy"
Message-ID: <20120419212224.GA95459@onelab2.iet.unipi.it>
References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.3i
Cc: Andre Oppermann , current@freebsd.org, net@freebsd.org
Subject: Re: Some performance measurements on the FreeBSD network stack
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Apr 2012 21:02:53 -0000

On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote:
> >> This is indeed a big problem. I'm working (rough edges remain) on
> >> changing the routing table locking to an rmlock (read-mostly) which
> >
>
> This only helps if your flows aren't hitting the same rtentry.
> Otherwise you still convoy on the lock for the rtentry itself to > increment and decrement the rtentry's reference count. > > > i was wondering, is there a way (and/or any advantage) to use the > > fastforward code to look up the route for locally sourced packets ? actually, now that i look at the code, both ip_output() and the ip_fastforward code use the same in_rtalloc_ign(...) > > > > If the number of peers is bounded then you can use the flowtable. Max > PPS is much higher bypassing routing lookup. However, it doesn't scale > to arbitrary flow numbers. re. flowtable, could you point me to what i should do instead of calling in_rtalloc_ign() ? cheers luigi From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:06:39 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 467BC106564A; Thu, 19 Apr 2012 21:06:39 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id E14CB8FC16; Thu, 19 Apr 2012 21:06:38 +0000 (UTC) Received: by iahk25 with SMTP id k25so16204273iah.13 for ; Thu, 19 Apr 2012 14:06:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=iN23oN6WTe8edGdt9aVL7d9+1s9ABXoM5R6xLO+wTAk=; b=UmPaIUK1tE4wJt060BmFvoAdkcW79HQ5lGKUh1dd+uRCgxeDPa3S+8q7vUHLlbnHzd Nsjtk7Zd5YY5pH7AN/Dv1XpSITr2yn+Pr8b2rFQTYa/wSDEK0awkK+zQ1eXuCxfK81AU qU0c+Dyjxouc2afFcsQo6ZtmrNe/C8i56F2sy4fBXdM7IJdrS5q023OOBa2toNb5zHfN eEHUPJ8zcPR/yz+1U53UsvKohwmEm99YEECFO0+F1K4BjwjFMbFtfQm/xIvch2jmOsdV V0O86c9UL2vODXJGwpN7jZui9bOgTaULAngtIXU9p9eCeUaN5aqIHXLl5s8UKXT6ud5X w5zg== MIME-Version: 1.0 Received: by 10.50.194.232 with SMTP id hz8mr3912197igc.38.1334869598605; Thu, 19 Apr 2012 14:06:38 -0700 
(PDT)
Sender: kmacybsd@gmail.com
Received: by 10.50.129.39 with HTTP; Thu, 19 Apr 2012 14:06:38 -0700 (PDT)
In-Reply-To: <20120419212224.GA95459@onelab2.iet.unipi.it>
References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <20120419212224.GA95459@onelab2.iet.unipi.it>
Date: Thu, 19 Apr 2012 23:06:38 +0200
X-Google-Sender-Auth: FTB_DkrCoCzDUCOEYnPzbMJVtwE
Message-ID:
From: "K. Macy"
To: Luigi Rizzo
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Cc: Andre Oppermann , current@freebsd.org, net@freebsd.org
Subject: Re: Some performance measurements on the FreeBSD network stack
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Apr 2012 21:06:39 -0000

On Thu, Apr 19, 2012 at 11:22 PM, Luigi Rizzo wrote:
> On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote:
>> >> This is indeed a big problem. I'm working (rough edges remain) on
>> >> changing the routing table locking to an rmlock (read-mostly) which
>> >
>>
>> This only helps if your flows aren't hitting the same rtentry.
>> Otherwise you still convoy on the lock for the rtentry itself to
>> increment and decrement the rtentry's reference count.
>>
>> > i was wondering, is there a way (and/or any advantage) to use the
>> > fastforward code to look up the route for locally sourced packets ?
>
> actually, now that i look at the code, both ip_output() and
> the ip_fastforward code use the same in_rtalloc_ign(...)
>
>> >
>>
>> If the number of peers is bounded then you can use the flowtable. Max
>> PPS is much higher bypassing routing lookup. However, it doesn't scale
>> to arbitrary flow numbers.
>
> re. flowtable, could you point me to what i should do instead of
> calling in_rtalloc_ign() ?

If you build with it in your kernel config and enable the sysctl,
ip_output will automatically use it for TCP and UDP connections. If
you're doing forwarding you'll need to patch the forwarding path.
Fabien Thomas has a patch for that that I just fixed/identified a bug
in for him.

-Kip

-- 
   “The real damage is done by those millions who want to 'get by.'
The ordinary men who just want to be left in peace. Those who don’t
want their little lives disturbed by anything bigger than themselves.
Those with no sides and no causes. Those who won’t take measure of
their own strength, for fear of antagonizing their own weakness. Those
who don’t like to make waves--or enemies.
   Those for whom freedom, honour, truth, and principles are only
literature. Those who live small, love small, die small. It’s the
reductionist approach to life: if you keep it small, you’ll keep it
under control. If you don’t make any noise, the bogeyman won’t find
you.
   But it’s all an illusion, because they die too, those people who
roll up their spirits into tiny little balls so as to be safe.
Safe?! From what? Life is always on the edge of death; narrow streets
lead to the same place as wide avenues, and a little candle burns
itself out just like a flaming torch does.
   I choose my own way to burn.”
   Sophie Scholl

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:11:32 2012
Return-Path:
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5822A1065673 for ; Thu, 19 Apr 2012 21:11:32 +0000 (UTC) (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 71EC78FC1A for ; Thu, 19 Apr 2012 21:11:31 +0000 (UTC)
Received: (qmail 15491 invoked from network); 19 Apr 2012 21:06:44 -0000
Received: from unknown (HELO [62.48.0.94]) ([62.48.0.94]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 19 Apr 2012 21:06:44 -0000
Message-ID: <4F907FB4.3080400@freebsd.org>
Date: Thu, 19 Apr 2012 23:12:20 +0200
From: Andre Oppermann
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: "K. Macy"
References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Luigi Rizzo , current@freebsd.org, net@freebsd.org
Subject: Re: Some performance measurements on the FreeBSD network stack
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Apr 2012 21:11:32 -0000

On 19.04.2012 22:34, K. Macy wrote:
>>> This is indeed a big problem. I'm working (rough edges remain) on
>>> changing the routing table locking to an rmlock (read-mostly) which
>>
>
> This only helps if your flows aren't hitting the same rtentry.
> Otherwise you still convoy on the lock for the rtentry itself to
> increment and decrement the rtentry's reference count.

The rtentry lock isn't obtained anymore. While the rmlock read
lock is held on the rtable the relevant information like ifp and
such is copied out. No later referencing is possible. In the end
any referencing of an rtentry would be forbidden and the rtentry
lock can be removed. The second step can be optional though.

>> i was wondering, is there a way (and/or any advantage) to use the
>> fastforward code to look up the route for locally sourced packets ?
>>
>
> If the number of peers is bounded then you can use the flowtable. Max
> PPS is much higher bypassing routing lookup. However, it doesn't scale
> to arbitrary flow numbers.

In theory an rmlock-only lookup into a default-route only routing
table would be faster than creating a flow table entry for every
destination. It's a matter of churn though. The flowtable isn't
lockless in itself, is it?

-- 
Andre

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:17:42 2012
Return-Path:
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 761A0106567A; Thu, 19 Apr 2012 21:17:42 +0000 (UTC) (envelope-from kmacybsd@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 042098FC0A; Thu, 19 Apr 2012 21:17:41 +0000 (UTC)
Received: by ghrr20 with SMTP id r20so5827735ghr.13 for ; Thu, 19 Apr 2012 14:17:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=WDjnKcnbDW84Qn4y0ukwCVgwaiqViy2MULkLUVZKz20=; b=oTQRdNkxDlPY+/1+ronDP5n1m7rYDR8Ya75r2zRPsOx5PpmyePMtJVsbToGPARET/o jQQhdecLSXZIkGWYXO1dbotZ/MZUobnKnhR1Li+uQkaIeJ1YDmzKdPRpFlGdU4LEdber eMYSgs378lrSKBra0hHP/KJIKw3ER2HueceiCSX2BkTToZBTddSoZsz0UbO1rfricB9d w+JkEhM5YIVBAtnkSVW+bi3V934WMgsI/OMh+WZ+O/pLZG0zkeUWwHhrmCAkyyfuv1+m
EtCP04aTxOr3RuarjdXEBPf/J4JEVo2f6PcM92lDHN4BsFYE+siBAEvjr08XObKenm1R XJKA==
MIME-Version: 1.0
Received: by 10.50.194.232 with SMTP id hz8mr3932371igc.38.1334870261345; Thu, 19 Apr 2012 14:17:41 -0700 (PDT)
Sender: kmacybsd@gmail.com
Received: by 10.50.129.39 with HTTP; Thu, 19 Apr 2012 14:17:41 -0700 (PDT)
In-Reply-To: <4F907FB4.3080400@freebsd.org>
References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F907FB4.3080400@freebsd.org>
Date: Thu, 19 Apr 2012 23:17:41 +0200
X-Google-Sender-Auth: q7xbaDuXNxDADDM4ozkcC7eXqE8
Message-ID:
From: "K. Macy"
To: Andre Oppermann
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Cc: Luigi Rizzo , current@freebsd.org, net@freebsd.org
Subject: Re: Some performance measurements on the FreeBSD network stack
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Apr 2012 21:17:42 -0000

>> This only helps if your flows aren't hitting the same rtentry.
>> Otherwise you still convoy on the lock for the rtentry itself to
>> increment and decrement the rtentry's reference count.
>
>
> The rtentry lock isn't obtained anymore. While the rmlock read
> lock is held on the rtable the relevant information like ifp and
> such is copied out. No later referencing possible. In the end
> any referencing of an rtentry would be forbidden and the rtentry
> lock can be removed. The second step can be optional though.

Can you point me to a tree where you've made these changes?

>>> i was wondering, is there a way (and/or any advantage) to use the
>>> fastforward code to look up the route for locally sourced packets ?
>>>
>>
>> If the number of peers is bounded then you can use the flowtable. Max
>> PPS is much higher bypassing routing lookup.
However, it doesn't scale
>> to arbitrary flow numbers.
>
>
> In theory a rmlock-only lookup into a default-route only routing
> table would be faster than creating a flow table entry for every
> destination. It a matter of churn though. The flowtable isn't
> lockless in itself, is it?

It is. In a steady state where the working set of peers fits in the
table it should be just a simple hash of the IP and then a lookup.

-Kip

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:19:11 2012
Return-Path:
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 555471065672 for ; Thu, 19 Apr 2012 21:19:11 +0000 (UTC) (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id A50028FC1C for ; Thu, 19 Apr 2012 21:19:10 +0000 (UTC)
Received: (qmail 15558 invoked from network); 19 Apr 2012 21:14:23 -0000
Received: from unknown (HELO [62.48.0.94]) ([62.48.0.94]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 19 Apr 2012 21:14:23 -0000
Message-ID: <4F908180.6010408@freebsd.org>
Date: Thu, 19 Apr 2012 23:20:00 +0200
From: Andre Oppermann
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: Luigi Rizzo
References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it>
In-Reply-To: <20120419204622.GA94904@onelab2.iet.unipi.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: current@freebsd.org, net@freebsd.org
Subject: Re: Some performance measurements on the FreeBSD network stack
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Apr 2012 21:19:11 -0000

On 19.04.2012 22:46, Luigi Rizzo wrote:
> On Thu, Apr 19, 2012 at 10:05:37PM +0200, Andre Oppermann wrote:
>> On 19.04.2012 15:30, Luigi Rizzo wrote:
>>> I have been running some performance tests on UDP sockets,
>>> using the netsend program in tools/tools/netrate/netsend
>>> and instrumenting the source code and the kernel do return in
>>> various
points of the path. Here are some results which
>>> I hope you find interesting.
>>
>> Jumping over very interesting analysis...
>>
>>> - the next expensive operation, consuming another 100ns,
>>> is the mbuf allocation in m_uiotombuf(). Nevertheless, the allocator
>>> seems to scale decently at least with 4 cores. The copyin() is
>>> relatively inexpensive (not reported in the data below, but
>>> disabling it saves only 15-20ns for a short packet).
>>>
>>> I have not followed the details, but the allocator calls the zone
>>> allocator and there is at least one critical_enter()/critical_exit()
>>> pair, and the highly modular architecture invokes long chains of
>>> indirect function calls both on allocation and release.
>>>
>>> It might make sense to keep a small pool of mbufs attached to the
>>> socket buffer instead of going to the zone allocator.
>>> Or defer the actual encapsulation to the
>>> (*so->so_proto->pr_usrreqs->pru_send)() which is called inline, anyways.
>>
>> The UMA mbuf allocator is certainly not perfect but rather good.
>> It has a per-CPU cache of mbuf's that are very fast to allocate
>> from. Once it has used them it needs to refill from the global
>> pool which may happen from time to time and show up in the averages.
>
> indeed i was pleased to see no difference between 1 and 4 threads.
> This also suggests that the global pool is accessed very seldom,
> and for short times, otherwise you'd see the effect with 4 threads.

Robert did the per-CPU mbuf allocator pools a few years ago.
Excellent engineering.

> What might be moderately expensive are the critical_enter()/critical_exit()
> calls around individual allocations.

Can't get away from those as a thread must not migrate away
when manipulating the per-CPU mbuf pool.

> The allocation happens while the code has already an exclusive
> lock on so->snd_buf so a pool of fresh buffers could be attached
> there.

Ah, there it is not necessary to hold the snd_buf lock while
doing the allocate+copyin. With soreceive_stream() (which is
experimental and not enabled by default) I did just that for the
receive path. It's quite a significant gain there. IMHO it is better
to resolve the locking order than to juggle yet another mbuf sink.

> But the other consideration is that one could defer the mbuf allocation
> to a later time when the packet is actually built (or anyways
> right before the thread returns).
> What i envision (and this would fit nicely with netmap) is the following:
> - have a (possibly readonly) template for the headers (MAC+IP+UDP)
> attached to the socket, built on demand, and cached and managed
> with similar invalidation rules as used by fastforward;

That would require cross-pointering the rtentry and whatnot again.
We want to get away from that to untangle the (locking) mess that
eventually results from it.

> - possibly extend the pru_send interface so one can pass down the uio
> instead of the mbuf;
> - make an opportunistic buffer allocation in some place downstream,
> where the code already has an x-lock on some resource (could be
> the snd_buf, the interface, ...) so the allocation comes for free.

ETOOCOMPLEXOVERTIME.

>>> - another big bottleneck is the route lookup in ip_output()
>>> (between entries 51 and 56). Not only it eats another
>>> 100ns+ on an empty routing table, but it also
>>> causes huge contentions when multiple cores
>>> are involved.
>>
>> This is indeed a big problem. I'm working (rough edges remain) on
>> changing the routing table locking to an rmlock (read-mostly) which
>
> i was wondering, is there a way (and/or any advantage) to use the
> fastforward code to look up the route for locally sourced packets ?

No. The main advantage/difference of fastforward is the short code
path and processing to completion.
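
Coming back to the per-CPU mbuf cache discussed above, the following is a rough userland sketch of why the global pool is rarely touched on the fast path. It is not UMA code: the sizes, the names (buf_alloc, buf_free, cache_refill, global_refills) and the explicit cpu argument are all assumptions for illustration, and the comments only mark where the critical section and the global lock would sit in a real allocator.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU		4	/* hypothetical number of CPUs */
#define CACHE_SIZE	8	/* buffers moved per refill (made up) */
#define POOL_SIZE	64	/* total buffers in the global pool (made up) */

struct buf {
	struct buf *next;
	char	    data[64];
};

struct cpu_cache {
	struct buf *free;	/* per-CPU free list, no lock needed */
	int	    count;
};

struct buf	 pool_storage[POOL_SIZE];
struct buf	*global_free;		/* shared free list */
unsigned long	 global_refills;	/* how often the shared pool was hit */
struct cpu_cache cpu_cache[NCPU];

void
pool_init(void)
{
	global_free = NULL;
	global_refills = 0;
	for (int i = 0; i < POOL_SIZE; i++) {
		pool_storage[i].next = global_free;
		global_free = &pool_storage[i];
	}
	for (int c = 0; c < NCPU; c++) {
		cpu_cache[c].free = NULL;
		cpu_cache[c].count = 0;
	}
}

/* Slow path: a real allocator would take the global pool lock here. */
void
cache_refill(struct cpu_cache *cc)
{
	global_refills++;
	while (cc->count < CACHE_SIZE && global_free != NULL) {
		struct buf *b = global_free;

		global_free = b->next;
		b->next = cc->free;
		cc->free = b;
		cc->count++;
	}
}

struct buf *
buf_alloc(int cpu)
{
	struct cpu_cache *cc;
	struct buf *b;

	assert(cpu >= 0 && cpu < NCPU);
	cc = &cpu_cache[cpu];
	/* The kernel would enter a critical section here so the thread
	 * cannot migrate while it manipulates this CPU's list. */
	if (cc->free == NULL)
		cache_refill(cc);
	b = cc->free;
	if (b != NULL) {
		cc->free = b->next;
		cc->count--;
	}
	/* ... and leave the critical section here. */
	return b;
}

/* Frees go back to the local cache; draining to the pool is omitted. */
void
buf_free(int cpu, struct buf *b)
{
	struct cpu_cache *cc;

	assert(cpu >= 0 && cpu < NCPU);
	cc = &cpu_cache[cpu];
	b->next = cc->free;
	cc->free = b;
	cc->count++;
}
```

In a steady allocate/free pattern on one CPU the refill counter stays flat, which matches the observation that 1 and 4 threads perform the same.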
-- Andre From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:26:14 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E7EF1065744 for ; Thu, 19 Apr 2012 21:26:14 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id A1CA88FC15 for ; Thu, 19 Apr 2012 21:26:13 +0000 (UTC) Received: (qmail 15631 invoked from network); 19 Apr 2012 21:21:26 -0000 Received: from unknown (HELO [62.48.0.94]) ([62.48.0.94]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 19 Apr 2012 21:21:26 -0000 Message-ID: <4F908327.5070905@freebsd.org> Date: Thu, 19 Apr 2012 23:27:03 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1 MIME-Version: 1.0 To: "K. Macy" References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F907FB4.3080400@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: Luigi Rizzo , current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 21:26:14 -0000 On 19.04.2012 23:17, K. Macy wrote: >>> This only helps if your flows aren't hitting the same rtentry. >>> Otherwise you still convoy on the lock for the rtentry itself to >>> increment and decrement the rtentry's reference count. >> >> >> The rtentry lock isn't obtained anymore. While the rmlock read >> lock is held on the rtable the relevant information like ifp and >> such is copied out. 
No later referencing possible. In the end
>> any referencing of an rtentry would be forbidden and the rtentry
>> lock can be removed. The second step can be optional though.
>
> Can you point me to a tree where you've made these changes?

It's not in a public tree. I just did a 'svn up' and the recent
pf and rtsocket changes created some conflicts. Have to solve
them before posting. Timeframe (early) next week.

>>>> i was wondering, is there a way (and/or any advantage) to use the
>>>> fastforward code to look up the route for locally sourced packets ?
>>>>
>>>
>>> If the number of peers is bounded then you can use the flowtable. Max
>>> PPS is much higher bypassing routing lookup. However, it doesn't scale
>>> to arbitrary flow numbers.
>>
>>
>> In theory a rmlock-only lookup into a default-route only routing
>> table would be faster than creating a flow table entry for every
>> destination. It a matter of churn though. The flowtable isn't
>> lockless in itself, is it?
>
> It is. In a steady state where the working set of peers fits in the
> table it should be just a simple hash of the ip and then a lookup.

Yes, but the lookup requires a lock? Or is every entry replicated
to every CPU? So a number of concurrent CPUs sending to the same
UDP destination would contend on that lock?
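
For comparison, the copy-out scheme quoted above could be sketched in userland roughly as follows. This is only an illustration of the pattern (look up under a read lock, copy the needed fields, keep no pointer or reference to the entry), not the patch itself. The table contents, struct layouts and function names are made up, and the lock is a single-threaded placeholder counter standing in for what would presumably be rm_rlock()/rm_runlock() from rmlock(9) in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry as stored in the (toy) routing table. */
struct route_entry {
	uint32_t dst;		/* network address, host byte order */
	uint32_t mask;		/* netmask */
	uint32_t gateway;	/* 0 means directly connected */
	int	 ifindex;
};

/* What the caller gets: a private copy, no pointer into the table. */
struct route_info {
	uint32_t gateway;
	int	 ifindex;
};

/* Hypothetical table, most specific prefix first. */
struct route_entry rtable[] = {
	{ 0x0a000000, 0xff000000, 0x00000000, 1 },	/* 10.0.0.0/8 on if 1 */
	{ 0x00000000, 0x00000000, 0xc0a80101, 2 },	/* default via 192.168.1.1 */
};

int table_readers;	/* placeholder for a real read-mostly lock */

void
table_rlock(void)
{
	table_readers++;
}

void
table_runlock(void)
{
	assert(table_readers > 0);
	table_readers--;
}

/*
 * Look up dst and copy the result out while the read lock is held.
 * Returns 1 on a match, 0 otherwise. Because only a copy leaves this
 * function, no per-entry lock or reference count is needed.
 */
int
route_lookup_copy(uint32_t dst, struct route_info *out)
{
	int found = 0;

	table_rlock();
	for (size_t i = 0; i < sizeof(rtable) / sizeof(rtable[0]); i++) {
		if ((dst & rtable[i].mask) == rtable[i].dst) {
			out->gateway = rtable[i].gateway;
			out->ifindex = rtable[i].ifindex;
			found = 1;
			break;
		}
	}
	table_runlock();
	return found;
}
```

The cost of this shape is that the copy can go stale right after the lock is dropped, which is the trade-off against holding a reference on the entry.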
-- Andre From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:35:39 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 611861065670; Thu, 19 Apr 2012 21:35:39 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 090468FC0C; Thu, 19 Apr 2012 21:35:38 +0000 (UTC) Received: by iahk25 with SMTP id k25so16242486iah.13 for ; Thu, 19 Apr 2012 14:35:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=Asi4YOMxu43RtDGRFcdZPT1OGPXziO4+WjVuKC/gOS4=; b=VUp9UtqKmuhuXwDrp2GGnMBE6XyM7ascY57VExwP5aZskYAAhAA+4HgA+3tN4LeRsd vI9W1KTFAeLK6v7zu3c3+0a559p4VyxXf/1bIyMAYgOG57BbpeNOfTYsyoW10DvCXGl/ xttI2hSemQuHalecjvM7GZHcRKRu3Qgy92pLVfRXQ+9q60fcuZ9u+5RRn3mxe6kjKTne IcMqAC2Naux3FTQQyPJPZY1DpEcr19fbm4Czp8uNjg9wlOgUHRlsIgTwKc5EhBRY4Toc EyfF8S0TW54xO6JoUArE2DQV/8OkwbmbXXroOXlCIzmEknam7HV0WxWvainnHl6f+9rx ojSg== MIME-Version: 1.0 Received: by 10.42.215.68 with SMTP id hd4mr3133446icb.30.1334871337762; Thu, 19 Apr 2012 14:35:37 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.50.129.39 with HTTP; Thu, 19 Apr 2012 14:35:37 -0700 (PDT) In-Reply-To: <4F908327.5070905@freebsd.org> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F907FB4.3080400@freebsd.org> <4F908327.5070905@freebsd.org> Date: Thu, 19 Apr 2012 23:35:37 +0200 X-Google-Sender-Auth: 1nWblGoeSwC2uIEkBEfE9_NHhBQ Message-ID: From: "K. 
Macy"
To: Andre Oppermann
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Cc: Luigi Rizzo , current@freebsd.org, net@freebsd.org
Subject: Re: Some performance measurements on the FreeBSD network stack
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Apr 2012 21:35:39 -0000

>
> Yes, but the lookup requires a lock? Or is every entry replicated
> to every CPU? So a number of concurrent CPU's sending to the same
> UDP destination would content on that lock?

No. In the default case it's per CPU, thus no serialization is
required. But yes, if your transmitting thread manages to bounce to
every core during send within the flow expiration window you'll have
an extra 12 or however many bytes per peer times the number of cores.
There is usually a fair amount of CPU affinity over a given unit time.

From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:37:20 2012
Return-Path:
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 43031106564A; Thu, 19 Apr 2012 21:37:20 +0000 (UTC) (envelope-from kmacybsd@gmail.com)
Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id DA6F48FC20; Thu, 19 Apr 2012 21:37:19 +0000 (UTC)
Received: by iahk25 with SMTP id k25so16244808iah.13 for ; Thu, 19 Apr 2012 14:37:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=VdqKexkI9NTQNRAqwjznatyBt1CkDTEafj5KZbqQggc=; b=uwHtiBV8WS0aMae4jbpF9mBLAAUwdNkpmkOJWjlvR5lV2kyf3uuj7UJUJ8I9ltH7Dq A8qt1/wcrh/h1Fk2ZG9RMXnUFqgJUapFIeeLwKY01uxcBaCJ5Vw5f0k3YBkptajEBbpx OpHeCisjpkbR2K8XEOXTt0VhmHSxMKDzNCIUYFGfCKe+PHxHzbC3DJMNooOmHgWWkMH2 cRTqTt6Uu2JQS8hY2/PftDpIk3JPfwgrCJpMzYEIM1RYBEPTZJmdTQv+Z740X5KjrMPa 3B2u57SSKQOL2mpjwj96Fi27HT+IT87gMzhjC93z9BnIYJaOGwWu5xh02eQIUZ9reMch Q52A==
MIME-Version: 1.0
Received: by 10.50.194.232 with SMTP id hz8mr3966958igc.38.1334871439533; Thu, 19 Apr 2012 14:37:19 -0700 (PDT)
Sender: kmacybsd@gmail.com
Received: by 10.50.129.39 with HTTP; Thu, 19 Apr 2012 14:37:19 -0700 (PDT)
In-Reply-To: <4F908327.5070905@freebsd.org>
References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F907FB4.3080400@freebsd.org> <4F908327.5070905@freebsd.org>
Date: Thu, 19 Apr 2012 23:37:19 +0200
X-Google-Sender-Auth: SYeilf-tPrYXA-B61Uy17lxh7Eg
Message-ID:
From: "K.
Macy" To: Andre Oppermann Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Cc: Luigi Rizzo , current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 21:37:20 -0000 On Thu, Apr 19, 2012 at 11:27 PM, Andre Oppermann wrote= : > On 19.04.2012 23:17, K. Macy wrote: >>>> >>>> This only helps if your flows aren't hitting the same rtentry. >>>> Otherwise you still convoy on the lock for the rtentry itself to >>>> increment and decrement the rtentry's reference count. >>> >>> >>> >>> The rtentry lock isn't obtained anymore. =A0While the rmlock read >>> lock is held on the rtable the relevant information like ifp and >>> such is copied out. =A0No later referencing possible. =A0In the end >>> any referencing of an rtentry would be forbidden and the rtentry >>> lock can be removed. =A0The second step can be optional though. >> >> >> Can you point me to a tree where you've made these changes? > > > It's not in a public tree. =A0I just did a 'svn up' and the recent > pf and rtsocket changes created some conflicts. =A0Have to solve > them before posting. =A0Timeframe (early) next week. > > Ok. Keep us posted. Thanks, Kip --=20 =A0 =A0=93The real damage is done by those millions who want to 'get by.' The ordinary men who just want to be left in peace. Those who don=92t want their little lives disturbed by anything bigger than themselves. Those with no sides and no causes. Those who won=92t take measure of their own strength, for fear of antagonizing their own weakness. Those who don=92t like to make waves=97or enemies. =A0 =A0Those for whom freedom, honour, truth, and principles are only literature. Those who live small, love small, die small. 
It’s the reductionist approach to life: if you keep it small, you’ll keep it under control. If you don’t make any noise, the bogeyman won’t find you. But it’s all an illusion, because they die too, those people who roll up their spirits into tiny little balls so as to be safe. Safe?! From what? Life is always on the edge of death; narrow streets lead to the same place as wide avenues, and a little candle burns itself out just like a flaming torch does. I choose my own way to burn.” Sophie Scholl From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 21:43:37 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 039961065679; Thu, 19 Apr 2012 21:43:37 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id B64F68FC12; Thu, 19 Apr 2012 21:43:36 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 0F42873027; Fri, 20 Apr 2012 00:03:08 +0200 (CEST) Date: Fri, 20 Apr 2012 00:03:08 +0200 From: Luigi Rizzo To: Andre Oppermann Message-ID: <20120419220308.GB95692@onelab2.iet.unipi.it> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F908180.6010408@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F908180.6010408@freebsd.org> User-Agent: Mutt/1.4.2.3i Cc: current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 21:43:37 -0000 On Thu, Apr 19, 2012 at 11:20:00PM +0200, 
Andre Oppermann wrote: > On 19.04.2012 22:46, Luigi Rizzo wrote: ... > >What might be moderately expensive are the critical_enter()/critical_exit() > >calls around individual allocations. > > Can't get away from those as a thread must not migrate away > when manipulating the per-CPU mbuf pool. i understand. > >The allocation happens while the code has already an exclusive > >lock on so->snd_buf so a pool of fresh buffers could be attached > >there. > > Ah, there it is not necessary to hold the snd_buf lock while > doing the allocate+copyin. With soreceive_stream() (which is it is not held in the tx path either -- but there is a short section before m_uiotombuf() which does ... SOCKBUF_LOCK(&so->so_snd); // check for pending errors, sbspace, so_state SOCKBUF_UNLOCK(&so->so_snd); ... (some of this is slightly dubious, but that's another story) > >But the other consideration is that one could defer the mbuf allocation > >to a later time when the packet is actually built (or anyways > >right before the thread returns). > >What i envision (and this would fit nicely with netmap) is the following: > >- have a (possibly readonly) template for the headers (MAC+IP+UDP) > > attached to the socket, built on demand, and cached and managed > > with similar invalidation rules as used by fastforward; > > That would require to cross-pointer the rtentry and whatnot again. i was planning to keep a copy, not a reference. If the copy becomes temporarily stale, no big deal, as long as you can detect it reasonably quickly -- routes are not guaranteed to be correct, anyways. > >- possibly extend the pru_send interface so one can pass down the uio > > instead of the mbuf; > >- make an opportunistic buffer allocation in some place downstream, > > where the code already has an x-lock on some resource (could be > > the snd_buf, the interface, ...) so the allocation comes for free. > > ETOOCOMPLEXOVERTIME. maybe. But i want to investigate this. 
cheers luigi From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 22:37:29 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 54ABA106566C for ; Thu, 19 Apr 2012 22:37:29 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 9D7478FC19 for ; Thu, 19 Apr 2012 22:37:28 +0000 (UTC) Received: (qmail 16058 invoked from network); 19 Apr 2012 22:32:39 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 19 Apr 2012 22:32:39 -0000 Message-ID: <4F9093A1.3080305@freebsd.org> Date: Fri, 20 Apr 2012 00:37:21 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1 MIME-Version: 1.0 To: Luigi Rizzo References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F908180.6010408@freebsd.org> <20120419220308.GB95692@onelab2.iet.unipi.it> In-Reply-To: <20120419220308.GB95692@onelab2.iet.unipi.it> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Apr 2012 22:37:29 -0000 On 20.04.2012 00:03, Luigi Rizzo wrote: > On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote: >> On 19.04.2012 22:46, Luigi Rizzo wrote: >>> The allocation happens while the code has already an exclusive >>> lock on so->snd_buf so a pool of fresh buffers could be attached >>> there. 
>> >> Ah, there it is not necessary to hold the snd_buf lock while >> doing the allocate+copyin. With soreceive_stream() (which is > > it is not held in the tx path either -- but there is a short section > before m_uiotombuf() which does > > ... > SOCKBUF_LOCK(&so->so_snd); > // check for pending errors, sbspace, so_state > SOCKBUF_UNLOCK(&so->so_snd); > ... > > (some of this is slightly dubious, but that's another story) Indeed the lock isn't held across the m_uiotombuf(). You're talking about filling a sockbuf mbuf cache while holding the lock? >>> But the other consideration is that one could defer the mbuf allocation >>> to a later time when the packet is actually built (or anyways >>> right before the thread returns). >>> What i envision (and this would fit nicely with netmap) is the following: >>> - have a (possibly readonly) template for the headers (MAC+IP+UDP) >>> attached to the socket, built on demand, and cached and managed >>> with similar invalidation rules as used by fastforward; >> >> That would require to cross-pointer the rtentry and whatnot again. > > i was planning to keep a copy, not a reference. If the copy becomes > temporarily stale, no big deal, as long as you can detect it reasonably > quickly -- routes are not guaranteed to be correct, anyways. Be wary of disappearing interface pointers... >>> - possibly extend the pru_send interface so one can pass down the uio >>> instead of the mbuf; >>> - make an opportunistic buffer allocation in some place downstream, >>> where the code already has an x-lock on some resource (could be >>> the snd_buf, the interface, ...) so the allocation comes for free. >> >> ETOOCOMPLEXOVERTIME. > > maybe. But i want to investigate this. I fail to see what passing down the uio would gain you. The snd_buf lock isn't obtained again after the copyin. Not that I want to prevent you from investigating other ways. 
;) -- Andre From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 06:15:59 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 64AC61065675; Fri, 20 Apr 2012 06:15:59 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id F185A8FC1A; Fri, 20 Apr 2012 06:15:58 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id BC1CE73027; Fri, 20 Apr 2012 08:35:30 +0200 (CEST) Date: Fri, 20 Apr 2012 08:35:30 +0200 From: Luigi Rizzo To: Andre Oppermann Message-ID: <20120420063530.GB233@onelab2.iet.unipi.it> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F908180.6010408@freebsd.org> <20120419220308.GB95692@onelab2.iet.unipi.it> <4F9093A1.3080305@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F9093A1.3080305@freebsd.org> User-Agent: Mutt/1.4.2.3i Cc: current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 06:15:59 -0000 On Fri, Apr 20, 2012 at 12:37:21AM +0200, Andre Oppermann wrote: > On 20.04.2012 00:03, Luigi Rizzo wrote: > >On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote: > >>On 19.04.2012 22:46, Luigi Rizzo wrote: > >>>The allocation happens while the code has already an exclusive > >>>lock on so->snd_buf so a pool of fresh buffers could be attached > >>>there. > >> > >>Ah, there it is not necessary to hold the snd_buf lock while > >>doing the allocate+copyin. 
With soreceive_stream() (which is > > > >it is not held in the tx path either -- but there is a short section > >before m_uiotombuf() which does > > > > ... > > SOCKBUF_LOCK(&so->so_snd); > > // check for pending errors, sbspace, so_state > > SOCKBUF_UNLOCK(&so->so_snd); > > ... > > > >(some of this is slightly dubious, but that's another story) > > Indeed the lock isn't held across the m_uiotombuf(). You're talking > about filling an sockbuf mbuf cache while holding the lock? all i am thinking is that when we have a serialization point we could use it for multiple related purposes. In this case yes we could keep a small mbuf cache attached to so_snd. When the cache is empty either get a new batch (say 10-20 bufs) from the zone allocator, possibly dropping and regaining the lock if the so_snd must be a leaf. Besides for protocols like TCP (does it use the same path ?) the mbufs are already there (released by incoming acks) in the steady state, so it is not even necessary to refill the cache. This said, i am not 100% sure that the 100ns I am seeing are all spent in the zone allocator. As i said the chain of indirect calls and other ops is rather long on both acquire and release. > >>>But the other consideration is that one could defer the mbuf allocation > >>>to a later time when the packet is actually built (or anyways > >>>right before the thread returns). > >>>What i envision (and this would fit nicely with netmap) is the following: > >>>- have a (possibly readonly) template for the headers (MAC+IP+UDP) > >>> attached to the socket, built on demand, and cached and managed > >>> with similar invalidation rules as used by fastforward; > >> > >>That would require to cross-pointer the rtentry and whatnot again. > > > >i was planning to keep a copy, not a reference. If the copy becomes > >temporarily stale, no big deal, as long as you can detect it reasonably > >quickly -- routes are not guaranteed to be correct, anyways. 
> > Be wary of disappearing interface pointers... (this reminds me, what prevents a route grabbed from the flowtable from disappearing and releasing the ifp reference ?) In any case, it seems better to keep a more persistent ifp reference in the socket rather than grab and release one on every single packet transmission. > >>>- possibly extend the pru_send interface so one can pass down the uio > >>> instead of the mbuf; > >>>- make an opportunistic buffer allocation in some place downstream, > >>> where the code already has an x-lock on some resource (could be > >>> the snd_buf, the interface, ...) so the allocation comes for free. > >> > >>ETOOCOMPLEXOVERTIME. > > > >maybe. But i want to investigate this. > > I fail to see what passing down the uio would gain you. The snd_buf lock > isn't obtained again after the copyin. Not that I want to prevent you > from investigating other ways. ;) maybe it can open the way to other optimizations, such as reducing the number of places where you need to lock, or save some data copies, or reduce fragmentation, etc. 
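A rough userspace sketch of the small per-socket buffer cache floated in this message, for illustration only. Everything here is an assumption: `struct buf` stands in for an mbuf, `malloc()` stands in for the zone allocator, the names `buf_cache`, `cache_get`, `cache_put` and `cache_drain` are hypothetical, and the batch size of 16 is simply picked from the "say 10-20 bufs" range. The caller is assumed to already hold whatever lock serializes access to the cache, which is the whole point of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_BATCH 16              /* assumption: within "10-20 bufs" */

struct buf {                        /* stand-in for an mbuf */
	struct buf *next;
	char data[256];
};

struct buf_cache {                  /* would hang off the send buffer */
	struct buf *head;           /* singly linked free list */
	unsigned count;             /* buffers currently cached */
	unsigned refills;           /* trips to the backing allocator */
};

/* Refill the cache with one batch; returns how many buffers were added. */
static unsigned
cache_refill(struct buf_cache *c)
{
	unsigned added = 0;

	for (unsigned i = 0; i < CACHE_BATCH; i++) {
		struct buf *b = malloc(sizeof(*b));
		if (b == NULL)
			break;
		b->next = c->head;
		c->head = b;
		c->count++;
		added++;
	}
	if (added > 0)
		c->refills++;
	return (added);
}

/* Take one buffer; only goes to the allocator when the cache is empty. */
static struct buf *
cache_get(struct buf_cache *c)
{
	struct buf *b;

	if (c->head == NULL && cache_refill(c) == 0)
		return (NULL);
	b = c->head;
	c->head = b->next;
	b->next = NULL;
	c->count--;
	return (b);
}

/* Give a buffer back, e.g. one released by an incoming ack. */
static void
cache_put(struct buf_cache *c, struct buf *b)
{
	b->next = c->head;
	c->head = b;
	c->count++;
}

/* Free everything still cached, e.g. when the socket goes away. */
static void
cache_drain(struct buf_cache *c)
{
	while (c->head != NULL) {
		struct buf *b = c->head;
		c->head = b->next;
		free(b);
		c->count--;
	}
}
```

With this shape, 16 consecutive `cache_get()` calls cost one trip to the backing allocator instead of 16. Whether that actually recovers the ~100ns mentioned above is exactly the open question in the thread.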
cheers luigi From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 06:16:36 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 36E6E106567B; Fri, 20 Apr 2012 06:16:36 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 0D1C08FC18; Fri, 20 Apr 2012 06:16:36 +0000 (UTC) Received: from [38.110.160.135] (port=63513 helo=[10.0.208.182]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.77) (envelope-from ) id 1SL791-0006I9-CG; Fri, 20 Apr 2012 02:16:35 -0400 From: George Neville-Neil Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Thu, 19 Apr 2012 21:44:33 -0400 Message-Id: <280454D9-E233-4010-8810-588296FA19D1@neville-neil.com> To: net@freebsd.org Mime-Version: 1.0 (Apple Message framework v1257) X-Mailer: Apple Mail (2.1257) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com Cc: "Bjoern A. Zeeb" Subject: Question about fixing udp6_input... X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 06:16:36 -0000 Howdy, At the moment the prototype for udp6_input() is the following: int udp6_input(struct mbuf **mp, int *offp, int proto) and udp_input() looks like this: void udp_input(struct mbuf *m, int off) As far as I can tell we immediately change **mp to *m and *offp to off in udp6_input() and we also never use proto in the rest of the function. 
Is there any reason to not make udp6_input() look exactly like udp_input() ? Best, George From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 06:29:58 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EA57C1065670 for ; Fri, 20 Apr 2012 06:29:58 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from mx1.sbone.de (mx1.sbone.de [IPv6:2a01:4f8:130:3ffc::401:25]) by mx1.freebsd.org (Postfix) with ESMTP id 757C48FC15 for ; Fri, 20 Apr 2012 06:29:58 +0000 (UTC) Received: from mail.sbone.de (mail.sbone.de [IPv6:fde9:577b:c1a9:31::2013:587]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.sbone.de (Postfix) with ESMTPS id 7625A25D39FD; Fri, 20 Apr 2012 06:29:57 +0000 (UTC) Received: from content-filter.sbone.de (content-filter.sbone.de [IPv6:fde9:577b:c1a9:31::2013:2742]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPS id B11D9BE5399; Fri, 20 Apr 2012 06:29:56 +0000 (UTC) X-Virus-Scanned: amavisd-new at sbone.de Received: from mail.sbone.de ([IPv6:fde9:577b:c1a9:31::2013:587]) by content-filter.sbone.de (content-filter.sbone.de [fde9:577b:c1a9:31::2013:2742]) (amavisd-new, port 10024) with ESMTP id j07Fu4GtNDeG; Fri, 20 Apr 2012 06:29:55 +0000 (UTC) Received: from orange-en1.sbone.de (orange-en1.sbone.de [IPv6:fde9:577b:c1a9:31:cabc:c8ff:fecf:e8e3]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPSA id A8D70BE539B; Fri, 20 Apr 2012 06:29:55 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: "Bjoern A. 
Zeeb" In-Reply-To: <280454D9-E233-4010-8810-588296FA19D1@neville-neil.com> Date: Fri, 20 Apr 2012 06:29:54 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: <8C579D33-72F0-4E82-AD93-CA7BBA3531A8@FreeBSD.org> References: <280454D9-E233-4010-8810-588296FA19D1@neville-neil.com> To: George Neville-Neil X-Mailer: Apple Mail (2.1084) Cc: net@freebsd.org Subject: Re: Question about fixing udp6_input... X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 06:29:59 -0000 On 20. Apr 2012, at 01:44 , George Neville-Neil wrote: > Howdy, > > At the moment the prototype for udp6_input() is the following: > > int > udp6_input(struct mbuf **mp, int *offp, int proto) > > and udp_input() looks like this: > > void > udp_input(struct mbuf *m, int off) > > As far as I can tell we immediately change **mp to *m and *offp to off > in udp6_input() and we also never use proto in the rest of the function. > > Is there any reason to not make udp6_input() look exactly like udp_input() ? I think the answer to this is here: http://wiki.freebsd.org/IPv6TODO#Remove_ip6protosw -- Bjoern A. Zeeb You have to have visions! It does not matter how good you are. It matters what good you do! 
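For readers following the udp6_input() question above, here is a minimal standalone sketch of the two calling conventions being compared. It is not the kernel code: `struct mbuf` is a stub, and `udp_input_core`, `udp6_input_shim` and `PROTO_DONE` are hypothetical names made up for the illustration. It only shows the pattern George describes, where the IPv6-style entry point dereferences `mp` and `offp` once, ignores `proto`, and then works exactly like the IPv4-style one.

```c
#include <assert.h>
#include <stddef.h>

struct mbuf {                   /* stub, not the kernel structure */
	int m_len;
};

#define PROTO_DONE 257          /* hypothetical "packet consumed" value */

static int last_off = -1;       /* records the offset the core saw */

/* IPv4-style shape: plain mbuf pointer and offset, nothing returned. */
static void
udp_input_core(struct mbuf *m, int off)
{
	(void)m;
	last_off = off;
}

/*
 * IPv6-style shape: pointer to mbuf pointer, pointer to offset, and a
 * protocol number that is never looked at.
 */
static int
udp6_input_shim(struct mbuf **mp, int *offp, int proto)
{
	struct mbuf *m = *mp;   /* immediately reduced to the simple form */
	int off = *offp;

	(void)proto;
	udp_input_core(m, off);
	return (PROTO_DONE);
}
```

The extra indirection only matters to the caller that dispatches through the IPv6 protocol switch, which is presumably why the answer points at the ip6protosw item on the wiki rather than at udp6_input() itself.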
From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 07:20:13 2012 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B38BB1065679 for ; Fri, 20 Apr 2012 07:20:13 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9FD038FC19 for ; Fri, 20 Apr 2012 07:20:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q3K7KDum020581 for ; Fri, 20 Apr 2012 07:20:13 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q3K7KDu9020579; Fri, 20 Apr 2012 07:20:13 GMT (envelope-from gnats) Date: Fri, 20 Apr 2012 07:20:13 GMT Message-Id: <201204200720.q3K7KDu9020579@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: Martin Matuska Cc: Subject: Re: kern/155030: [igb] igb(4) DEVICE_POLLING does not work with carp(4) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Martin Matuska List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 07:20:13 -0000 The following reply was made to PR kern/155030; it has been noted by GNATS. From: Martin Matuska To: bug-followup@FreeBSD.org, mm@FreeBSD.org Cc: Subject: Re: kern/155030: [igb] igb(4) DEVICE_POLLING does not work with carp(4) Date: Fri, 20 Apr 2012 09:18:50 +0200 The problem was actually in the configuration of the igb driver. Polling works only with hw.igb.num_queues=1 - and this is also described in code comments of if_igb.c: * Legacy polling routine : if using this code you MUST be sure that * multiqueue is not defined, ie, set igb_num_queues to 1. 
This should be: a) added to the manpage b) the driver should not attempt polling if hw.igb.num_queues > 1 -- Martin Matuska FreeBSD committer http://blog.vx.sk From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 08:31:08 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 240BF106564A; Fri, 20 Apr 2012 08:31:08 +0000 (UTC) (envelope-from melifaro@FreeBSD.org) Received: from dhcp170-36-red.yandex.net (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx2.freebsd.org (Postfix) with ESMTP id 43E1B14DEE2; Fri, 20 Apr 2012 08:31:06 +0000 (UTC) Message-ID: <4F911DCD.30001@FreeBSD.org> Date: Fri, 20 Apr 2012 12:26:53 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:8.0) Gecko/20111117 Thunderbird/8.0 MIME-Version: 1.0 To: Andre Oppermann References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F907FB4.3080400@freebsd.org> In-Reply-To: <4F907FB4.3080400@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: net@freebsd.org, Luigi Rizzo , "K. Macy" , current@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 08:31:08 -0000 On 20.04.2012 01:12, Andre Oppermann wrote: > On 19.04.2012 22:34, K. Macy wrote: >>>> This is indeed a big problem. I'm working (rough edges remain) on >>>> changing the routing table locking to an rmlock (read-mostly) which >>> >> >> This only helps if your flows aren't hitting the same rtentry. 
>> Otherwise you still convoy on the lock for the rtentry itself to >> increment and decrement the rtentry's reference count. > > The rtentry lock isn't obtained anymore. While the rmlock read > lock is held on the rtable the relevant information like ifp and > such is copied out. No later referencing possible. In the end > any referencing of an rtentry would be forbidden and the rtentry > lock can be removed. The second step can be optional though. > >>> i was wondering, is there a way (and/or any advantage) to use the >>> fastforward code to look up the route for locally sourced packets ? >>> >> >> If the number of peers is bounded then you can use the flowtable. Max >> PPS is much higher bypassing routing lookup. However, it doesn't scale From my experience, turning fastfwd on gives ~20-30% performance increase (10G forwarding with firewalling, 1.4MPPS). ip_forward() uses 2 lookups (ip_rtaddr + ip_output) vs 1 ip_fastfwd(). The worst current problem IMHO is the number of locks packets have to traverse, not the number of lookups. >> to arbitrary flow numbers. > > In theory a rmlock-only lookup into a default-route only routing > table would be faster than creating a flow table entry for every > destination. It's a matter of churn though. The flowtable isn't > lockless in itself, is it? 
> -- WBR, Alexander From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 08:57:47 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E18C9106566B; Fri, 20 Apr 2012 08:57:47 +0000 (UTC) (envelope-from melifaro@ipfw.ru) Received: from mail.ipfw.ru (unknown [IPv6:2a01:4f8:120:6141::2]) by mx1.freebsd.org (Postfix) with ESMTP id 7481A8FC08; Fri, 20 Apr 2012 08:57:47 +0000 (UTC) Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=dhcp170-36-red.yandex.net) by mail.ipfw.ru with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1SL9f7-0004Sc-OK; Fri, 20 Apr 2012 12:57:53 +0400 Message-ID: <4F91240C.3050703@ipfw.ru> Date: Fri, 20 Apr 2012 12:53:32 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:8.0) Gecko/20111117 Thunderbird/8.0 MIME-Version: 1.0 To: Adrian Chadd References: <201204060653.q366rwLa096182@svn.freebsd.org> <4F7E9413.20602@FreeBSD.org> <4F8BBD4E.1040106@FreeBSD.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: svn-src-head@freebsd.org, freebsd-net@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r233937 - in head/sys: kern net security/mac X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 08:57:48 -0000 On 17.04.2012 01:29, Adrian Chadd wrote: > On 15 April 2012 23:33, Alexander V. Chernikov wrote: >> On 16.04.2012 01:17, Adrian Chadd wrote: >>> >>> Hi, >>> >>> This has broken (at least) net80211 and bpf, with LOR: >> >> Yes, it is. Please try the attached patch > > Hi, Hello! Sorry for the late reply, answering for both letters. > > This seems like a very, very complicated diff. 
> > * You've removed BPF_LOCK_ASSERT() inside bpf_detachd_locked() - why'd > you do that? > * You removed a comment ("We're already protected by the global lock") > which is still relevant/valid Both should be added back, thanks. > * There are lots of modifications to the read/write locks here - I'm > not sure whether they're at all relevant to my immediate problem and > may belong in separate commits Most of the patch is not directly relevant to the problem. It solves several new problems and a bunch of very old bugs due to lack of locking. > > Is there a document somewhere which describes what the "new" style BPF > locking should be? Are there any other places (except src) where such documentation should reside? > > I "just" added BPF_LOCK() / BPF_UNLOCK() around all the calls to > bpf_detachd() which weren't locked (there were a few.) Unfortunately, this is not enough. There is a possibility that bpf_setif() is called immediately before rw_destroy() in bpfdetach(). For example, you can easily trigger a panic on any 8/9/current SMP system with 'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done' There is also a possible use-after-free of the bpfif structure (since we're freeing it _before_ interface routes are cleaned up). This is why delayed free is needed. > > One final question - should the BPF global lock be recursive? It seems it really should be recursive now. 
> > thanks, > > > > Adrian > From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 09:00:16 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CDEB2106566C for ; Fri, 20 Apr 2012 09:00:16 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 9D6338FC21 for ; Fri, 20 Apr 2012 09:00:15 +0000 (UTC) Received: (qmail 18337 invoked from network); 20 Apr 2012 08:55:21 -0000 Received: from unknown (HELO [62.48.0.94]) ([62.48.0.94]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 20 Apr 2012 08:55:21 -0000 Message-ID: <4F9125CF.8090201@freebsd.org> Date: Fri, 20 Apr 2012 11:01:03 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1 MIME-Version: 1.0 To: "Alexander V. Chernikov" References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F907FB4.3080400@freebsd.org> <4F911DCD.30001@FreeBSD.org> In-Reply-To: <4F911DCD.30001@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: current@freebsd.org, Luigi Rizzo , "K. Macy" , net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 09:00:16 -0000 On 20.04.2012 10:26, Alexander V. Chernikov wrote: > On 20.04.2012 01:12, Andre Oppermann wrote: >> On 19.04.2012 22:34, K. Macy wrote: >>> If the number of peers is bounded then you can use the flowtable. Max >>> PPS is much higher bypassing routing lookup. 
However, it doesn't scale > > From my experience, turning fastfwd on gives ~20-30% performance > increase (10G forwarding with firewalling, 1.4MPPS). ip_forward() uses 2 > lookups (ip_rtaddr + ip_output) vs 1 ip_fastfwd(). Another difference is the packet copy the normal forwarding path does to be able to send an ICMP redirect message if the packet is forwarded to a different gateway on the same LAN. fastforward doesn't do that. > The worst current problem IMHO is number of locks packet have to > traverse, not number of lookups. Agreed. Actually the locking in itself is not the problem. It's the side effects of cache line dirtying/bouncing and contention. However in the great majority of the cases the data protected by the lock is only read, not modified, making a 'full' lock expensive. -- Andre From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 09:25:13 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A6C83106566C for ; Fri, 20 Apr 2012 09:25:13 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1B7B28FC17 for ; Fri, 20 Apr 2012 09:25:12 +0000 (UTC) Received: (qmail 18425 invoked from network); 20 Apr 2012 09:20:19 -0000 Received: from unknown (HELO [62.48.0.94]) ([62.48.0.94]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 20 Apr 2012 09:20:19 -0000 Message-ID: <4F912BA9.3060508@freebsd.org> Date: Fri, 20 Apr 2012 11:26:01 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1 MIME-Version: 1.0 To: Luigi Rizzo References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <4F908180.6010408@freebsd.org> <20120419220308.GB95692@onelab2.iet.unipi.it> <4F9093A1.3080305@freebsd.org> 
<20120420063530.GB233@onelab2.iet.unipi.it> In-Reply-To: <20120420063530.GB233@onelab2.iet.unipi.it> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 09:25:13 -0000 On 20.04.2012 08:35, Luigi Rizzo wrote: > On Fri, Apr 20, 2012 at 12:37:21AM +0200, Andre Oppermann wrote: >> On 20.04.2012 00:03, Luigi Rizzo wrote: >>> On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote: >>>> On 19.04.2012 22:46, Luigi Rizzo wrote: >>>>> The allocation happens while the code has already an exclusive >>>>> lock on so->snd_buf so a pool of fresh buffers could be attached >>>>> there. >>>> >>>> Ah, there it is not necessary to hold the snd_buf lock while >>>> doing the allocate+copyin. With soreceive_stream() (which is >>> >>> it is not held in the tx path either -- but there is a short section >>> before m_uiotombuf() which does >>> >>> ... >>> SOCKBUF_LOCK(&so->so_snd); >>> // check for pending errors, sbspace, so_state >>> SOCKBUF_UNLOCK(&so->so_snd); >>> ... >>> >>> (some of this is slightly dubious, but that's another story) >> >> Indeed the lock isn't held across the m_uiotombuf(). You're talking >> about filling an sockbuf mbuf cache while holding the lock? > > all i am thinking is that when we have a serialization point we > could use it for multiple related purposes. In this case yes we > could keep a small mbuf cache attached to so_snd. When the cache > is empty either get a new batch (say 10-20 bufs) from the zone > allocator, possibly dropping and regaining the lock if the so_snd > must be a leaf. Besides for protocols like TCP (does it use the > same path ?) 
the mbufs are already there (released by incoming acks)
> in the steady state, so it is not even necessary to refill the
> cache.

I'm sure things can be tuned towards particular cases but almost always
that comes at the expense of versatility.

I was looking at netmap for a project. It's great when there is one
thing being done by one process at great speed. However as soon as I
have to dispatch certain packets somewhere else for further processing,
in another process, things quickly become complicated and fall apart.
It would have meant replicating what the kernel does with protosw &
friends in userspace coated with IPC. Not to mention re-inventing the
socket layer abstraction again.

So netmap is fantastic for simple, bulk and repetitive tasks with little
variance. Things like packet routing, bridging, encapsulation, perhaps
inspection and acting as a traffic sink/source. There are plenty of use
cases for that.

Coming back to your UDP test case, while the 'hacks' you propose may
benefit the bulk sending of a bound socket, they may not help, or may
even pessimize, the DNS server case where a large number of packets is
sent to a large number of destinations.

The layering abstractions we have in BSD are excellent and have served
us quite well so far. Adding new protocols is a simple task and so on.
Of course it has some trade-offs by having some indirections and not
being bare-metal fast.

Yes, there is a lot of potential in optimizing the locking strategies
we currently have within the BSD network stack layering. Your profiling
work is immensely helpful in identifying where to aim. Once that is
fixed we should stop there.

Anyone who needs a UDP packet blaster that is as close as possible to
the bare metal should fork the tree and do their own short-cuts and
whatnot. But FreeBSD should stay a reasonable general-purpose system.
It won't be a Ferrari, but an Audi S6 is a damn nice car as well and it
can carry your whole family.
:)

> This said, i am not 100% sure that the 100ns I am seeing are all
> spent in the zone allocator. As i said the chain of indirect calls
> and other ops is rather long on both acquire and release.
>
>>>>> But the other consideration is that one could defer the mbuf allocation
>>>>> to a later time when the packet is actually built (or anyways
>>>>> right before the thread returns).
>>>>> What i envision (and this would fit nicely with netmap) is the following:
>>>>> - have a (possibly readonly) template for the headers (MAC+IP+UDP)
>>>>> attached to the socket, built on demand, and cached and managed
>>>>> with similar invalidation rules as used by fastforward;
>>>>
>>>> That would require to cross-pointer the rtentry and whatnot again.
>>>
>>> i was planning to keep a copy, not a reference. If the copy becomes
>>> temporarily stale, no big deal, as long as you can detect it reasonably
>>> quickly -- routes are not guaranteed to be correct, anyways.
>>
>> Be wary of disappearing interface pointers...
>
> (this reminds me, what prevents a route grabbed from the flowtable
> from disappearing and releasing the ifp reference ?)

It has to keep a refcounted reference to the rtentry.

> In any case, it seems better to keep a more persistent ifp reference
> in the socket rather than grab and release one on every single
> packet transmission.

The socket doesn't and shouldn't know anything about ifp's.

>>>>> - possibly extend the pru_send interface so one can pass down the uio
>>>>> instead of the mbuf;
>>>>> - make an opportunistic buffer allocation in some place downstream,
>>>>> where the code already has an x-lock on some resource (could be
>>>>> the snd_buf, the interface, ...) so the allocation comes for free.
>>>>
>>>> ETOOCOMPLEXOVERTIME.
>>>
>>> maybe. But i want to investigate this.
>>
>> I fail to see what passing down the uio would gain you. The snd_buf lock
>> isn't obtained again after the copyin. Not that I want to prevent you
>> from investigating other ways.
;) > > maybe it can open the way to other optimizations, such as reducing > the number of places where you need to lock, or save some data > copies, or reduce fragmentation, etc. I appreciate your profiling work very much and try my best to help you to minimize the contention points. I hope the rtable locking changes will solve one of the biggest choke points. -- Andre From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 12:48:06 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F16621065670; Fri, 20 Apr 2012 12:48:06 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id C5CAA8FC0C; Fri, 20 Apr 2012 12:48:06 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 39D58B924; Fri, 20 Apr 2012 08:48:06 -0400 (EDT) From: John Baldwin To: freebsd-net@freebsd.org Date: Fri, 20 Apr 2012 08:11:44 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; ) References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> In-Reply-To: <20120419204622.GA94904@onelab2.iet.unipi.it> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201204200811.44957.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 20 Apr 2012 08:48:06 -0400 (EDT) Cc: Andre Oppermann , Luigi Rizzo , current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 12:48:07 -0000 On Thursday, April 19, 2012 4:46:22 pm Luigi Rizzo wrote: > What might be moderately expensive are the critical_enter()/critical_exit() > calls around individual allocations. > The allocation happens while the code has already an exclusive > lock on so->snd_buf so a pool of fresh buffers could be attached > there. Keep in mind that in the common case critical_enter() and critical_exit() should be very cheap as they should just do td->td_critnest++ and td->td_critnest--. critical_enter() should probably be inlined if KTR is not enabled. -- John Baldwin
From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 12:53:25 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A87B9106566C for ; Fri, 20 Apr 2012 12:53:25 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 5714E8FC17 for ; Fri, 20 Apr 2012 12:53:25 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 6FBDACC99A1; Fri, 20 Apr 2012 14:53:18 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 18.9094] X-CRM114-CacheID: sfid-20120420_14531_BFA48C09 X-CRM114-Status: Good ( pR: 18.9094 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Fri Apr 20 14:53:18 2012 X-DSPAM-Confidence: 0.9967 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4f915c3e539961563918857 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, FreeBSD, 0.00047, FreeBSD, 0.00047, >+On, 0.00080, >+Hi, 0.00150,
wrote+>, 0.00178, wrote+>>, 0.00197, >+I, 0.00228, Hi+>, 0.00237, >+>, 0.00294, >+>, 0.00294, I+>, 0.00316, string, 0.00356, string, 0.00356, References*mail.gmail.com>, 0.00362, On+Thu, 0.00379, In-Reply-To*mail.gmail.com>, 0.00386, >>+>>, 0.00449, socket, 0.00474, socket, 0.00474, the+patch, 0.00474, wrote, 0.00507, wrote, 0.00507, References*fsn.hu>, 0.00517, >>+I, 0.00517, supported, 0.00541, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id 08331CC9993; Fri, 20 Apr 2012 14:53:18 +0200 (CEST) Message-ID: <4F915C3D.5070908@fsn.hu> Date: Fri, 20 Apr 2012 14:53:17 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.23) Gecko/20090817 Thunderbird/2.0.0.23 Mnenhy/0.7.6.0 MIME-Version: 1.0 To: Svatopluk Kraus References: <4F8FA591.4010503@fsn.hu> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: SO_BINDTODEVICE or equivalent? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 12:53:25 -0000 Hi, Never heard of it, thanks! On 04/19/12 11:32, Svatopluk Kraus wrote: > Hi, > > Use IP_RECVIF option. > > For IP_SENDIF look at > http://lists.freebsd.org/pipermail/freebsd-net/2007-March/013510.html > I used the patch on my embedded FreeBSD 9.0 boxes and it works fine. I > modificated it slightly to match 9.0. > > Svata > > On Thu, Apr 19, 2012 at 7:41 AM, Attila Nagy wrote: >> Hi, >> I want to solve the classic problem of a DHCP server: listening for >> broadcast UDP packets and figuring out what interface a packet has >> come in. 
>> The Linux solution is SO_BINDTODEVICE, which according to socket(7): >> SO_BINDTODEVICE >> Bind this socket to a particular device like "eth0", as >> specified in the passed interface name. If the name is an empty >> string or the option length is zero, the socket device binding >> is removed. The passed option is a variable-length >> null-terminated interface name string with the maximum size of >> IFNAMSIZ. If a socket is bound to an interface, only packets >> received from that particular interface are processed by the >> socket. Note that this only works for some socket types, >> particularly AF_INET sockets. It is not supported for packet >> sockets (use normal [1]bind(2) there). >> >> This makes it possible to listen on selected interfaces for >> (broadcast) packets. FreeBSD currently doesn't implement this feature From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 14:24:44 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 30632106564A; Fri, 20 Apr 2012 14:24:44 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id DB0C68FC14; Fri, 20 Apr 2012 14:24:43 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 30D397300B; Fri, 20 Apr 2012 16:44:10 +0200 (CEST) Date: Fri, 20 Apr 2012 16:44:10 +0200 From: Luigi Rizzo To: "K. 
Macy" Message-ID: <20120420144410.GA3629@onelab2.iet.unipi.it> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <20120419212224.GA95459@onelab2.iet.unipi.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Cc: Andre Oppermann , current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 14:24:44 -0000 On Thu, Apr 19, 2012 at 11:06:38PM +0200, K. Macy wrote: > On Thu, Apr 19, 2012 at 11:22 PM, Luigi Rizzo wrote: > > On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote: > >> >> This is indeed a big problem. ?I'm working (rough edges remain) on > >> >> changing the routing table locking to an rmlock (read-mostly) which > >> > > >> > >> This only helps if your flows aren't hitting the same rtentry. > >> Otherwise you still convoy on the lock for the rtentry itself to > >> increment and decrement the rtentry's reference count. > >> > >> > i was wondering, is there a way (and/or any advantage) to use the > >> > fastforward code to look up the route for locally sourced packets ? > > > > actually, now that i look at the code, both ip_output() and > > the ip_fastforward code use the same in_rtalloc_ign(...) > > > >> > > >> > >> If the number of peers is bounded then you can use the flowtable. Max > >> PPS is much higher bypassing routing lookup. However, it doesn't scale > >> to arbitrary flow numbers. > > > > re. flowtable, could you point me to what i should do instead of > > calling in_rtalloc_ign() ? 
> >
> If you build with it in your kernel config and enable the sysctl
> ip_output will automatically use it for TCP and UDP connections. If
> you're doing forwarding you'll need to patch the forwarding path.

cool.
For the records, with "netsend 10.0.0.2 ports 18 0 5" on an ixgbe
talking to a remote host i get the following results (with a single
port netsend does a connect() and then send(), otherwise it
loops around a sendto() )

        net.flowtable.enabled   port            ns/pkt
        -----------------------------------------------------
        not compiled in         5000             944    M_FLOWID not set
        0 (disable)             5000            1004
        1 (enable)              5000             980

        not compiled in         5000-5001       3400    M_FLOWID not set
        0 (disable)             5000-5001       1418
        1 (enable)              5000-5001       1230

The small penalty when flowtable is disabled but compiled in is
probably because the net.flowtable.enable flag is checked
a bit deep in the code.

The advantage with non-connect()ed sockets is huge. I don't
quite understand why disabling the flowtable still helps there.

cheers
luigi

From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 16:29:11 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 335B9106564A; Fri, 20 Apr 2012 16:29:11 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id CED138FC17; Fri, 20 Apr 2012 16:29:10 +0000 (UTC) Received: by iahk25 with SMTP id k25so17737628iah.13 for ; Fri, 20 Apr 2012 09:29:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=sxHsNY9fe/y2isJ4u36g48rSPAaRVuqxwsJ0V1iY79g=; b=rMAwlf7o1xNFIWUvi/CHyuO8SVG0QF9GRQ0FLeb0JoffVEz2lB781w+2ZqrSZBOr1b kPMFaHvmEq7COCOviKYgpCFCPjndez0g0pglBZj1widzD5icGxmCk3QV3FoFDthLrPVK
kD9rq7jUccxbOLHC82roHT25H+D/dcKAnc47yeWaurX06MEPFHvSFBEi3mTdSQ1dOuQC i0sntZn7aCLSsbtUjELFcnaLDLxfnZkd8jMRkMqOzE+vzoZNnGZRtth65d2rH8iYO39W NO2vm+6nVQasi9gwkHUX3X/5pXXRuwhWPqKRB86CNQ91JY30JaiqXGtuIn8f3aTul9IR JYyw== MIME-Version: 1.0 Received: by 10.50.51.197 with SMTP id m5mr6994884igo.38.1334939350486; Fri, 20 Apr 2012 09:29:10 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.50.129.39 with HTTP; Fri, 20 Apr 2012 09:29:10 -0700 (PDT) In-Reply-To: <20120420144410.GA3629@onelab2.iet.unipi.it> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <20120419212224.GA95459@onelab2.iet.unipi.it> <20120420144410.GA3629@onelab2.iet.unipi.it> Date: Fri, 20 Apr 2012 18:29:10 +0200 X-Google-Sender-Auth: C9O6HmPT-jO94KK-wu9iOn5WWz0 Message-ID: From: "K. Macy" To: Luigi Rizzo Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Cc: Andre Oppermann , current@freebsd.org, net@freebsd.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 16:29:11 -0000

Comments inline below:

On Fri, Apr 20, 2012 at 4:44 PM, Luigi Rizzo wrote:
> On Thu, Apr 19, 2012 at 11:06:38PM +0200, K. Macy wrote:
>> On Thu, Apr 19, 2012 at 11:22 PM, Luigi Rizzo wrote:
>> > On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote:
>> >> >> This is indeed a big problem. I'm working (rough edges remain) on
>> >> >> changing the routing table locking to an rmlock (read-mostly) which
>> >> >
>> >>
>> >> This only helps if your flows aren't hitting the same rtentry.
>> >> Otherwise you still convoy on the lock for the rtentry itself to
>> >> increment and decrement the rtentry's reference count.
>> >>
>> >> > i was wondering, is there a way (and/or any advantage) to use the
>> >> > fastforward code to look up the route for locally sourced packets ?
>> >
>> > actually, now that i look at the code, both ip_output() and
>> > the ip_fastforward code use the same in_rtalloc_ign(...)
>> >
>> >> >
>> >>
>> >> If the number of peers is bounded then you can use the flowtable. Max
>> >> PPS is much higher bypassing routing lookup. However, it doesn't scale
>> >> to arbitrary flow numbers.
>> >
>> > re. flowtable, could you point me to what i should do instead of
>> > calling in_rtalloc_ign() ?
>>
>> If you build with it in your kernel config and enable the sysctl
>> ip_output will automatically use it for TCP and UDP connections. If
>> you're doing forwarding you'll need to patch the forwarding path.
>
> cool.
> For the records, with "netsend 10.0.0.2 ports 18 0 5" on an ixgbe
> talking to a remote host i get the following results (with a single
> port netsend does a connect() and then send(), otherwise it
> loops around a sendto() )
>

Sorry, 5000 vs 5000-5001 means 1 vs 2 streams? Does this mean for a
single socket the overhead is less without it compiled in than with it
compiled in but enabled? That is certainly different from what I see
with TCP where I see a 30% increase in aggregate throughput the last
time I tried this (on IPoIB).

For the record the M_FLOWID is used to pick the transmit queue so with
multiple streams you're best off setting it if your device has more
than one hardware device queue.
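
As a rough illustration of the M_FLOWID remark, here is a minimal sketch in Python (not kernel code) of mapping a flow id to a transmit queue. The name pick_tx_queue, the modulo mapping, and the flow id values are all assumptions made for the example; a real driver may choose the queue differently.

```python
def pick_tx_queue(flowid, num_queues):
    """Map a flow id to a transmit queue index (hypothetical modulo mapping)."""
    if num_queues < 1:
        raise ValueError("need at least one transmit queue")
    # Packets carrying the same flow id always land on the same queue,
    # which keeps a stream in order; distinct ids can spread across queues.
    return flowid % num_queues


# Made-up flow ids standing in for two UDP streams (e.g. ports 5000 and 5001).
flows = {"stream-a": 0x1234, "stream-b": 0x1235}
for name, flowid in flows.items():
    print(name, "-> queue", pick_tx_queue(flowid, 4))
```

With a single hardware queue every flow maps to queue 0, so under this assumed mapping the flow id only matters once the device exposes more than one queue.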
>        net.flowtable.enabled   port            ns/pkt
>        -----------------------------------------------------
>        not compiled in         5000             944    M_FLOWID not set
>        0 (disable)             5000            1004
>        1 (enable)              5000             980
>
>        not compiled in         5000-5001       3400    M_FLOWID not set
>        0 (disable)             5000-5001       1418
>        1 (enable)              5000-5001       1230
>
> The small penalty when flowtable is disabled but compiled in is
> probably because the net.flowtable.enable flag is checked
> a bit deep in the code.
>
> The advantage with non-connect()ed sockets is huge. I don't
> quite understand why disabling the flowtable still helps there.

Do you mean having it compiled in but disabled still helps
performance? Yes, that is extremely strange.

-Kip

--
   "The real damage is done by those millions who want to 'get by.'
The ordinary men who just want to be left in peace. Those who don't
want their little lives disturbed by anything bigger than themselves.
Those with no sides and no causes. Those who won't take measure of
their own strength, for fear of antagonizing their own weakness. Those
who don't like to make waves--or enemies.
   Those for whom freedom, honour, truth, and principles are only
literature. Those who live small, love small, die small. It's the
reductionist approach to life: if you keep it small, you'll keep it
under control. If you don't make any noise, the bogeyman won't find
you.
   But it's all an illusion, because they die too, those people who
roll up their spirits into tiny little balls so as to be safe. Safe?! From what?
Life is always on the edge of death; narrow streets lead to the same
place as wide avenues, and a little candle burns itself out just like
a flaming torch does.
   I choose my own way to burn."
   Sophie Scholl

From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 18:43:40 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B6635106566B; Fri, 20 Apr 2012 18:43:40 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 71C7D8FC08; Fri, 20 Apr 2012 18:43:37 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 57D2E7300A; Fri, 20 Apr 2012 21:03:09 +0200 (CEST) Date: Fri, 20 Apr 2012 21:03:09 +0200 From: Luigi Rizzo To: net@freebsd.org, current@freebsd.org Message-ID: <20120420190309.GA5617@onelab2.iet.unipi.it> References: <20120419133018.GA91364@onelab2.iet.unipi.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120419133018.GA91364@onelab2.iet.unipi.it> User-Agent: Mutt/1.4.2.3i Cc: Subject: more network performance info: ether_output() X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 18:43:40 -0000

Continuing my profiling on network performance, another place
where we waste a lot of time is if_ethersubr.c::ether_output()

From the beginning of ether_output() to the
final call to ether_output_frame() the code takes slightly
more than 210ns on my i7-870 CPU running at 2.93 GHz + TurboBoost.
In particular:

- the route does not have a MAC address (lle) attached, which causes
  arpresolve() to be called every time. This consumes about 100ns.
  It happens also with locally sourced TCP.
  Using the flowtable cuts this time down to about 30-40ns

- another 100ns is spent copying the MAC header into the mbuf, and
  then checking whether a local copy should be looped back.
  Unfortunately the code here is a bit convoluted so the header
  fields are copied twice, using memcpy on the individual pieces.

Note that all the above happens not just with my udp flooding
tests, but also with regular TCP traffic.

cheers
luigi

From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 18:55:05 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2254F1065670 for ; Fri, 20 Apr 2012 18:55:05 +0000 (UTC) (envelope-from dmk.sbor@gmail.com) Received: from mail-yw0-f54.google.com (mail-yw0-f54.google.com [209.85.213.54]) by mx1.freebsd.org (Postfix) with ESMTP id C14978FC08 for ; Fri, 20 Apr 2012 18:55:04 +0000 (UTC) Received: by yhgm50 with SMTP id m50so6466242yhg.13 for ; Fri, 20 Apr 2012 11:55:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=kWRZoQSMQV9qrAmcTL5b0ga6SKQY+WGnsluwXR7HeM8=; b=c3v0phB26cbd/HOaeib6fPIaoVxHz5MWkyYkMIByVLFmuqSRkue8Flpn3cs3ZgkHjg jrPrAdKVRZKsoiSc7mo+yZZ17PU9JIXAIdv7rUMYGk9aR2teE0Rs2OVQeJ6zcqjNI4LS izU4gOF0nrivjxCUpftJlxBCX1aT+GpxZbdW8b7PZpSvhJSYoHBCy9ctDxYGSHaXH/UA j5EoFelOcFjmePWUEUZl+Yr8uuwhxJkLrh9ALXLkaW86Jqjx4e29mDzkCL9AK5jc7Y1y Rq24pdgno083JEl6IMGAp3y/pESlYoTy2c8/z+BPn9gd4Tmh3/uOUUFOvgLy84y/OItx Gfow== MIME-Version: 1.0 Received: by 10.236.73.169 with SMTP id v29mr7000943yhd.12.1334948103389; Fri, 20 Apr 2012 11:55:03 -0700 (PDT) Received: by 10.146.168.1 with HTTP; Fri, 20 Apr 2012 11:55:03 -0700 (PDT) In-Reply-To: References: Date: Fri, 20 Apr 2012 22:55:03 +0400 Message-ID: From: "Dmitry S.
Kasterin" To: Kevin Oberman Content-Type: text/plain; charset=UTF-8 Cc: freebsd-net@freebsd.org, Michael Sierchio Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 18:55:05 -0000 > Thank you for the "allow tcp from me to any established" rule, > I'll give it a try later. Ok, I've tested this - no oddity/"frozen" connection. As expected. This is an excerpt from the ruleset (ipfw show): 00101 4759 2588637 allow tcp from any to any established 00102 206 12360 allow tcp from me to any setup 00777 0 0 deny log logamount 16 ip from any to any > I didn't change anything. Quite possible dyn_fin_lifetime is too > small. I'll try to raise it. # sysctl net.inet.ip.fw.dyn_fin_lifetime=4 net.inet.ip.fw.dyn_fin_lifetime: 1 -> 4 # sysctl net.inet.ip.fw.dyn_rst_lifetime=4 net.inet.ip.fw.dyn_rst_lifetime: 1 -> 4 The situation is better, but I am still having troubles with "heavy" sites (images, JS an so on; for example - http://cnx.org/content/m16336/latest/ ). And still I can see odd packets from "deny log all from any to any" rule: 15:09:58.654613 IP w.x.y.z.11215 > 213.180.193.14.80: Flags [F.], seq 3948689318, ack 1903284725, ... 15:09:59.158612 IP w.x.y.z.11215 > 213.180.193.14.80: Flags [F.], seq 0, ack 1, ... 15:09:59.222114 IP 213.180.193.14.80 > w.x.y.z.11215: Flags [F.], seq 1, ack 0, ... 15:09:59.966611 IP w.x.y.z.11215 > 213.180.193.14.80: Flags [F.], seq 0, ack 1, ... 15:51:43.244361 IP 128.42.169.34.80 > w.x.y.z.13876: Flags [F.], seq 3534903525, ack 108808080, ... 15:51:49.418317 IP 128.42.169.34.80 > w.x.y.z.13876: Flags [F.], seq 0, ack 1, ... 15:58:47.664606 IP w.x.y.z.32748 > 195.91.160.36.80: Flags [F.], seq 3277652538, ack 2683877393, ... 
15:58:49.106924 IP 195.91.160.36.80 > w.x.y.z.32748: Flags [F.], seq 1, ack 0, ... From owner-freebsd-net@FreeBSD.ORG Fri Apr 20 23:13:09 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4D608106566B for ; Fri, 20 Apr 2012 23:13:09 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id CA9848FC0A for ; Fri, 20 Apr 2012 23:13:08 +0000 (UTC) Received: by wgbds12 with SMTP id ds12so9778744wgb.31 for ; Fri, 20 Apr 2012 16:13:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=AMwadx4rU8hAJ4C/IIzd12ysdyGphpLbvjP02evuieU=; b=UY5FdVD4rcBclhSsBt+Yx8Dw1Xf3bnAAh6YYcI8eIuSyAj6vEWQwWN+k/LfnHW3ynS mVLWk/f0eOOq54nv9zIuRly7MjVA2d/sXU2knuWWs7dEOYejq3to7sxnwppUL356NuEe EaySRHjnxRIMn/Ja/ij1ZFXza7ho/moC+sgBv8HZiOZ1vVHNhIJErhrOf+2mqSDxNuUz tfMI2fWJD4aC/UqdMXVnrUMlnxkwE8+eNHEquQOQ8AmcVR2DKh4sY/9wCDymf79X4+y3 7zQFb7iRZBq1sb4CUbm6v5YxmvmZPFUutwAzv9PFatnQ0n2ry/uXdN9yWDSVSMBFRQAt dQcA== MIME-Version: 1.0 Received: by 10.180.77.233 with SMTP id v9mr1548121wiw.22.1334963587956; Fri, 20 Apr 2012 16:13:07 -0700 (PDT) Received: by 10.223.54.207 with HTTP; Fri, 20 Apr 2012 16:13:07 -0700 (PDT) In-Reply-To: References: Date: Fri, 20 Apr 2012 16:13:07 -0700 Message-ID: From: Kevin Oberman To: "Dmitry S. 
Kasterin" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-net@freebsd.org, Michael Sierchio Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Apr 2012 23:13:09 -0000

On Fri, Apr 20, 2012 at 11:55 AM, Dmitry S. Kasterin wrote:
>> Thank you for the "allow tcp from me to any established" rule,
>> I'll give it a try later.
>
> Ok, I've tested this - no oddity/"frozen" connection. As expected.
> This is an excerpt from the ruleset (ipfw show):
>
> 00101  4759  2588637 allow tcp from any to any established
> 00102   206    12360 allow tcp from me to any setup
>
> 00777     0        0 deny log logamount 16 ip from any to any

When you use 'established', you are depending on TCP to maintain
state, which it does all the time. There were some attacks involving
sequence number "guessing" which were once not really randomized, but,
at least on FreeBSD and most current systems, these are now generated
by a good random number generator and are essentially impossible to
guess. I have not heard of any use of this attack for several years
and then on systems with broken PRNGs. I think the problem probably
was fixed over 5 years ago.

>> I didn't change anything. Quite possible dyn_fin_lifetime is too
>> small. I'll try to raise it.
>
> # sysctl net.inet.ip.fw.dyn_fin_lifetime=4
> net.inet.ip.fw.dyn_fin_lifetime: 1 -> 4
> # sysctl net.inet.ip.fw.dyn_rst_lifetime=4
> net.inet.ip.fw.dyn_rst_lifetime: 1 -> 4
>
> The situation is better, but I am still having troubles with "heavy"
> sites (images, JS and so on; for example -
> http://cnx.org/content/m16336/latest/ ).
> And still I can see odd packets from "deny log all from any to any" rule:
>
> 15:09:58.654613 IP w.x.y.z.11215 > 213.180.193.14.80: Flags [F.], seq
> 3948689318, ack 1903284725, ...
> 15:09:59.158612 IP w.x.y.z.11215 > 213.180.193.14.80: Flags [F.], seq
> 0, ack 1, ...
> 15:09:59.222114 IP 213.180.193.14.80 > w.x.y.z.11215: Flags [F.], seq
> 1, ack 0, ...
> 15:09:59.966611 IP w.x.y.z.11215 > 213.180.193.14.80: Flags [F.], seq
> 0, ack 1, ...
>
> 15:51:43.244361 IP 128.42.169.34.80 > w.x.y.z.13876: Flags [F.], seq
> 3534903525, ack 108808080, ...
> 15:51:49.418317 IP 128.42.169.34.80 > w.x.y.z.13876: Flags [F.], seq
> 0, ack 1, ...
>
> 15:58:47.664606 IP w.x.y.z.32748 > 195.91.160.36.80: Flags [F.], seq
> 3277652538, ack 2683877393, ...
> 15:58:49.106924 IP 195.91.160.36.80 > w.x.y.z.32748: Flags [F.], seq
> 1, ack 0, ...

The thing that jumps out is that all of the blocked packets are FIN
packets. I am not sure why they are being denied as they have FIN+ACK
and that should meet the requirements for 'established'.

Are you seeing a large number of TCP sessions in partially closed states?

I don't recall if you mentioned it, but what version of FreeBSD are you running?

If you have not done so, I urge you to read the firewall(7) man page.
It discusses firewall design and implementation with IPFW. Also, if
you choose to use stateful TCP filtering, it is probably best to do it
in the manner shown in the ipfw(8) man page under DYNAMIC RULES. This
is very different from the way you did it.
--
R.
Kevin Oberman, Network Engineer E-mail: kob6558@gmail.com From owner-freebsd-net@FreeBSD.ORG Sat Apr 21 06:34:23 2012 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8741E106566C; Sat, 21 Apr 2012 06:34:23 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 083CE8FC08; Sat, 21 Apr 2012 06:34:19 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q3L6Y84x025316 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 21 Apr 2012 16:34:09 +1000 Date: Sat, 21 Apr 2012 16:34:08 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: "K. Macy" In-Reply-To: Message-ID: <20120421155638.E982@besplex.bde.org> References: <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <20120419212224.GA95459@onelab2.iet.unipi.it> <20120420144410.GA3629@onelab2.iet.unipi.it> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Andre Oppermann , Luigi Rizzo , current@FreeBSD.org, net@FreeBSD.org Subject: Re: Some performance measurements on the FreeBSD network stack X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Apr 2012 06:34:23 -0000 On Fri, 20 Apr 2012, K. Macy wrote: > On Fri, Apr 20, 2012 at 4:44 PM, Luigi Rizzo wrote: >> The small penalty when flowtable is disabled but compiled in is >> probably because the net.flowtable.enable flag is checked >> a bit deep in the code. 
>>
>> The advantage with non-connect()ed sockets is huge. I don't
>> quite understand why disabling the flowtable still helps there.
>
> Do you mean having it compiled in but disabled still helps
> performance? Yes, that is extremely strange.

This reminds me that when I worked on this, I saw very large throughput differences (in the 20-50% range) as a result of minor changes in unrelated code. I could get these changes intentionally by adding or removing padding in unrelated unused text space, so the differences were apparently related to text alignment. I thought I had some significant micro-optimizations, but it turned out that they were acting mainly by changing the layout in related used text space where it is harder to control. Later, I suspected that the differences were more due to cache misses for data than for text.

The CPU and its caching must affect this significantly. I tested on an AthlonXP and Athlon64, and the differences were larger on the AthlonXP. Both of these have a shared I/D cache so pressure on the I part would affect the D part, but in this benchmark the D part is much more active than the I part so it is unclear how text layout could have such a large effect.

Anyway, the large differences made it impossible to trust the results of benchmarking any single micro-benchmark. Also, ministat is useless for understanding the results. (I note that luigi didn't provide any standard deviations and neither would I. :-). My results depended on the cache behaviour but didn't change significantly when rerun, unless the code was changed.
Bruce From owner-freebsd-net@FreeBSD.ORG Sat Apr 21 11:41:37 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0A1C106566C for ; Sat, 21 Apr 2012 11:41:37 +0000 (UTC) (envelope-from dmk.sbor@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 742CA8FC0C for ; Sat, 21 Apr 2012 11:41:37 +0000 (UTC) Received: by ghrr20 with SMTP id r20so6731169ghr.13 for ; Sat, 21 Apr 2012 04:41:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=KSiEbTRTk2v1kmBG39n3xjD4U7Qk1lxK9M5KjBtEmg8=; b=mFqr7218iUnL9ehTF6FsZdPiY+py8M1Km5bttvedxrQIp+lMwntxraG+N1f4P793SD 6sItxYl27OgmjL4HZ7d91lXYIG6SgF3rMeXuDlcusRol+InbK9eolJAvCKQYEkfJS0JY i/qqWmWbnSyr359bRSU9B98FMoTjPPK4NFrnR3nkkK/xfKHmW1T4vD6zDjDMbLVHmgRQ j3N3/7y2A+BD5XpVDwpJU7vwrRvIhDbQ/fzno6yppaUPNOok2g+1JMF+G2sZmxuRfQ/N /BXXmBTezH7bf5medRr+RuncLhh/gFZcRu7cBAnyE5H3B5GNawVXOSfimdKvS4cp44pY Wq2A== MIME-Version: 1.0 Received: by 10.101.11.28 with SMTP id o28mr2827717ani.68.1335008490830; Sat, 21 Apr 2012 04:41:30 -0700 (PDT) Received: by 10.146.168.1 with HTTP; Sat, 21 Apr 2012 04:41:30 -0700 (PDT) In-Reply-To: References: Date: Sat, 21 Apr 2012 15:41:30 +0400 Message-ID: From: "Dmitry S. 
Kasterin" To: Kevin Oberman Content-Type: text/plain; charset=UTF-8 Cc: freebsd-net@freebsd.org Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Apr 2012 11:41:37 -0000

>> # sysctl net.inet.ip.fw.dyn_fin_lifetime=4
>> net.inet.ip.fw.dyn_fin_lifetime: 1 -> 4
>> # sysctl net.inet.ip.fw.dyn_rst_lifetime=4
>> net.inet.ip.fw.dyn_rst_lifetime: 1 -> 4

> The thing that jumps out is that all of the blocked packets are of FIN
> packets. I am not sure why they are being denied as they have FIN+ACK
> and that should meet the requirements for 'established'.

Sorry, it was not clear from my text that the second part of the previous message concerns stateful/dynamic filtering. Stateless filtering works perfectly for me.

For stateless (tcp) filtering I've used the following rules:
00101 allow tcp from any to any established
00102 allow tcp from me to any setup

And for stateful:
00010 check-state
00101 allow tcp from me to any out setup keep-state

> Are you seeing a large number of TCP sessions in partially closed states?

Yes, with the default settings (dyn_fin_lifetime=1 and dyn_rst_lifetime=1). With dyn_fin_lifetime=4 and dyn_rst_lifetime=4 this number is smaller.

> I don't recall if you mentioned it, but what version of FreeBSD are you
> running?

9.0-STABLE / custom kernel

> Also, if
> you choose to use stateful TCP filtering, it is probably best to do it
> in the manner shown in the ipfw(8) man page under DYNAMIC RULES. This
> is very different from the way you did it.

The "DYNAMIC RULES" section gives the following recommendation:
ipfw add check-state
ipfw add deny tcp from any to any established
ipfw add allow tcp from my-net to any setup keep-state

Is the second rule necessary?
From owner-freebsd-net@FreeBSD.ORG Sat Apr 21 11:49:55 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CE71D106564A for ; Sat, 21 Apr 2012 11:49:55 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from nm10-vm4.bullet.mail.ne1.yahoo.com (nm10-vm4.bullet.mail.ne1.yahoo.com [98.138.91.170]) by mx1.freebsd.org (Postfix) with SMTP id 8CCD18FC0A for ; Sat, 21 Apr 2012 11:49:55 +0000 (UTC) Received: from [98.138.90.52] by nm10.bullet.mail.ne1.yahoo.com with NNFMP; 21 Apr 2012 11:49:49 -0000 Received: from [98.138.89.166] by tm5.bullet.mail.ne1.yahoo.com with NNFMP; 21 Apr 2012 11:49:49 -0000 Received: from [127.0.0.1] by omp1022.mail.ne1.yahoo.com with NNFMP; 21 Apr 2012 11:49:49 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 840400.78761.bm@omp1022.mail.ne1.yahoo.com Received: (qmail 85284 invoked by uid 60001); 21 Apr 2012 11:49:49 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1335008989; bh=9uXXVpJiL7/OrQ6qXRuXRPqqJjmxfgiGvgMafgihF10=; h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:MIME-Version:Content-Type; b=qvmT84iVVXhhTTFBjYl7qh+VdQBs2yDd6ypu+F8kvzn/TIAMa1KIoBJFuEZneUlYh1V47rGiElV0bKJ3h3qNeSnvENLliJxS8bwE8U3FVVzUeYrurjE7l5FBVR6UsOdsLgMGTbVtvKY6nW7gyH8igxqwo7QJOLqsNoWOem6mi0I= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:MIME-Version:Content-Type; b=NBVQIvQJqr12zMNsSAFvgaDlslzA6MiFB2ZO5IGIo4lQ6JY/aZzatPwMcGwRJ8HfFUK3PVA+JeOd95tm09iYUcqS/q+xXN40KklR2r/xeLNJLk7v3kGpZrBugiNPkO+ym4QUK8VYqB9LwFvLmfeeViPZ+y10tGmsjlo+F8uUfVY=; X-YMail-OSG: Ve1NUm8VM1kUF8zrTEd05A6K4LHGiKQAgkm2XW7o.hhMHt1 8dgqPX7iHFKSmF6ZKaq09kVcoQDOhKuQQSa2qre5KOAeEirID4gn0IPdZO9x SH58k1FHp34AyqJPA6HnHqEt6ifAoE2mCQPU1gw0DyL8Dt6F0C72jmOHlaRx rtQUJDQuec3lR.5fjbFaVdsFL0AOOZoPKtAxMCUjtgXupIKN6a6rBNU3O050 
JzLsJxzOoQs4gNZnzPUQEUvs0dRAFkH5n5RfyFJb0faW7G7qQlKHZORzXknJ VLogGmPGTcC.UPGN.11vpqn4jIxSz_3oM4NB1.uF5qBZ5.nz_V4NInOBDOAG zkmjJmykYvS2oEB5_YnVbFLdSFkXZKtNpWTR8AidQNfJ_9Msq8Z_NxxL.vS. RcOytq329FB5IGu5NfCojRhDpypIZt8VnWQYc9lIhm5.qbYCEvj48y03aAi8 keQK0FdGlgWAl Received: from [174.48.129.108] by web126004.mail.ne1.yahoo.com via HTTP; Sat, 21 Apr 2012 04:49:49 PDT X-Mailer: YahooMailClassic/15.0.5 YahooMailWebService/0.8.117.340979 Message-ID: <1335008989.85136.YahooMailClassic@web126004.mail.ne1.yahoo.com> Date: Sat, 21 Apr 2012 04:49:49 -0700 (PDT) From: Barney Cordoba To: freebsd-net@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailman-Approved-At: Sat, 21 Apr 2012 12:43:17 +0000 Subject: FreeBSD 7 on Newer MBs X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Apr 2012 11:49:55 -0000 For a variety of reasons I have a client stuck on FreeBSD 7.0 and they're interested in getting a MB that uses the latest CPUS. They're just using the console, so there are no graphics; can someone provide insight as to whether this would be expected to work without serious problems? 
BC

From owner-freebsd-net@FreeBSD.ORG Sat Apr 21 15:32:13 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B9D38106564A for ; Sat, 21 Apr 2012 15:32:13 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from sola.nimnet.asn.au (paqi.nimnet.asn.au [115.70.110.159]) by mx1.freebsd.org (Postfix) with ESMTP id 2A46C8FC0C for ; Sat, 21 Apr 2012 15:32:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by sola.nimnet.asn.au (8.14.2/8.14.2) with ESMTP id q3LFViVA042024; Sun, 22 Apr 2012 01:31:45 +1000 (EST) (envelope-from smithi@nimnet.asn.au) Date: Sun, 22 Apr 2012 01:31:44 +1000 (EST) From: Ian Smith To: "Dmitry S. Kasterin" In-Reply-To: Message-ID: <20120421222621.O91148@sola.nimnet.asn.au> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-net@freebsd.org, Kevin Oberman Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Apr 2012 15:32:13 -0000

On Sat, 21 Apr 2012 15:41:30 +0400, Dmitry S. Kasterin wrote:
[..]
> 9.0-STABLE / custom kernel
>
> > Also, if
> > you choose to use stateful TCP filtering, it is probably best to do it
> > in the manner shown in the ipfw(8) man page under DYNAMIC RULES. This
> > is very different from the way you did it.
>
> The "DYNAMIC RULES" section gives the following recommendation:
> ipfw add check-state
> ipfw add deny tcp from any to any established
> ipfw add allow tcp from my-net to any setup keep-state
>
> Is the second rule necessary?

Probably not where default policy is deny, but maybe instructive there.
When using stateful TCP rules, you 'should' never see any established packets that aren't part of a dynamic session; those that are will be taken care of by the check-state, assuming they don't arrive beyond timeouts - and counted, both ways, at the setup keep-state rule.

You'll likely see quite a few supposedly 'established' packets from bots scanning the planet in general, usually but not only from somewhere:80. Add log to that deny if curious about such background radiation, and set sysctl net.inet.tcp.log_in_vain=1 if obsessively curious :)

Like Kevin, I use dynamic rules only for some outbound UDP, but here on low-bandwidth systems where performance is scarcely an issue, nor DoS.

For a good example using both stateless and stateful rules you may find the /etc/rc.firewall 'workstation' ruleset useful.

cheers, Ian

From owner-freebsd-net@FreeBSD.ORG Sat Apr 21 16:08:36 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D85D106566B for ; Sat, 21 Apr 2012 16:08:36 +0000 (UTC) (envelope-from cswiger@mac.com) Received: from nk11p00mm-asmtp001.mac.com (nk11p00mm-asmtp001.mac.com [17.158.161.0]) by mx1.freebsd.org (Postfix) with ESMTP id 2610B8FC08 for ; Sat, 21 Apr 2012 16:08:36 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from [17.153.54.18] (unknown [17.153.54.18]) by nk11p00mm-asmtp001.mac.com (Oracle Communications Messaging Server 7u4-23.01(7.0.4.23.0) 64bit (built Aug 10 2011)) with ESMTPSA id <0M2U0018O7I46220@nk11p00mm-asmtp001.mac.com> for freebsd-net@freebsd.org; Sat, 21 Apr 2012 16:08:30 +0000 (GMT) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.6.7580,1.0.260,0.0.0000 definitions=2012-04-21_05:2012-04-21, 2012-04-21, 1970-01-01 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 ipscore=0 suspectscore=0 phishscore=0
bulkscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=6.0.2-1012030000 definitions=main-1204210175 From: Chuck Swiger In-reply-to: Date: Sat, 21 Apr 2012 09:08:28 -0700 Message-id: <4D11B17F-B0D4-4F71-A597-4A309D39C7B4@mac.com> References: To: "Dmitry S. Kasterin" X-Mailer: Apple Mail (2.1084) Cc: freebsd-net Subject: Re: Stateful IPFW - too many connections in FIN_WAIT_2 or LAST_ACK states X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Apr 2012 16:08:36 -0000

On Apr 21, 2012, at 4:41 AM, Dmitry S. Kasterin wrote:
> The "DYNAMIC RULES" section gives the following recommendation:
> ipfw add check-state
> ipfw add deny tcp from any to any established
> ipfw add allow tcp from my-net to any setup keep-state
>
> Is the second rule necessary?

If your security policy is "default deny", then yes.

Regards,
-- 
-Chuck
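
[Archive note, not part of any message in this thread.] As a rough illustration of the rule ordering discussed here, the following Python sketch models first-match evaluation of the three rules from the ipfw(8) DYNAMIC RULES example. It is a toy under stated assumptions, not how ipfw is implemented: the Packet class, filter_packet() and its with_deny_established switch are hypothetical names, state timeouts such as dyn_fin_lifetime are ignored entirely, and the final "deny (default)" verdict assumes a default-deny policy at the end of the ruleset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: tuple          # (ip, port) of the sender
    dst: tuple          # (ip, port) of the receiver
    syn: bool = False
    ack: bool = False
    rst: bool = False


def flow_key(pkt):
    # Direction-independent key, so replies match the state created by the SYN.
    return frozenset((pkt.src, pkt.dst))


def is_established(pkt):
    # ipfw(8) describes "established" as TCP packets with RST or ACK set.
    return pkt.ack or pkt.rst


def is_setup(pkt):
    # ipfw(8) describes "setup" as SYN set without ACK.
    return pkt.syn and not pkt.ack


def filter_packet(pkt, states, local_ips, with_deny_established=True):
    """Return the verdict of the first matching rule (toy model, no timeouts)."""
    # ipfw add check-state
    if flow_key(pkt) in states:
        return "allow (check-state)"
    # ipfw add deny tcp from any to any established
    if with_deny_established and is_established(pkt):
        return "deny (established, no state)"
    # ipfw add allow tcp from my-net to any setup keep-state
    if pkt.src[0] in local_ips and is_setup(pkt):
        states.add(flow_key(pkt))
        return "allow (setup keep-state)"
    # Assumed default-deny policy at the end of the ruleset.
    return "deny (default)"


if __name__ == "__main__":
    states, local_ips = set(), {"10.0.0.2"}   # hypothetical local address
    syn = Packet(("10.0.0.2", 11215), ("213.180.193.14", 80), syn=True)
    stray = Packet(("198.51.100.7", 80), ("10.0.0.2", 40000), ack=True)
    print(filter_packet(syn, states, local_ips))
    print(filter_packet(stray, states, local_ips))
```

In this model a stray ACK with no matching state is dropped either way when the final policy is deny; the explicit second rule only changes which rule matches it (and so where it would be counted or logged). With a default-accept policy the same packet would instead need the explicit deny to be blocked.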