From: "Adrian Chadd"
Sender: adrian.chadd@gmail.com
Date: Tue, 1 Jul 2008 17:09:47 +0800
To: Paul
In-Reply-To: <4869F42E.8040904@gtcomm.net>
Cc: FreeBSD Net
Subject: Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

There's an option to control how many packets it'll process each pass
through the isr thread, isn't there?

It'd be nicer if this stuff were able to be dynamically tuned.

Adrian

2008/7/1 Paul:
> [Big list of testing, rebuilding kernel follows]
>
> Dual Opteron 2212, Recompiled kernel with 7-STABLE and removed a lot of junk
> in the config, added
>     options NO_ADAPTIVE_MUTEXES
> not sure if that makes any difference or not, will test without.
> Used ULE scheduler, used preemption, CPUTYPE=opteron in /etc/make.conf
> 7.0-STABLE FreeBSD 7.0-STABLE #4: Tue Jul 1 01:22:18 CDT 2008 amd64
> Max input rate .. 587kpps? Take into consideration that these packets are
> being forwarded out em1 interface which
> causes a great impact on cpu usage. If I set up a firewall rule to block
> the packets it can do over 1mpps on em0 input.
>
>            input          (em0)           output
>    packets  errs      bytes    packets  errs      bytes colls
>     587425 67677   35435456        466     0      25616     0
>     587412 26629   35434766        453     0      24866     0
>     587043 26874   35412442        410     0      22544     0
>     536117 30264   32347300        440     0      24164     0
>     546240 61521   32951060        459     0      25350     0
>     563568 66881   33998676        435     0      23894     0
>     572766 43243   34550840        440     0      24164     0
>     572336 44411   34525836        445     0      24558     0
>     572539 37013   34536222        457     0      25136     0
>     571340 39512   34459008        440     0      24110     0
>     572673 55137   34540576        438     0      24056     0
>     555506 49918   33505764        457     0      25330     0
>     545744 69010   32916908        461     0      25298     0
>     559472 75650   33745636        429     0      23694     0
>     564358 60130   34039104        433     0      23786     0
>
> last pid:  1134;  load averages:  1.04,  0.94,  0.59
> up 0+00:14:13  01:49:59
> 70 processes:  6 running, 46 sleeping, 17 waiting, 1 lock
> CPU:  0.0% user,  0.0% nice, 25.6% system,  0.0% interrupt, 74.4% idle
> Mem: 11M Active, 6596K Inact, 45M Wired, 156K Cache, 9072K Buf, 1917M Free
> Swap: 8192M Total, 8192M Free
>
>   PID USERNAME PRI NICE SIZE  RES STATE C   TIME   WCPU COMMAND
>    12 root     171 ki31   0K  16K RUN   1  12:40 97.56% idle: cpu1
>    36 root     -68    -   0K  16K *em1  2   9:44 85.06% em0 taskq
>    10 root     171 ki31   0K  16K CPU3  3  11:10 82.47% idle: cpu3
>    13 root     171 ki31   0K  16K CPU0  0  12:25 73.88% idle: cpu0
>    11 root     171 ki31   0K  16K RUN   2   6:43 50.10% idle: cpu2
>    37 root     -68    -   0K  16K CPU3  3   1:58 16.46% em1 taskq
>
>
> I noticed.. em0 taskq isn't using 100% cpu like it was on the generic
> kernel.. What's up with that? Why do I still have all 4 CPUs pretty idle and
> em0 taskq isn't near 100%? I'm going to try 4bsd and see
> if that makes it go back to the other way.
>
> em0: Excessive collisions = 0
> em0: Sequence errors = 0
> em0: Defer count = 0
> em0: Missed Packets = 45395545
> em0: Receive No Buffers = 95916690
> em0: Receive Length Errors = 0
> em0: Receive errors = 0
> em0: Crc errors = 0
> em0: Alignment errors = 0
> em0: Collision/Carrier extension errors = 0
> em0: RX overruns = 2740181
> em0: watchdog timeouts = 0
> em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
> em0: XON Rcvd = 0
> em0: XON Xmtd = 0
> em0: XOFF Rcvd = 0
> em0: XOFF Xmtd = 0
> em0: Good Packets Rcvd = 450913688
> em0: Good Packets Xmtd = 304777
> em0: TSO Contexts Xmtd = 94
> em0: TSO Contexts Failed = 0
>
> -----Rebooting with:
> kern.hz=2000
> hw.em.rxd=512
> hw.em.txd=512
>
> Seems maybe a little bit slower but it's hard to tell since I'm generating
> random packets. The pps varies about 50k +/-, probably depending
> on the randomness.. About the same PPS/errors.. here's a vmstat 1
>  procs    memory      page               disks   faults        cpu
>  r b w    avm   fre flt re pi po  fr sr ad4 ad6    in   sy    cs us sy id
>  0 0 1 52276K 1922M 286  0  1  0 277  0   0   0  7686  838 19436  0 15 85
>  0 0 0 52276K 1922M   0  0  0  0   0  0   0   0 13431  127 33430  0 27 73
>  0 0 0 52276K 1922M   0  0  0  0   0  0   0   0 13406  115 33222  0 27 73
>  0 0 0 52276K 1922M   0  0  0  0   0  0   0   0 13430  115 33393  0 26 74
>  0 0 0 52276K 1922M   0  0  0  0   0  0   0   0 13411  115 33322  0 26 74
>  0 0 0 52276K 1922M   0  0  0  0   0  0   0   0 13576  123 33415  0 25 75
>  0 0 0 52276K 1922M   0  0  0  0   0  0   0   0 13842  115 33354  0 26 74
>
> ------Trying kern.hz=250
>  procs    memory      page               disks   faults        cpu
>  r b w    avm   fre flt re pi po  fr sr ad4 ad6    in   sy    cs us sy id
>  0 0 1 52288K 1923M 607  1  2  0 582  0   0   0  4885  789 12073  0  8 92
>  0 0 0 52288K 1923M   0  0  0  0   0  0   0   0 13793  119 33552  0 27 73
>  0 0 0 52288K 1923M   0  0  0  0   0  0   0   0 13959  115 33446  0 26 74
>  0 0 0 52288K 1923M   0  0  0  0   0  0   0   0 13861  115 33707  0 30 70
>  0 0 0 52288K 1923M   0  0  0  0   0  0   0   0 13784  115 33602  0 26 74
>  0 0 0 52288K 1923M   0  0  0  0   0  0   0   0 13886  123 33843  0 26 74
>  0 0 0 52288K 1923M   0  0  0  0   0  0   0   0 13913  115 33711  0 26 74
>  0 0 0 52288K 1923M   0  0  0  0   0  0   0   0 13920  115 33766  0 27 73
>
> pps still no major difference..
> jumps between 530k-580k
>
> -----Putting HZ back to 1000,
> recompiling kernel with 4BSD SCHED..
> many minutes later.. (can't do make -j with the kernel or it errors)
> Well, I have to say.. 4BSD is less pps, it will not go over 530k, however it
> seems much more consistent and not jumping around as much. It stays between
> 520-530 most of the time and I see some ticks
> at 480's in netstat..
> em0 taskq still not using 100%, max around 75-80
>
> -----Building same as above but with preemption off
>  procs    memory      page               disks   faults        cpu
>  r b w    avm   fre flt re pi po  fr sr ad4 ad6    in   sy    cs us sy id
>  0 0 0 52288K 1922M 563  1  2  0 540  0   0   0  6724  725 22195  0 12 88
>  0 0 0 52288K 1922M   0  0  0  0   0  0   0   0 13200  119 48075  0 27 73
>  0 0 0 52288K 1922M   0  0  0  0   0  0   0   0 13243  123 49137  0 24 76
>  0 0 0 52288K 1922M   0  0  0  0   0  0   0   0 13260  115 48633  0 26 74
>  0 0 0 52288K 1922M   0  0  0  0   0  0   0   0 13247  115 48625  0 25 75
>  0 0 0 52288K 1922M   0  0  0  0   0  0   0   0 13248  115 48687  0 24 76
>
> hmm more context switches..
> pps same, maybe a shade lower..
>
>   PID USERNAME PRI NICE SIZE  RES STATE C   TIME   WCPU COMMAND
>    11 root     171 ki31   0K  16K RUN   2   3:39 97.12% idle: cpu2
>    12 root     171 ki31   0K  16K CPU1  1   3:45 95.70% idle: cpu1
>    36 root     -68    -   0K  16K CPU0  0   2:18 82.67% em0 taskq
>    10 root     171 ki31   0K  16K CPU3  3   3:24 82.57% idle: cpu3
>    13 root     171 ki31   0K  16K RUN   0   2:01 20.07% idle: cpu0
>    37 root     -68    -   0K  16K -     3   0:31 15.58% em1 taskq
>
>
> -------rebuilding with ULE, keeping preemption off
> Hmm.. what the?
> 450-480kpps seems to be max here. That's.. weird..
> I'm going to have to rebuild with Preemption on again just to double check
> this..
>            input          (em0)           output
>    packets  errs      bytes    packets  errs      bytes colls
>     464020 95690   28009004        434     0      23728     0
>     455318 90105   27484456        469     0      25778     0
>     455720 99914   27511970        462     0      25384     0
>     465019 86021   28071946        428     0      23392     0
>     456024 78336   27528862        440     0      24040     0
>     455018 93526   27468908        440     0      24040     0
>     461235 91218   27841604        464     0      25336     0
>     454345 89812   27427262        424     0      23176     0
>     452661 96937   27327392        441     0      24094     0
>     456584 90393   27561138        459     0      25222     0
>     455021 97441   27470158        450     0      24736     0
>
>  procs    memory      page               disks   faults        cpu
>  r b w    avm   fre flt re pi po  fr sr ad4 ad6    in   sy    cs us sy id
>  0 0 1 52276K 1655M 456  1  1  0 441  0   0   0  9775 3598 26256  0 20 80
>  0 0 0 52276K 1655M   0  0  0  0   0  0   0   0 12817  119 33056  0 25 75
>  0 0 0 52276K 1655M   0  0  0  0   0  0   0   0 12700  123 32975  0 27 73
>  0 0 0 52276K 1655M   0  0  0  0   0  0   0   0 12659  115 32897  0 27 73
>
>
> ------OK I'm stumped now.. Rebuilt with preemption and ULE and preemption
> again and it's not doing what it did before..
> How could that be? Now about 500kpps..
>
> That kind of inconsistency almost invalidates all my testing.. why would it
> be so much different after trying a bunch of kernel options and rebooting a
> bunch of times and then going back to the original config doesn't get you
> what it did in the beginning..
>
> I'll have to dig into this further.. never seen anything like it :)
>
> Hopefully the ip_input fix will help free up a few cpu cycles.
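
Since the runs quoted above are being compared by eye, here is a rough
sketch of how the two netstat captures could be summarized side by side.
It is only an illustration: the (packets, errs) pairs are copied from the
two quoted input (em0) tables, the names ULE_PREEMPT, ULE_NO_PREEMPT and
summarize() are made up for this example, and it assumes that the "errs"
column counts packets that arrived at the NIC but were not received, so
that packets + errs approximates the offered load.

```python
from statistics import mean

# (packets, errs) per one-second netstat sample on em0 input.
# First quoted table: ULE scheduler, preemption on.
ULE_PREEMPT = [
    (587425, 67677), (587412, 26629), (587043, 26874), (536117, 30264),
    (546240, 61521), (563568, 66881), (572766, 43243), (572336, 44411),
    (572539, 37013), (571340, 39512), (572673, 55137), (555506, 49918),
    (545744, 69010), (559472, 75650), (564358, 60130),
]

# Second quoted table: ULE scheduler, preemption off.
ULE_NO_PREEMPT = [
    (464020, 95690), (455318, 90105), (455720, 99914), (465019, 86021),
    (456024, 78336), (455018, 93526), (461235, 91218), (454345, 89812),
    (452661, 96937), (456584, 90393), (455021, 97441),
]


def summarize(samples):
    """Return min/mean/max input pps and the error share of offered load.

    Assumption (not confirmed by the thread): errs are input packets that
    were dropped, so offered load is packets + errs.
    """
    packets = [p for p, _ in samples]
    errs = [e for _, e in samples]
    offered = sum(packets) + sum(errs)
    return {
        "min_pps": min(packets),
        "mean_pps": mean(packets),
        "max_pps": max(packets),
        "err_fraction": sum(errs) / offered,
    }


if __name__ == "__main__":
    for name, run in (("ULE+preempt", ULE_PREEMPT),
                      ("ULE, no preempt", ULE_NO_PREEMPT)):
        s = summarize(run)
        print(f"{name}: {s['min_pps']}-{s['max_pps']} pps, "
              f"mean {s['mean_pps']:.0f}, errs {s['err_fraction']:.1%}")
```

With only 11 to 15 one-second samples per run and a random packet
generator, small differences between runs are probably within the noise
already described above (about 50k pps either way), so this is more useful
for spotting large shifts than for ranking similar configurations.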