From owner-freebsd-net@FreeBSD.ORG Mon Aug 4 13:36:29 2008 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E2A51065676 for ; Mon, 4 Aug 2008 13:36:29 +0000 (UTC) (envelope-from eksffa@freebsdbrasil.com.br) Received: from capeta.freebsdbrasil.com.br (capeta.freebsdbrasil.com.br [201.48.151.3]) by mx1.freebsd.org (Postfix) with SMTP id 990B88FC0C for ; Mon, 4 Aug 2008 13:36:28 +0000 (UTC) (envelope-from eksffa@freebsdbrasil.com.br) Received: (qmail 59249 invoked from network); 4 Aug 2008 10:36:26 -0300 Received: by simscan 1.1.0 ppid: 59233, pid: 59234, t: 8.2815s scanners: clamav: 0.91.1/m: spam: 3.1.1 X-Spam-Checker-Version: SpamAssassin: -last, FreeBSD Brasil LTDA rulesets: Yes X-Spam-Status: No, hits=-2.0 required=3.7 Received: from unknown (HELO claire.bh.freebsdbrasil.com.br) (201.48.151.226) by capeta.freebsdbrasil.com.br with SMTP; 4 Aug 2008 10:36:17 -0300 Message-ID: <48970597.3030407@freebsdbrasil.com.br> Date: Mon, 04 Aug 2008 10:35:19 -0300 From: Patrick Tracanelli Organization: FreeBSD Brasil LTDA User-Agent: Thunderbird 2.0.0.0 (X11/20070612) MIME-Version: 1.0 To: Mike Makonnen References: <48918DB5.7020201@wubethiopia.com> <4891CD13.20600@freebsdbrasil.com.br> <48922E9D.1020507@elischer.org> <4893328C.2040105@freebsdbrasil.com.br> <4894447C.3000800@wubethiopia.com> <48945A79.50300@wubethiopia.com> In-Reply-To: <48945A79.50300@wubethiopia.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: Application layer classifier for ipfw X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Aug 2008 13:36:29 -0000 Mike Makonnen escreveu: > Mike Makonnen wrote: >> Patrick Tracanelli wrote: >>> >>> To let you know of my current (real world) tests: >>> >>> - Wireless Internet Provider 1: >>> - 4Mbit/s of Internet Traffic >>> - Classifying default protocols + soulseek + ssh >>> - Classifying 100Mbit/s of dump over ssh >>> >>> Results in: >>> No latency added, very low CPU usage, no packets dropping. >>> >>> - Wireless ISP 2: >>> - 21 Mbit/s of Internet Traffic >>> - Classifying default protocols + soulseek + ssh >>> >>> Results in: >>> No tcp or udp traffic at all; everything that gets diverted never >>> comes out of the divert socket, and ipfw-classifyd logs >>> >>> Aug 1 12:07:35 ourofino last message repeated 58 times >>> Aug 1 12:17:54 ourofino ipfw-classifyd: Loaded Protocol: bittorrent >>> (rule 50000) >>> Aug 1 12:17:54 ourofino ipfw-classifyd: Loaded Protocol: edonkey >>> (rule 50000) >>> Aug 1 12:17:54 ourofino ipfw-classifyd: Loaded Protocol: fasttrack >>> (rule 50000) >>> Aug 1 12:17:54 ourofino ipfw-classifyd: Loaded Protocol: gnutella >>> (rule 1000) >>> Aug 1 12:17:54 ourofino ipfw-classifyd: Loaded Protocol: soulseek >>> (rule 50000) >>> Aug 1 12:17:54 ourofino ipfw-classifyd: Loaded Protocol: ssh (rule >>> 50000) >>> Aug 1 12:18:28 ourofino ipfw-classifyd: unable to write to divert >>> socket: Operation not permitted >>> Aug 1 12:18:50 ourofino last message repeated 90 times >> >> Hmmm... this part means that the call to sendto(2) to write the packet >> back into network stack failed. This explains why you are not seein g >> any traffic comming back out of the divert socket, but I don't see why >> it would suddenly fail with a permission error. Could this be a kernel >> bug? >>> Aug 1 12:18:51 ourofino ipfw-classifyd: packet dropped: input queue >>> full >>> Aug 1 12:19:11 ourofino last message repeated 94 times >>> >>> Raised queue len a lot (up to 40960), when the application starts it >>> uses up to 25% CPU and a second after that, CPU usage gets lower the >>> 0.1%. >> >> This looks like a deadlock. If it weren't able to process packets fast >> enough the cpu usage should be high even as it's spewing "packet >> dropped" messages. Can you send me some more information like memory >> usage and the firewall script you are using? How much of the 21Mbits/s >> of traffic is P2P? If you reduce the number of protocols you are >> trying to match against does the behavior change? Using netstat -w1 >> -I can you tell me how many packets per second we're >> talking about for 4Mbits/s and 21Mbit/s? Also, the timestamps from the >> log file seem to show that the daemon is running for approx. 34 sec. >> before the first "unable to write to write to divert socket" message. >> Is it passing traffic during this time? Thanks. >> >> I've uploaded a newer version. Can you try that also please. It includes: >> o SIGHUP forces it to re-read its configuration file >> o rc.d script >> o minor optimization (calls pthread_cond_signal with the mutex >> unlocked) >> o code cleanup >> >> Also, for your convenience I have attached a patch against the earlier >> version that removes a debugging printf that should remove spammage to >> your log files (the current version has it removed already). >> > > Ooops, a few minutes after I sent this email I found a couple of bugs > (one major, and one minor). They were in the original tarball as well as > the newer one I uploaded earlier today. I've uploaded a fixed version of > the code. Can you try that instead please. > > Also, to help track down performance issues I've modified the Makefile > to build a profiled version of the application so you can use gprof(1) > to figure out where any problems lie. > > Cheers. > On gcc 3.4.6, -Wno-pointer-sign compiler switch is unknown and therefore compiling fails: cc -O2 -fno-strict-aliasing -pipe -O2 -pipe -pg -g -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wcast-align -Wunused-parameter -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wno-pointer-sign -c classifyd.c cc1: error: unrecognized command line option "-Wno-pointer-sign" *** Error code 1 Stop in /usr/home/setup/ipfw-classifyd. If removed, does compile fine. On amd64 does not compile: cc -O2 -fno-strict-aliasing -pipe -O2 -pipe -pg -g -Wsystem-headers -Werror -Wall -Wno- format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer -arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wcast-align -Wunus ed-parameter -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wno-pointer -sign -fPIC -pipe -c classifyd.c cc1: warnings being treated as errors classifyd.c: In function 'write_pthread': classifyd.c:813: warning: format '%d' expects type 'int', but argument 4 has type 'size_ t' *** Error code 1 The rules Im running: 00001 1379 320702 fwd 192.168.100.11 ip from any to { 201.91.17.200/29 or dst-ip 189.44.216.168/29 } out via fxp0 00002 0 0 fwd 192.168.100.11 ip from any to { 201.91.17.200/29 or dst-ip 189.44.216.168/29 } out via fxp2 00003 39137 7760042 fwd 189.52.140.1 ip from 189.52.140.10 to any out via fxp0 00004 34462 10801535 fwd 189.52.140.3 ip from 189.52.140.11 to any out via fxp0 00005 68780 29374414 fwd 189.52.140.1 ip from 189.52.140.18 to any out via fxp0 00006 12154 2003678 fwd 189.52.140.3 ip from 189.52.140.19 to any out via fxp0 00500 4741 770166 divert 7777 tcp from any to any via fxp0 00501 172 38599 divert 7777 udp from any to any via fxp0 49999 823215 453482966 allow ip from any to any 65535 0 0 allow ip from any to any Memory usage while running classifyd: Mem: 67M Active, 27M Inact, 44M Wired, 24K Cache, 111M Buf, 859M Free Swap: 1024M Total, 1024M Free classifyd's CPU usage: 5198 root 3 20 0 6132K 5420K kserel 1:14 24.52% ipfw-classifyd System load averga for 5 minutes is 0.2 while not running classifyd and 0.8 while running. Logs with the latest (downloaded a few minutes ago) code: Aug 4 10:15:29 ourofino ipfw-classifyd: Loaded Protocol: bittorrent (rule 50000) Aug 4 10:15:29 ourofino ipfw-classifyd: Loaded Protocol: edonkey (rule 50000) Aug 4 10:15:29 ourofino ipfw-classifyd: Loaded Protocol: fasttrack (rule 50000) Aug 4 10:15:29 ourofino ipfw-classifyd: Loaded Protocol: gnutella (rule 50000) Aug 4 10:15:29 ourofino ipfw-classifyd: Loaded Protocol: soulseek (rule 50000) Aug 4 10:15:29 ourofino ipfw-classifyd: Loaded Protocol: ssh (rule 50000) Aug 4 10:15:36 ourofino ipfw-classifyd: packet dropped: input queue full Aug 4 10:16:07 ourofino last message repeated 1594 times Aug 4 10:16:19 ourofino last message repeated 766 times I miss the debug lines! ;D # netstat -w1 -I fxp0 input (fxp0) output packets errs bytes packets errs bytes colls 535 0 286009 515 0 165965 0 416 0 347448 456 0 153585 0 444 0 309008 449 0 185087 0 493 0 190090 474 0 202637 0 547 0 197882 562 0 233751 0 409 0 288318 461 0 249421 0 434 0 173193 464 0 261512 0 437 0 246516 450 0 258908 0 458 0 200348 481 0 157800 0 423 0 250006 427 0 97723 0 367 0 261323 349 0 96405 0 592 0 178843 556 0 91032 0 450 0 248174 432 0 112650 0 390 0 244258 349 0 106532 0 442 0 303159 392 0 125664 0 407 0 237804 350 0 84367 0 460 0 207586 456 0 86633 0 364 0 200758 366 0 99798 0 313 0 241031 325 0 85746 0 463 0 248284 470 0 113385 0 With above protocols, wont matter if I run with the default queue len, with -q 4096 or -q 40960. Its always never enough and I get the "input queue full" logs. However, if I remove add only 1 or 2 protocols (tried bittorrend only and later, edonkey only), things do work fine, for a few seconds, but than: Aug 4 10:22:58 ourofino ipfw-classifyd: Loaded Protocol: bittorrent (rule 50000) Aug 4 10:24:32 ourofino ipfw-classifyd: Loaded Protocol: bittorrent (rule 50000) Aug 4 10:28:02 ourofino ipfw-classifyd: unable to write to divert socket: Invalid argument Aug 4 10:28:33 ourofino last message repeated 20 times Aug 4 10:29:03 ourofino last message repeated 18 times Aug 4 10:29:05 ourofino kernel: pid 5654 (ipfw-classifyd), uid 0: exited on signal 6 (core dumped) Aug 4 10:30:06 ourofino ipfw-classifyd: Loaded Protocol: bittorrent (rule 50000) Aug 4 10:30:08 ourofino ipfw-classifyd: packet dropped: input queue full Aug 4 10:30:09 ourofino last message repeated 186 times Aug 4 10:30:17 ourofino ipfw-classifyd: unable to write to divert socket: Invalid argument Aug 4 10:30:27 ourofino last message repeated 2 times Aug 4 10:32:43 ourofino ipfw-classifyd: Loaded Protocol: edonkey (rule 50000) Aug 4 10:33:04 ourofino ipfw-classifyd: unable to write to divert socket: Invalid argument Aug 4 10:33:22 ourofino last message repeated 11 times Aug 4 10:33:23 ourofino kernel: pid 5901 (ipfw-classifyd), uid 0: exited on signal 6 (core dumped) I get the " unable to write to divert socket: Invalid argument" constantly and when classifyd dies (core dump with signal 6) I get the following debug output: Assertion failed: (datalen >= 0), function classify_pthread, file classifyd.c, line 646. What other useful output I can send you? -- Patrick Tracanelli