From owner-freebsd-net@FreeBSD.ORG Wed Aug 3 07:59:14 2005
From: Dave+Seddon
To: freebsd-net@freebsd.org
Date: Wed, 03 Aug 2005 17:59:10 +1000
Subject: Re: running out of mbufs?
Message-ID: <1123055951.16791.TMDA@seddon.ca>
In-Reply-To: <20050802225518.G53516@odysseus.silby.com>
References: <1123040973.95445.TMDA@seddon.ca> <20050802225518.G53516@odysseus.silby.com>

Greetings,

Thanks to everybody for their quick responses before. (I've also had another crack at my TMDA filter, so hopefully my reply address will work this time.)

Last time I forgot to mention that I was pulling the data files from a Compaq RAID system (ciss0: ). I had a large number of files with random content, so there was a lot of waiting for disk. I've now set up an MFS with fewer files. This seemed to bring back network stability.

I also adjusted the TCP windows (net.inet.tcp.sendspace=65536, net.inet.tcp.recvspace=65536), but once on the MFS I found no change when moving to the bigger window sizes (net.inet.tcp.sendspace=1024000, net.inet.tcp.recvspace=1024000).

I've found that the polling settings all seem to be tuned for 100 Mbit/s rather than gigabit, so I've edited /usr/src/sys/kern/kern_poll.c and increased the #define values by a factor of at least 10:

Before:
#define MIN_POLL_BURST_MAX 10
#define MAX_POLL_BURST_MAX 1000

After:
#define MIN_POLL_BURST_MAX 1000
#define MAX_POLL_BURST_MAX 10000

Then set /etc/sysctl.conf to
--------------------
kern.polling.burst=5000
kern.polling.each_burst=1000
kern.polling.burst_max=8000
--------------------

Performance improved a lot, although I was still seeing "kern.polling.short_ticks" increasing rapidly. /usr/src/sys/kern/kern_poll.c mentions that this means the poll rate is too high, so I dropped HZ back to 10000 from 15000, and the problem has gone away.

The server under siege is now stable with 60 concurrent sessions, which it could not handle before. The processes also seem to be in "accept" rather than "lockf".
--------------------
last pid: 3469;  load averages: 1.79, 1.70, 1.47    up 0+00:28:09  05:59:46
191 processes: 8 running, 183 sleeping
CPU states: 2.0% user, 0.0% nice, 32.6% system, 48.0% interrupt, 17.4% idle
Mem: 34M Active, 7180K Inact, 87M Wired, 29M Buf, 869M Free
Swap: 2023M Total, 2023M Free

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
  616 www        4    0  3420K  2152K sbwait 1   0:07  0.39%  0.39% httpd
 3305 www        4    0  3432K  2160K accept 1   0:07  0.34%  0.34% httpd
  690 www        4    0  3420K  2152K accept 1   0:06  0.34%  0.34% httpd
  664 www        4    0  3436K  2172K accept 1   0:06  0.29%  0.29% httpd
  633 www        4    0  3436K  2172K accept 1   0:06  0.29%  0.29% httpd
  651 www        4    0  3436K  2172K RUN    1   0:06  0.24%  0.24% httpd
 3390 www        4    0  3432K  2160K accept 0   0:05  0.24%  0.24% httpd
  612 www        4    0  3436K  2172K accept 1   0:07  0.20%  0.20% httpd
  631 www        4    0  3436K  2172K accept 1   0:07  0.20%  0.20% httpd
  621 www        4    0  3436K  2172K accept 1   0:06  0.15%  0.15% httpd
  697 www        4    0  3436K  2172K RUN    1   0:06  0.15%  0.15% httpd
 3380 www        4    0  3432K  2160K sbwait 1   0:06  0.15%  0.15% httpd
 3392 www        4    0  3432K  2160K accept 1   0:05  0.15%  0.15% httpd
 3397 www        4    0  3432K  2160K RUN    1   0:05  0.15%  0.15% httpd
 3376 www        4    0  3432K  2160K accept 1   0:05  0.15%  0.15% httpd
 3383 www        4    0  3432K  2160K accept 1   0:05  0.15%  0.15% httpd
 3315 www        4    0  3432K  2160K accept 0   0:07  0.10%  0.10% httpd
 3309 www        4    0  3432K  2160K sbwait 1   0:07  0.10%  0.10% httpd
--------------------

This is another server under siege with the same configuration, but without the POLL_BURST_MAX tweaks and with HZ=15000.
--------------------
last pid: 24068;  load averages: 13.54, 5.40, 4.63    up 0+02:59:04  17:19:11
233 processes: 4 running, 228 sleeping, 1 zombie
CPU states: 3.8% user, 0.0% nice, 31.8% system, 47.3% interrupt, 17.0% idle
Mem: 46M Active, 8396K Inact, 105M Wired, 48K Cache, 33M Buf, 838M Free
Swap: 2023M Total, 2023M Free

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
 4508 www        4    0  5040K  3256K sbwait 1   0:37  0.54%  0.54% httpd
 4497 www        4    0  5040K  3256K sbwait 1   0:34  0.34%  0.34% httpd
 4539 www        4    0  5040K  3256K sbwait 1   0:36  0.29%  0.29% httpd
 4521 www       20    0  5040K  3256K lockf  1   0:34  0.29%  0.29% httpd
  626 www        4    0  5040K  3252K sbwait 1   0:36  0.24%  0.24% httpd
 4896 www       20    0  5040K  3256K lockf  1   0:35  0.24%  0.24% httpd
 4522 www        4    0  5040K  3256K sbwait 0   0:34  0.24%  0.24% httpd
  629 www       20    0  5040K  3252K lockf  1   0:35  0.20%  0.20% httpd
  601 www        4    0  5040K  3252K sbwait 1   0:33  0.20%  0.20% httpd
  600 www       20    0  5040K  3252K lockf  1   0:35  0.15%  0.15% httpd
  674 www       20    0  5040K  3252K lockf  1   0:34  0.15%  0.15% httpd
 4787 www        4    0  5040K  3256K sbwait 1   0:34  0.15%  0.15% httpd
  669 www       20    0  5040K  3252K lockf  1   0:34  0.15%  0.15% httpd
 4509 www       20    0  5040K  3256K lockf  1   0:32  0.15%  0.15% httpd
 4486 www       20    0  5040K  3256K lockf  1   0:36  0.10%  0.10% httpd
 4906 www       20    0  5040K  3256K lockf  1   0:36  0.10%  0.10% httpd
 4542 www       20    0  5040K  3256K lockf  1   0:36  0.10%  0.10% httpd
  607 www        4    0  5040K  3252K sbwait 1   0:35  0.10%  0.10% httpd
 4510 www        4    0  5040K  3272K sbwait 1   0:35  0.10%  0.10% httpd
--------------------

On both systems kern.polling.lost_polls is still increasing rapidly. I'm not sure what to do about this.
--------------------
kern.polling.lost_polls: 9605569
--------------------

kern.polling.suspect is increasing at a similar pace. I'm not sure what to do about this either.
------------------
kern.polling.suspect: 608527
------------------

Also, thanks for the info on the VLAN searching. I think the adjustment you suggested sounds good, but it is a bit out of my league.
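
Coming back to the lost_polls and suspect counters: a single reading does not say how fast they grow, so it may help to take two samples a known number of seconds apart and compare them. Below is a minimal sketch in Python, assuming the input is text in the "name: value" form shown above. The helper names parse_sysctl and rates_per_second, the second sample, and the 10-second interval are all made up for illustration; only the first sample's values come from the readings above.

```python
def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of integer counters.

    Lines without a colon or with a non-numeric value are ignored.
    """
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def rates_per_second(before, after, seconds):
    """Per-second increase of every counter present in both samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        name: (after[name] - before[name]) / seconds
        for name in before
        if name in after
    }


if __name__ == "__main__":
    # First sample: the values quoted in this message.
    first = parse_sysctl(
        "kern.polling.lost_polls: 9605569\n"
        "kern.polling.suspect: 608527\n"
    )
    # Second sample: HYPOTHETICAL values, assumed to be taken 10 seconds later.
    second = parse_sysctl(
        "kern.polling.lost_polls: 9655569\n"
        "kern.polling.suspect: 611527\n"
    )
    for name, rate in sorted(rates_per_second(first, second, 10).items()):
        print(f"{name}: {rate:.1f}/s")
```

Comparing the per-second rates before and after a change to HZ or the burst settings should give a clearer picture than watching the raw totals, which only ever go up.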

It seems there are plenty of things to tweak in the kernel still.

BTW, I'd be interested to know people's thoughts on multiple IP stacks on FreeBSD. It would be really cool to be able to give a jail its own IP stack bound to a VLAN interface. It could then be like a VRF on Cisco.

Regards,
Dave Seddon