Date: Wed, 03 Aug 2005 17:59:10 +1000
From: Dave Seddon <dave-sender-1932b5@seddon.ca>
To: freebsd-net@freebsd.org
Subject: Re: running out of mbufs?
Message-ID: <1123055951.16791.TMDA@seddon.ca>
In-Reply-To: <20050802225518.G53516@odysseus.silby.com>
References: <1123040973.95445.TMDA@seddon.ca> <20050802225518.G53516@odysseus.silby.com>
Greetings,

Thanks to everybody for their quick responses before. (I've also had another crack at my TMDA filter, so hopefully my reply address will work this time.)

Last time I forgot to mention I was pulling the data files from a Compaq RAID system (ciss0: <HP Smart Array 6i>). I had a large number of files with random content, so there was a lot of waiting for disk. I've now set up an MFS with not as many files, which seems to have brought back network stability. I also adjusted the TCP windows (net.inet.tcp.sendspace=65536, net.inet.tcp.recvspace=65536), but once on the MFS I found no change moving to the bigger window sizes (net.inet.tcp.sendspace=1024000, net.inet.tcp.recvspace=1024000).

I've found that the polling settings all seem to be sized for 100 Mbit/s, not gigabit, so I've edited /usr/src/sys/kern/kern_poll.c and increased the #define statements by at least 10x:

Before:
#define MIN_POLL_BURST_MAX 10
#define MAX_POLL_BURST_MAX 1000

After:
#define MIN_POLL_BURST_MAX 1000
#define MAX_POLL_BURST_MAX 10000

Then set /etc/sysctl.conf to:
--------------------
kern.polling.burst=5000
kern.polling.each_burst=1000
kern.polling.burst_max=8000
--------------------

Performance improved a lot, although I was still seeing kern.polling.short_ticks increasing rapidly. /usr/src/sys/kern/kern_poll.c mentions that this means the poll rate is too high, so I dropped HZ back to 10000 from 15000, and the problem has gone away. The server under siege is now stable with 60 concurrent sessions, when before it could not handle this. The processes also seem to be in "accept" rather than "lockf".
--------------------
last pid:  3469;  load averages:  1.79,  1.70,  1.47   up 0+00:28:09  05:59:46
191 processes: 8 running, 183 sleeping
CPU states:  2.0% user,  0.0% nice, 32.6% system, 48.0% interrupt, 17.4% idle
Mem: 34M Active, 7180K Inact, 87M Wired, 29M Buf, 869M Free
Swap: 2023M Total, 2023M Free

  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
  616 www        4    0 3420K  2152K sbwait 1   0:07  0.39%  0.39% httpd
 3305 www        4    0 3432K  2160K accept 1   0:07  0.34%  0.34% httpd
  690 www        4    0 3420K  2152K accept 1   0:06  0.34%  0.34% httpd
  664 www        4    0 3436K  2172K accept 1   0:06  0.29%  0.29% httpd
  633 www        4    0 3436K  2172K accept 1   0:06  0.29%  0.29% httpd
  651 www        4    0 3436K  2172K RUN    1   0:06  0.24%  0.24% httpd
 3390 www        4    0 3432K  2160K accept 0   0:05  0.24%  0.24% httpd
  612 www        4    0 3436K  2172K accept 1   0:07  0.20%  0.20% httpd
  631 www        4    0 3436K  2172K accept 1   0:07  0.20%  0.20% httpd
  621 www        4    0 3436K  2172K accept 1   0:06  0.15%  0.15% httpd
  697 www        4    0 3436K  2172K RUN    1   0:06  0.15%  0.15% httpd
 3380 www        4    0 3432K  2160K sbwait 1   0:06  0.15%  0.15% httpd
 3392 www        4    0 3432K  2160K accept 1   0:05  0.15%  0.15% httpd
 3397 www        4    0 3432K  2160K RUN    1   0:05  0.15%  0.15% httpd
 3376 www        4    0 3432K  2160K accept 1   0:05  0.15%  0.15% httpd
 3383 www        4    0 3432K  2160K accept 1   0:05  0.15%  0.15% httpd
 3315 www        4    0 3432K  2160K accept 0   0:07  0.10%  0.10% httpd
 3309 www        4    0 3432K  2160K sbwait 1   0:07  0.10%  0.10% httpd
--------------------

This is another server under siege with the same configuration, but without the POLL_BURST_MAX tweaks and with HZ=15000.
--------------------
last pid: 24068;  load averages: 13.54,  5.40,  4.63   up 0+02:59:04  17:19:11
233 processes: 4 running, 228 sleeping, 1 zombie
CPU states:  3.8% user,  0.0% nice, 31.8% system, 47.3% interrupt, 17.0% idle
Mem: 46M Active, 8396K Inact, 105M Wired, 48K Cache, 33M Buf, 838M Free
Swap: 2023M Total, 2023M Free

  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
 4508 www        4    0 5040K  3256K sbwait 1   0:37  0.54%  0.54% httpd
 4497 www        4    0 5040K  3256K sbwait 1   0:34  0.34%  0.34% httpd
 4539 www        4    0 5040K  3256K sbwait 1   0:36  0.29%  0.29% httpd
 4521 www       20    0 5040K  3256K lockf  1   0:34  0.29%  0.29% httpd
  626 www        4    0 5040K  3252K sbwait 1   0:36  0.24%  0.24% httpd
 4896 www       20    0 5040K  3256K lockf  1   0:35  0.24%  0.24% httpd
 4522 www        4    0 5040K  3256K sbwait 0   0:34  0.24%  0.24% httpd
  629 www       20    0 5040K  3252K lockf  1   0:35  0.20%  0.20% httpd
  601 www        4    0 5040K  3252K sbwait 1   0:33  0.20%  0.20% httpd
  600 www       20    0 5040K  3252K lockf  1   0:35  0.15%  0.15% httpd
  674 www       20    0 5040K  3252K lockf  1   0:34  0.15%  0.15% httpd
 4787 www        4    0 5040K  3256K sbwait 1   0:34  0.15%  0.15% httpd
  669 www       20    0 5040K  3252K lockf  1   0:34  0.15%  0.15% httpd
 4509 www       20    0 5040K  3256K lockf  1   0:32  0.15%  0.15% httpd
 4486 www       20    0 5040K  3256K lockf  1   0:36  0.10%  0.10% httpd
 4906 www       20    0 5040K  3256K lockf  1   0:36  0.10%  0.10% httpd
 4542 www       20    0 5040K  3256K lockf  1   0:36  0.10%  0.10% httpd
  607 www        4    0 5040K  3252K sbwait 1   0:35  0.10%  0.10% httpd
 4510 www        4    0 5040K  3272K sbwait 1   0:35  0.10%  0.10% httpd
--------------------

On both systems, kern.polling.lost_polls is still increasing rapidly. I'm not sure what to do about this. ??

--------------------
kern.polling.lost_polls: 9605569
--------------------

kern.polling.suspect is increasing similarly. I'm not sure what to do about that either. ??

--------------------
kern.polling.suspect: 608527
--------------------

Also, thanks for the info on the VLAN searching. I think the adjustment you suggested sounds good, but it's a bit out of my league.
It seems there are plenty of things to tweak in the kernel still.

BTW, I'd be interested to know people's thoughts on multiple IP stacks on FreeBSD. It would be really cool to be able to give a jail its own IP stack bound to a VLAN interface. It could then be like a VRF on Cisco.

Regards,
Dave Seddon