Date: Sat, 25 Dec 2004 12:37:03 +0000 (GMT)
From: Robert Watson <rwatson@freebsd.org>
To: Jeff Behl <jbehl@fastclick.com>
Cc: freebsd-performance@freebsd.org
Subject: RE: %cpu in system - squid performance in FreeBSD 5.3
Message-ID: <Pine.NEB.3.96L.1041225121903.27724E-100000@fledge.watson.org>
In-Reply-To: <7632915A8F000C4FAEFCF272A880344165164F@Ehost067.exch005intermedia.net>
On Thu, 23 Dec 2004, Jeff Behl wrote:

> As a follow up to the below (original message at the very bottom), I
> installed a load balancer in front of the machines which terminates the
> tcp connections from clients and opens up a few, persistent connections
> to each server over which requests are pipelined.  In this scenario
> everything is copasetic:

I'm not very familiar with Squid's architecture, but I would anticipate
that what you're seeing is that the cost of additional connections served
in parallel is pretty high due to the use of processes.  Specifically: if
each TCP connection being served gets its own process, and there are a
lot of TCP connections, you'll be doing a lot of process forking, context
switching, exceeding cache sizes, etc.  With just a couple of
connections, even if they're doing the same "work", the overhead is much
lower.

Depending on how much time you're willing to invest in this, we can
probably do quite a bit to diagnose where the cost is coming from and
look for any specific problems or areas we could optimize.  I might start
by turning on kernel profiling and doing a profile dump under load.  Be
aware that turning on profiling uses up a lot of CPU itself, so it will
reduce the capacity of the system.  There's probably documentation
elsewhere, but the process I use to set up profiling is described here
(and roughly sketched below):

  http://www.watson.org/~robert/freebsd/netperf/profile/

Note that it warns that some results may be incorrect on SMP.  I think it
would be useful to give it a try anyway, just to see if we get something
useful.

The next thing that would be interesting is using mutex profiling to
measure contention on mutexes.  The instructions in MUTEX_PROFILING(9)
are pretty decent for this purpose.  On an SMP system, time spent
contending a mutex in active use will be spent spinning, which means
wasted CPU.  You can cause the kernel to block threads instead using
"options NO_ADAPTIVE_MUTEXES", but measurement in the past has shown that
the overhead of blocking and restarting a thread is generally higher than
just spinning.

It would be useful to see the output of dmesg at boot to see if any
performance options are obviously out of place.  Likewise, the output of
a couple of stats commands while the system is active would be useful --
for example, a couple of snapshots of "systat -vmstat 1", "netstat -mb",
"vmstat -i", "top -S", and "iostat".

As a final question: other than CPU consumption, do you have a reliable
way to measure how efficiently the system is operating -- in particular,
how fast it is able to serve data?  Having some sort of metric for
performance can be quite useful when optimizing, as it can tell us
whether we're making incremental improvements even before performance
improves to the point where the system is no longer saturated.  Typical
forms might be some sort of web benchmark, etc.  If so, it might be
interesting to compare the performance of the following configurations:

- UP kernel (no SMP compiled in)
- SMP kernel but SMP disabled using the appropriate tunable
- SMP kernel with SMP enabled

Finally, I'm not sure if the box has HTT on it, and if so, whether HTT is
enabled, but you might want to try disabling it, as it has proven to be
relatively ineffective in improving performance in the application tests
I've run, while at the same time increasing operating overhead.
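For reference, the kernel profiling workflow looks roughly like the
following.  This is a sketch from memory, so the kernel config name and
paths are just placeholders -- the write-up at the URL above is the
authoritative version:

    # Build and install a profiling-enabled kernel (config -p), e.g. for
    # a config named SMP on amd64:
    cd /usr/src/sys/amd64/conf
    config -p SMP
    cd ../compile/SMP
    make depend && make && make install

    # Reboot onto the new kernel, then, while the box is under load:
    kgmon -r          # reset profiling counters
    kgmon -b          # begin collecting
    sleep 120         # let the workload run for a while
    kgmon -h          # halt collection
    kgmon -p          # dump the data into gmon.out
    gprof /boot/kernel/kernel gmon.out > kernel-profile.txt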
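Likewise, a rough sketch of the mutex profiling steps.  The sysctl names
below are as I remember them from MUTEX_PROFILING(9), so double-check
them against the man page on your system:

    # Kernel config needs:
    #   options MUTEX_PROFILING
    # After rebooting onto that kernel:
    sysctl debug.mutex.prof.reset=1      # clear any old counters
    sysctl debug.mutex.prof.enable=1     # start collecting
    sleep 120                            # run the workload
    sysctl debug.mutex.prof.enable=0     # stop collecting
    sysctl debug.mutex.prof.stats        # per-mutex acquisition/contention stats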
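If it's easier, the non-interactive stats can be grabbed in one go with
something like the following and the output mailed to the list (systat is
interactive, so run it separately and just describe what you see):

    ( dmesg; echo; vmstat -i; echo; netstat -mb; echo; iostat; echo; \
      top -Sb ) > stats-snapshot.txt
    systat -vmstat 1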
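For the SMP-disabled and HTT runs, the knobs I have in mind are below.
The tunable/sysctl names are from memory for 5.x, so treat them as
assumptions and verify them before relying on the results (disabling HTT
in the BIOS also works):

    # /boot/loader.conf -- boot the SMP kernel but leave SMP disabled:
    kern.smp.disabled="1"

    # Idle the logical (HTT) CPUs at runtime, without a reboot:
    sysctl machdep.hlt_logical_cpus=1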
Another variable that might be interesting to look at is net.isr.enable.
To do this, you want to be running 5-STABLE rather than 5.3-RELEASE, as I
merged at least one significant bug fix that affects its operation.

By default, net.isr.enable is 0, meaning that all inbound network traffic
is processed in the netisr thread.  When this variable is set to 1,
inbound network traffic will be, where possible, directly dispatched in
the device driver ithread.  This has a couple of impacts, but the main
ones are that there are substantially fewer context switches being done,
and that parallelism is possible between the netisr and each interface
card.  This is an experimental feature, so be on the lookout for any
resulting nits.
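It's just a sysctl, so (on 5-STABLE, as noted) it should be easy to flip
at runtime for a quick before/after comparison:

    sysctl net.isr.enable=1    # direct dispatch in the driver ithread
    sysctl net.isr.enable=0    # back to queueing through the netisr thread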
Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research

> last pid:  3377;  load averages:  0.12,  0.09,  0.08   up 0+17:24:53  10:02:13
> 31 processes:  1 running, 30 sleeping
> CPU states:  5.1% user,  0.0% nice,  1.8% system,  1.2% interrupt, 92.0% idle
> Mem: 75M Active, 187M Inact, 168M Wired, 40K Cache, 214M Buf, 1482M Free
> Swap: 4069M Total, 4069M Free
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
>   474 squid      96    0 68276K 62480K select 0  53:38 16.80% 16.80% squid
>   311 bind       20    0 10628K  6016K kserel 0  12:28  0.00%  0.00% named
>
> It's actually so good that one machine can now handle all traffic
> (around 180 Mb/s) at < 50% cpu utilization.  Seems like something in the
> network stack is responsible for the high %system cpu util...
>
> jeff
>
>
> -----Original Message-----
> From: owner-freebsd-performance@freebsd.org
> [mailto:owner-freebsd-performance@freebsd.org] On Behalf Of Jeff Behl
> Sent: Tuesday, December 07, 2004 9:17 AM
> To: Sean Chittenden
> Cc: freebsd-performance@freebsd.org
> Subject: Re: %cpu in system - squid performance in FreeBSD 5.3
>
> I upgraded to STABLE but most cpu time is still being spent in system.
>
> This system is doing ~20Mb/s total with all content being grabbed out of
> memory.  I see similar results when running MySQL (a lot of time being
> spent in system).
>
> Any ideas on what updates to be on the lookout for that might help with
> this?  Am I right in guessing that this is an SMP issue and doesn't have
> anything to do with AMD architecture?
>
> thx
>
>
> FreeBSD www2 5.3-STABLE FreeBSD 5.3-STABLE #2: Sun Dec 5 21:06:14 PST
> 2004 root@www2.cdn.sjc:/usr/obj/usr/src/sys/SMP amd64
>
>
> last pid: 15702;  load averages:  0.15,  0.31,  0.31   up 0+19:55:14  09:09:28
> 38 processes:  2 running, 36 sleeping
> CPU states:  5.4% user,  0.0% nice, 12.7% system,  3.4% interrupt, 78.4% idle
> Mem: 163M Active, 284M Inact, 193M Wired, 72K Cache, 214M Buf, 1245M Free
> Swap: 4069M Total, 4069M Free
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
>   486 squid      96    0 79820K 73996K CPU1   1 110:00 15.04% 15.04% squid
>   480 squid      96    0 75804K 70012K select 0 105:56 14.89% 14.89% squid
>
>
> Sean Chittenden wrote:
>
> >> but the % system time can fluctuate up to 60 at times.  My question
> >> is if this is about the type of performance I could expect, or if
> >> people have seen better.
> >
> > I don't know about other people, but I suspect you're running into
> > lock contention.  Try using a post 5.3 snapshot (something from
> > RELENG_5) since alc@ has set debug.mpsafevm=1, which lets many calls
> > to the VM run without GIANT, which I suspect is your problem and why
> > the system usage is all over the place.
> >
> > -sc
>
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to
> "freebsd-performance-unsubscribe@freebsd.org"
>
>
> howdy,
>
> I've got a dual proc AMD64 (2gHz) FreeBSD 5.3R system running two squid
> processes (to take advantage of both CPUs).  Each process is doing
> around 195 req/s, and the total bandwidth is ~40Mb/s (gig nic via bge
> driver).  Squid is being used exclusively as a reverse proxy, with all
> content being served out of memory (very little disk activity).
>
> Top shows:
>
> CPU states: 16.0% user,  0.0% nice, 42.7% system,  7.6% interrupt, 33.6% idle
> Mem: 898M Active, 569M Inact, 179M Wired, 214M Buf, 171M Free
> Swap: 4069M Total, 4069M Free
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 14598 squid     108    0   463M   459M select 0  39.2H 59.96% 59.96% squid
> 14605 squid     105    0   421M   416M CPU0   1  38.4H 49.95% 49.95% squid
>
> but the % system time can fluctuate up to 60 at times.  My question is
> if this is about the type of performance I could expect, or if people
> have seen better.  I was expecting to see much better performance,
> seeing how everything is being served out of memory, but maybe I'm
> asking too much?  400 reqs/s from RAM doesn't seem like much.  Is this a
> FreeBSD issue (anybody else with similar experience)?  The majority of
> the cpu time being spent in system would seem to indicate such.  What is
> all the system load?  How can I tell?
>
> Any help/pointers/remarks appreciated
>
> thanks,
> jeff
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to
> "freebsd-performance-unsubscribe@freebsd.org"