Date: Thu, 12 Apr 2007 07:51:42 -0400
From: Randall Stewart <rrs@cisco.com>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: freebsd-current@FreeBSD.org, Brad Penoff <penoff@cs.ubc.ca>, Janardhan Iyengar <iyengar@conncoll.edu>
Subject: Re: CPU utilization
Message-ID: <461E1D4E.3090806@cisco.com>
In-Reply-To: <20070412114344.G64803@fledge.watson.org>
References: <461E0078.3050001@cisco.com> <20070412114344.G64803@fledge.watson.org>
Robert:

A few comments .. in line :-D

Robert Watson wrote:
>
> On Thu, 12 Apr 2007, Randall Stewart wrote:
>
>> I have probably an old question that has been asked.. but here goes
>> anyway.
>>
>> I have three machines.
>>
>> 1) stewart - running 7.0 (2.8gig P4 dual core)
>> 2) bsd1 - running 7.0 (2.8gig Xeon hyperthreaded)
>> 3) bsd2 - running 6.2 (2.4gig Xeon hyperthreaded)
>>
>> Now if I run tests that max out the CPU (at least I think they do)..
>> I see <1> or <2> drag down to 1% idle, or even 0% idle.
>>
>> However <3> never drops below 50% idle.. it performs a lot slower
>> too.. which I expect since it is somewhat of an older processor.. but
>> that is what I see when running, say, top -S.
>
> It strikes me that there are two possibilities here, and it could be
> that both are true:
>
> (1) In 7.x, there are scheduling and accounting changes that could
>     result in both better utilization and different measurement.
>
> (2) In 7.x, certain debugging features default to on (WITNESS, INVARIANTS,
>     user space malloc debugging) that add significant (!) overhead.

The first thing I do when I go to 7.0 is edit out the malloc debug :-D..
and I know these machines do NOT have WITNESS and INVARIANTS on.. I have
a separate build that I use for that :-D

> I'd confirm first that (2) isn't the cause of the change -- make sure
> you have a kernel without debugging features turned on, and check the
> man page on malloc.conf to make sure user debugging is turned off for
> malloc. Then let's revisit (1).

I have some interesting results here that I think indicate something..
not sure.... when playing with the hyperthreading switches :-D

Having hyperthreading OFF on the sender side (this is an SCTP test) and
hyperthreading ON on the receiver side seems to give me the best
performance. When the 7.0 machine is the sender and the 6.2 machine the
receiver in this configuration, I get 930Mb (user data) on my gig link..
That's pretty good :-D When I turn hyperthreading on for the sender in
this layout, we drop to 600Mb.

Now reversing it.. the difference is not so dramatic. With hyperthreading
OFF on the sender (the 2.4gig 6.2 machine) and hyperthreading ON on the
2.8gig 7.0 machine, I see about 790Mb; with hyperthreading on the sender
as well, we drop to around 690Mb.

Also, interestingly, if the fast machine is receiving with no
hyperthreading, I see around the same performance as above, 790Mb. That
does not happen when the 6.2/slow machine is the receiver: there I see
780Mb vs the 930Mb.

So.. I am thinking a couple of things about the SCTP code:

a) I must have a lock contention issue on the sending side.
b) The receiver-side code does not have this issue and appears to work
   well with hyperthreading.

I need to also go check out what is going on with H-T in 7.0 as the
sender and turn on mutex profiling.. this may confirm my thoughts on
this :-D

R

--
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 <or> 803-317-4952 (cell)
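Since the thread turns on how %idle is measured, here is a minimal sketch of how the figure top -S reports can be sampled from userland via the kern.cp_time sysctl. The program below is an illustration only, not anything from the thread: it assumes FreeBSD's standard CP_* indices from <sys/resource.h> and uses an arbitrary one-second sampling interval.

    /*
     * Minimal sketch: derive %idle from the kern.cp_time sysctl,
     * which holds the cumulative per-state tick counters that tools
     * like top(1) read.  The two-sample/delta approach and the
     * one-second sleep are illustrative choices.
     */
    #include <sys/types.h>
    #include <sys/resource.h>
    #include <sys/sysctl.h>

    #include <stdio.h>
    #include <unistd.h>

    static int
    get_cp_time(long times[CPUSTATES])
    {
            size_t len = sizeof(long) * CPUSTATES;

            /* kern.cp_time returns CPUSTATES longs: user, nice, sys, intr, idle. */
            return (sysctlbyname("kern.cp_time", times, &len, NULL, 0));
    }

    int
    main(void)
    {
            long t1[CPUSTATES], t2[CPUSTATES];
            long total = 0, idle;
            int i;

            if (get_cp_time(t1) == -1)
                    return (1);
            sleep(1);               /* sampling interval */
            if (get_cp_time(t2) == -1)
                    return (1);

            /* Percentages only make sense over the delta between samples. */
            for (i = 0; i < CPUSTATES; i++)
                    total += t2[i] - t1[i];
            idle = t2[CP_IDLE] - t1[CP_IDLE];

            if (total > 0)
                    printf("%%idle over last second: %.1f\n",
                        100.0 * (double)idle / (double)total);
            return (0);
    }

Sampling twice rather than reading the counters once matters because kern.cp_time accumulates ticks since boot; the instantaneous idle percentage only falls out of the difference between two readings.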