Date:        Tue, 18 Aug 1998 18:38:13 +0000 (GMT)
From:        Terry Lambert <tlambert@primenet.com>
To:          Lars Köller <Lars.Koeller@post.uni-bielefeld.de>
Cc:          tlambert@primenet.com, chuckr@glue.umd.edu, freebsd-smp@FreeBSD.ORG
Subject:     Re: Per processor load?
Message-ID:  <199808181838.LAA20956@usr06.primenet.com>
In-Reply-To: <199808180539.FAA26168@mitch.hrz.uni-bielefeld.de> from "Lars Köller" at Aug 18, 98 07:39:30 am
> > For a symmetric system, if the load is 1.0 or above, both CPUs
> > should be actively working.
> >
> > I suppose that what you are asking for is a "processor not idle"
> > statistic for the case of 1.0 >= load >= 0.0.
> >
> > To get this, you would have to insert counters into the per CPU
> > idle loops, probably reading the Appendix H cycle counter before
> > and after the per CPU HLT instruction, subtracting the entry count
> > from the exit count, and dividing to get an "idle ratio".
> >
> > Gathering this type of statistic could be actively harmful to CPU
> > latency coming out of the HLT condition, and could cost as much as
> > 10% to 20% of the system's ability to do work.

> The basic idea was to treat the CPUs as separate systems, each with
> its own load.  This is well known from HP-UX, Linux, Solaris, ...
> They display the following in, e.g., top:
>
> System: share                                Tue Aug 18 07:30:58 1998
> Load averages: 2.42, 2.29, 2.28
> 280 processes: 273 sleeping, 5 running, 2 zombies
> Cpu states:
> CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
>  0    2.62   0.4%  97.6%   2.0%   0.0%   0.0%   0.0%   0.0%   0.0%
>  1    2.22   0.8%  97.0%   2.2%   0.0%   0.0%   0.0%   0.0%   0.0%
> ---   ----  -----  -----  -----  -----  -----  -----  -----  -----
> avg   2.42   0.6%  97.2%   2.2%   0.0%   0.0%   0.0%   0.0%   0.0%

This basically implies a scheduler artifact: each CPU must have its
own ready-to-run queue for you to get this statistic.  I'm sure that
on Solaris, at least, you have to know how to grovel /dev/kmem for
the information.
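As an illustration of that kind of /dev/kmem groveling, a minimal
libkvm sketch might look like the following.  The symbol name
"_cpu_load", its layout, and the two-CPU assumption are all invented
for the example; a real kernel exports different names:

    /*
     * Sketch: read a hypothetical per-CPU load array out of kernel
     * memory with libkvm.  "_cpu_load" is an invented symbol name.
     */
    #include <fcntl.h>
    #include <kvm.h>
    #include <nlist.h>
    #include <stdio.h>

    int
    main(void)
    {
        static struct nlist nl[] = { { "_cpu_load" }, { NULL } };
        kvm_t *kd;
        long load[2];           /* assume two CPUs for the example */
        int i;

        /* NULLs mean "the running kernel" and "/dev/kmem". */
        kd = kvm_open(NULL, NULL, NULL, O_RDONLY, "kvm_open");
        if (kd == NULL)
            return (1);
        if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0) {
            kvm_close(kd);
            return (1);         /* symbol not in this kernel */
        }
        if (kvm_read(kd, nl[0].n_value, load, sizeof(load)) !=
            sizeof(load)) {
            kvm_close(kd);
            return (1);
        }
        for (i = 0; i < 2; i++)
            printf("cpu%d load: %ld\n", i, load[i]);
        kvm_close(kd);
        return (0);
    }

You would need read access to /dev/kmem (root, or the kmem group) to
run something like this against a live kernel.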
FreeBSD is symmetric.  That is, there is only one ready-to-run queue
for all processors.  Anything else would result in potential job
starvation (inequity, because on one processor the jobs you are
competing with use 75% of their quantum, being compute intensive, and
on the other they use only 10% of their quantum, being I/O
intensive).  To combat this, there would have to be highly complex
scheduler changes to ensure CPU affinity and assure process migration
based on average behaviour.  This is an imperfect homeostasis, at
best.  To my knowledge, only Solaris attempts this, and it only
started doing it halfway well in 2.5.

CPU affinity is a big win for non-cache-busting programs (it ensures
some cache will still be valid the next time the process is run), but
for most servers, it's a NOP.  Depending on your application mix,
expect to perhaps lose (or gain) total compute power over it.

As far as INTR time goes, I notice it's not reported.  This is not
surprising.  In Symmetric (APIC) I/O, or "virtual wire mode", the
interrupt is directed to any available processor, lowest APIC ID
first (see the Intel MP Spec, version 1.4).  It's really not possible
to determine which CPU is actually taking an interrupt unless you
modify the ISR to record the APIC ID and reverse look it up (an
expensive operation) on each interrupt.  I notice the other fields
are not reported either, probably for similar reasons.

> Memory: 180344K (29336K) real, 256220K (66940K) virtual, 5160K free   Page# 1/26
>
> CPU TTY    PID USERNAME PRI NI   SIZE    RES STATE     TIME %WCPU  %CPU COMMAND
>  0   ?   19703 mcfutz   251 25   632K   116K run      6:05 80.27 80.13 schlu
>  1   ?   19721 physik   251 25   632K   112K run      4:52 49.42 49.34 process
>  1   ?    5375 plond    251 25 34756K 15900K run   2173:38 46.66 46.58 l502.exe

Pretty obviously, there aren't two running processes on that one CPU.
A CPU can be in user space in only one process at a time.  8-).

I think what they are doing, since they can tell you the CPU, is
either recording which CPU the process last ran on, *or* reporting
which of the multiple run queues the process is sitting on.

The way to tell would be to dump this information and then count the
number of processes on one CPU or the other.  If there isn't an
imbalance, then they are talking about the ready-to-run queue.  If
there *is* an imbalance, then they *may* still be talking about the
ready-to-run queue, if they are doing cache-busting round-robin
scheduling.

That would be a design error, but it may be what is happening in an
attempt to achieve load balance between the CPUs: if you last ran on
CPU M of N, then you next run on CPU M + 1; when M = N, you next run
on CPU 0.  For programs that don't benefit from L1 cache, this is a
normative win; but again, they would be special-casing the code for
something that isn't a very general purpose use.

In either case, the statistics that *I* would find interesting are
"process migration rate" and "cache miss rate"; the second would be
as hard to get as "idle time".  8-(.
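Incidentally, the cycle-counter idea quoted at the top of this
message might look roughly like the sketch below.  This is only an
illustration, not how FreeBSD's idle loop actually reads: the real
loop lives in locore assembly, and NCPU and the cpu_*_cycles counters
are names invented for the example.

    /*
     * Sketch: instrument a per-CPU idle loop with the TSC (the
     * "Appendix H" cycle counter) around HLT.  Not real FreeBSD
     * code; the counter names are invented.
     */
    #define NCPU 2                              /* assumed CPU count */

    static unsigned long long cpu_idle_cycles[NCPU]; /* cycles halted */
    static unsigned long long cpu_exit_tsc[NCPU];    /* TSC at HLT exit */

    static unsigned long long
    rdtsc64(void)
    {
        unsigned long long tsc;

        /* "=A" loads the 64-bit result from EDX:EAX on i386. */
        __asm __volatile("rdtsc" : "=A" (tsc));
        return (tsc);
    }

    void
    cpu_idle(int cpu)
    {
        unsigned long long entry;

        for (;;) {
            entry = rdtsc64();
            /* Enable interrupts and halt until one arrives. */
            __asm __volatile("sti; hlt");
            cpu_exit_tsc[cpu] = rdtsc64();
            cpu_idle_cycles[cpu] += cpu_exit_tsc[cpu] - entry;
        }
    }

    /*
     * Idle ratio since reset: the TSC counts from power-on, so the
     * last exit timestamp doubles as "total cycles so far".
     */
    double
    idle_ratio(int cpu)
    {
        return ((double)cpu_idle_cycles[cpu] /
            (double)cpu_exit_tsc[cpu]);
    }

Even a sketch like this shows where the latency cost comes from: the
two counter reads and the additions sit directly on the wakeup path
out of HLT.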
> So the idea was, in a first step, to display the load of each CPU
> in a separate graph of xperfmon++.  Perhaps it's a better idea to
> display the other parameters like I/O rate, interrupts, ... but I
> don't see a way to get them per CPU.
>
> Is there any CPU-private parameter in the kernel?

There are CPU private areas, certainly; these are memory regions
mapped to a single CPU.  The per-processor data area and the
per-processor stack are examples; you can find all of them by looking
in the locore code for the SMP case, if you are interested.

I think the thing to do is to better understand the scheduler and the
model, to determine which metrics are truly useful and which
statistics are "too expensive".  Both Steve Passe and John Dyson
would be good resources on this.

Note that the FreeBSD SMP scheduling algorithm is not really set in
stone yet; for example, there is experimental kernel threading code
and CPU affinity code (I'm not sure how complex the latter is;
certainly it's no Solaris) that would make some of what I said weigh
differently, depending on the type of load expected.  Unfortunately,
displaying this information is complicated, in that you have to know
what you are displaying...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.