Date: Thu, 24 Feb 2011 13:46:13 -0500
From: John Baldwin <jhb@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-current@freebsd.org, Jerome Flesch <jerome.flesch@netasq.com>
Subject: Re: Process timing issue
Message-ID: <201102241346.13724.jhb@freebsd.org>
In-Reply-To: <4D66968C.9030705@freebsd.org>
References: <4D6291A5.4050206@netasq.com> <4D6668B7.5070005@netasq.com> <4D66968C.9030705@freebsd.org>

On Thursday, February 24, 2011 12:34:04 pm Andriy Gapon wrote:
> on 24/02/2011 16:18 Jerome Flesch said the following:
> > Thanks for your explanations. It helped greatly. Using ktrdump and schedgraph.py
> > and after modifying our test program to set and unset automatically
> > debug.ktr.mask, I've been able to get useful information.
> >
> > First, it made me realize that task switching, with default settings and 2 active
> > processes, only occurs every 100ms. Knowing that, expecting a latency time around
> > 100ms was kind of silly :)
> >
> > Next, it seems most of the latency spikes are due to a process starting or waking
> > up. For instance, it usually happens when the openssl speed test is started (
> > http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_openssl_start.png
> > ) or when pagedaemon wakes up (I forgot to disable the swap and my test program
> > used too much memory to store the result values ...). I'm not sure why, but when
> > we start openssl, it is often allowed to run for >= 300ms, even with our test
> > program set to real time priority. My intuition is that, at first, it's considered
> > as an interactive process, until the scheduler realizes it's not. But then, does
> > anyone know why it would take more than 300ms for the scheduler to realize that?
> >
> > Anyway, by setting kern.sched.interact=5 (so openssl isn't considered as an
> > interactive process), kern.sched.slice=3 (to get a high enough scheduling
> > resolution), and our program to real-time priority, we got rid of both problems.
> > I'm just a little bit worried about kern.sched.slice=3. Is there any known side
> > effect when reducing the slice size?
> >
> > Also, another issue remains: we were hoping to keep our program with a normal
> > priority. However, when we set our test program to a normal priority (but still a
> > higher priority than openssl), both get 50% of the CPU (I guess this is to be
> > expected), and from time to time we have a "hiccup" in the scheduling:
> > http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_hicups.png . Is
> > there any way to avoid them? In other words, is it possible to make sure that the
> > low priority process never gets more CPU time than the high priority one?
>
> The problems that you describe here sound very much like the issues that John
> Baldwin has been trying to solve a short while ago. My recollection is that he
> committed some improvements for real time priority processes. Perhaps he'll have
> some additional insights based on his observations and testing.

Well, the changes I made to 9 simply made rtprio more important than
interactive so that rtprio will always preempt interactive time-sharing
threads. I'm not quite sure that this is exactly the same. Note that by
default ULE does give interactive processes realtime priority, so that is why
openssl would not yield early on during startup. As to why it takes the
scheduler 300ms to decide openssl is a CPU hog, that I'm less sure of. You'd
have to look at the interactive scoring stuff in ULE to debug that.

How are you setting your program to a "normal" priority that is still higher
than openssl? Are you using nice?

Hmm, during your hiccup it looks like openssl got two time slices back to back
rather than a single slice. Also, note that in the hiccup graph, both threads
have the same priority (183), so openssl effectively has the same priority as
timecheck.

-- 
John Baldwin
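
For anyone wanting to reproduce this kind of measurement, the following is a minimal sketch of a wakeup-latency probe. It is not the timecheck program discussed in this thread, whose source is not shown here; the function names, the 1 ms interval, and the sample count are all assumptions chosen for illustration. It sleeps for a short, fixed interval in a loop and records how much later than requested each wakeup arrived.

```python
import time


def measure_wakeup_overshoot(interval_s=0.001, iterations=200):
    """Sleep repeatedly and record how late each wakeup was, in seconds.

    Hypothetical helper: interval and iteration count are illustrative.
    """
    overshoots = []
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(interval_s)
        elapsed = time.monotonic() - start
        # Time beyond the requested interval is timer plus scheduling delay.
        overshoots.append(elapsed - interval_s)
    return overshoots


def summarize(overshoots):
    """Return (median, worst) overshoot in milliseconds."""
    ordered = sorted(overshoots)
    median = ordered[len(ordered) // 2]
    return median * 1000.0, ordered[-1] * 1000.0


if __name__ == "__main__":
    median_ms, worst_ms = summarize(measure_wakeup_overshoot())
    print(f"median overshoot: {median_ms:.3f} ms, worst: {worst_ms:.3f} ms")
```

Running it once on an idle machine and once alongside a CPU-bound job such as openssl speed, under nice(1) or rtprio(1), should give a rough idea of whether the worst-case overshoot tracks the scheduler slice length. An interpreted probe adds its own overhead, so the numbers are only indicative.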
