Date: Thu, 24 Feb 2011 15:18:31 +0100 From: Jerome Flesch <jerome.flesch@netasq.com> To: Ryan Stone <rysto32@gmail.com> Cc: John-Mark Gurney <jmg@funkthat.com>, freebsd-current@freebsd.org Subject: Re: Process timing issue Message-ID: <4D6668B7.5070005@netasq.com> In-Reply-To: <AANLkTik4LEgGBgg4vuUzfMib36Egb2bkOxBL3m1_rYAa@mail.gmail.com> References: <4D6291A5.4050206@netasq.com> <8C8FE4A5-F031-466A-9CB8-46D79EEA280D@mac.com> <4D638050.2010906@netasq.com> <20110222195713.GW66284@funkthat.com> <AANLkTik4LEgGBgg4vuUzfMib36Egb2bkOxBL3m1_rYAa@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
This is a cryptographically signed message in MIME format. --------------ms080808040407020805030706 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable Thanks for your explanations. It helped greatly. Using ktrdump and=20 schedgraph.py and after modifying our test program to set and unset=20 automatically debug.ktr.mask, I've been able to get useful information. First, It made me realize that task switching, with default settings and = 2 active processes, only occurs each 100ms. Knowing that, expecting a=20 latency time around 100ms was kind of silly :) Next, it seems most of the latency pikes are due to a process starting=20 or waking up. For instance, it usually happens when the openssl speed=20 test is started (=20 http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_openssl_s= tart.png=20 ) or when pagedaemon wakes up (I forgot to disable the swap and my test=20 program used too much memory to store the result values ...). I'm not=20 sure why, but when we start openssl, it is often allowed to run for >=3D = 300ms, even with our test program set to real time priority. My=20 intuition is that, at first, it's considered as an interactive process,=20 until the scheduler realizes it's not. But then, does anyone know why it = would take more than 300ms for the scheduler to realize that ? Anyway, by setting kern.sched.interact=3D5 (so openssl isn't considered a= s=20 an interactive process), kern.sched.slice=3D3 (to get an high enough=20 scheduling resolution), and our program to real-time priority, we got=20 rid of both problems. I'm just a little bit worried about=20 kern.sched.slice=3D3. Is there any known side effect when reducing slices= =20 size ? Also, another issue remain: We were hoping to keep our program with a=20 normal priority. However when we set our test program to a normal=20 priority (but still an higher priority than openssl), both get 50% of=20 the CPU (I guess this is to be expected), and from time to time we have=20 a "hiccup" in the scheduling:=20 http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_hicups.pn= g=20 =2E Is there any way to avoid them ? In other words, is it possible to=20 make sure that the low priority process never gets more CPU time than=20 the high priority one ? On 23.02.2011 02:08, Ryan Stone wrote: > To debug weird scheduling issues I find it helpful to start by looking > at a schedgraph. schedgraph is a tool that can display a graphical > representation of what the scheduler was doing over a small slice of > time. The one downside is that you have to recompile your kernel to > get the hooks that collect the necessary data, but it sounds like your > problem is pretty easy to reproduce and so that shouldn't be a big > problem. > > To enable the hooks, put the following in your kernel conf: > > options KTR > options KTR_COMPILE=3DKTR_SCHED > options KTR_MASK=3DKTR_SCHED > options KTR_ENTIRES=3D(128*1024) > > Then rebuild and install the new kernel. Next, run your test. The > instant that your test has detected the long delay, set the sysctl > debug.ktr.mask to 0. The scheduler hooks record data into a ring > buffer, so the interesting data can be flushed out within seconds. > Clearing that sysctl will stop any further events from being logged, > which means that the interesting data will stay there until you go and > collect it. > > You can get the raw data by redirecting the output of "ktrdump -ct" > into a file. Then from any machine with a graphical interface(FreeBSD > with X installed, Windows, Mac, whatever) and python installed, run: > $ python schedgraph.py /path/to/ktrdump/output > > You can get schedgraph.py from /usr/src/tools/sched. > > If you want to collect the data again, set the sysctl debug.ktr.mask > back to 0x20000000 to restart gathering scheduler data. > > Ryan > --------------ms080808040407020805030706--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4D6668B7.5070005>