Date: Thu, 22 Dec 2011 01:07:58 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: Attilio Rao <attilio@freebsd.org>, Andrey Chernov <ache@nagual.pp.ru>, George Mitchell <george+freebsd@m5p.com>, Doug Barton <dougb@freebsd.org>, freebsd-stable@freebsd.org
Subject: Re: SCHED_ULE should not be the default
Message-ID: <CAJ-VmonkjXV-w52Ofbi7zrOYpCdrbjojkV-2kHBATe0JbTWikQ@mail.gmail.com>
In-Reply-To: <20111222005250.GA23115@troutmask.apl.washington.edu>
References: <4EE1EAFE.3070408@m5p.com> <CAJ-FndBSOS3hKYqmPnVkoMhPmowBBqy9-%2BeJJEMTdoVjdMTEdw@mail.gmail.com> <20111215215554.GA87606@troutmask.apl.washington.edu> <CAJ-FndD0vFWUnRPxz6CTR5JBaEaY3gh9y7-Dy6Gds69_aRgfpg@mail.gmail.com> <20111222005250.GA23115@troutmask.apl.washington.edu>

Are you able to go through the emails here and grab out Attilio's
example for generating KTR scheduler traces?

Adrian

On 21 December 2011 16:52, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote:
>> 2011/12/15 Steve Kargl <sgk@troutmask.apl.washington.edu>:
>> > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
>> >>
>> >> I basically went through all the e-mail you just sent and identified 4
>> >> real reports which we could work on, and summarized them in the attached
>> >> Excel file.
>> >> I'd like George, Steve, Doug, Andrey and Mike to review the
>> >> data there and add more, if they want, or make more important
>> >> clarifications, in particular about the Xorg presence (or rather not)
>> >> in their workload.
>> >
>> > Your summary of my observations appears correct.
>> >
>> > I have grabbed an up-to-date /usr/src, built and
>> > installed world, and built and installed a new
>> > kernel on one of the nodes in my cluster.  It
>> > has
>> >
>>
>> It seems a perfect environment, just please make sure you made a
>> debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically).
>>
>> The first thing is, can you try reproducing your case? As far as I got
>> it, for you it was enough to run N + small_amount of CPU-bound threads
>> to show performance penalty, so I'd ask you to start with using dnetc
>> or just your preferred cpu-bound workload and verify you can reproduce
>> the issue.
>> While it happens, please monitor the threads bouncing and CPU utilization
>> via 'top' (you don't need to be 100% precise, just to get an idea, and
>> keep an eye on things like excessive thread migration, obsessive thread
>> binding, low throughput on CPU).
>> One note: if your workloads need to do I/O please use a tmpfs or
>> memory storage to do so, in order to reduce I/O effects as much as possible.
>> Also, verify this doesn't happen with the 4BSD scheduler, just in case.
>>
>> Finally, if the problem is still in place, please recompile your
>> kernel by adding:
>> options KTR
>> options KTR_ENTRIES=262144
>> options KTR_COMPILE=(KTR_SCHED)
>> options KTR_MASK=(KTR_SCHED)
>>
>> And reproduce the issue.
>> When you are in the middle of the scheduling issue go with:
>> # ktrdump -ctf > ktr-ule-problem-YOURNAME.out
>>
>> and send it to the mailing list along with your dmesg and the
>> information on the CPU utilization you gathered by top(1).
>>
>> That should cover it all, but if you have further questions, please
>> just go ahead.
>
> Attilio,
>
> I have placed several files at
>
> http://troutmask.apl.washington.edu/~kargl/freebsd
>
> dmesg.txt      --> dmesg for ULE kernel
> summary        --> A summary that includes top(1) output of all runs.
> sysctl.ule.txt --> sysctl -a for the ULE kernel
> ktr-ule-problem-kargl.out.gz
>
> I performed a series of tests with both 4BSD and ULE kernels.
> The 4BSD and ULE kernels are identical except of course for the
> scheduler.  Both witness and invariants are disabled, and malloc
> has been compiled without debugging.
>
> Here's what I did.  On the master node in my cluster, I ran an
> OpenMPI code that sends N jobs off to the node with the kernel
> of interest.  There is communication between the master and
> slaves to generate 16 independent chunks of data.  Note, there
> is no disk IO.  So, for example, N=4 will start 4 essentially
> identical numerically intensive jobs.  At the start of a run,
> the master node instructs each slave job to create a chunk of
> data.  After the data is created, the slave sends it back to the
> master and the master sends instructions to create the next chunk
> of data.  This communication continues until the 16 chunks have
> been assigned, computed, and returned to the master.
>
> Here is a rough measurement of the problem with ULE and numerically
> intensive loads.  This command is executed on the master
>
> time mpiexec -machinefile mf3 -np N sasmp sas.in
>
> Since time is executed on the master, only the 'real' time is of
> interest (the summary file includes user and sys times).  This
> command is run 5 times for each N value and up to 10 times for
> some N values with the ULE kernel.  The following table records
> the average 'real' time, and the number in (...) is the mean
> absolute deviation.
>
> #  N         ULE             4BSD
> # -------------------------------------
> #  4    223.27 (0.502)   221.76 (0.551)
> #  5    404.35 (73.82)   270.68 (0.866)
> #  6    627.56 (173.0)   247.23 (1.442)
> #  7    475.53 (84.07)   285.78 (1.421)
> #  8    429.45 (134.9)   223.64 (1.316)
>
> These numbers to me demonstrate that ULE is not a good choice
> for an HPC workload.
>
> If you need more information, feel free to ask.  If you would
> like access to the node, I can probably arrange that.  But,
> we can discuss that off-line.
>
> --
> Steve
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
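
For anyone redoing the measurement in the quoted table, the two statistics it
reports (average 'real' time and mean absolute deviation) could be computed
from the per-run times roughly as below. This is only a sketch in Python, not
the script Steve used: the function names are made up for illustration, and
the only data included are the per-N averages copied from the table. The
ULE/4BSD ratio is an extra derived figure, not something reported in the
thread.

```python
from statistics import fmean

def mean_abs_deviation(samples):
    """Return (mean, mean absolute deviation about the mean)."""
    m = fmean(samples)
    mad = fmean([abs(x - m) for x in samples])
    return m, mad

# Average 'real' times copied from the table in the quoted message:
# N -> (ULE average, 4BSD average)
TABLE = {
    4: (223.27, 221.76),
    5: (404.35, 270.68),
    6: (627.56, 247.23),
    7: (475.53, 285.78),
    8: (429.45, 223.64),
}

def slowdown(table):
    """Ratio of ULE to 4BSD average 'real' time for each N."""
    return {n: ule / bsd for n, (ule, bsd) in table.items()}

if __name__ == "__main__":
    for n, ratio in sorted(slowdown(TABLE).items()):
        print(f"N={n}: ULE/4BSD = {ratio:.2f}")
```

To use it on real data, pass the list of 'real' times for one N (for example
the 5 runs recorded in the summary file) to mean_abs_deviation().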