Date: Mon, 12 Dec 2011 13:50:11 -0500
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: Bruce Cran <bruce@cran.org.uk>, "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>, freebsd-stable@freebsd.org, freebsd-performance@freebsd.org, Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: SCHED_ULE should not be the default
Message-ID: <201112121350.11784.jhb@freebsd.org>
In-Reply-To: <20111212170604.GA74044@troutmask.apl.washington.edu>
References: <4EE1EAFE.3070408@m5p.com> <4EE6295B.3020308@cran.org.uk> <20111212170604.GA74044@troutmask.apl.washington.edu>

On Monday, December 12, 2011 12:06:04 pm Steve Kargl wrote:
> On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
> > On 12/12/2011 15:51, Steve Kargl wrote:
> > >This comes up every 9 months or so, and must be approaching FAQ
> > >status. In a HPC environment, I recommend 4BSD. Depending on the
> > >workload, ULE can cause a severe increase in turn around time when
> > >doing already long computations. If you have an MPI application,
> > >simply launching greater than ncpu+1 jobs can show the problem. PS:
> > >search the list archives for "kargl and ULE".
> >
> > This isn't something that can be fixed by tuning ULE? For example for
> > desktop applications kern.sched.preempt_thresh should be set to 224 from
> > its default. I'm wondering if the installer should ask people what the
> > typical use will be, and tune the scheduler appropriately.
> >
>
> Tuning kern.sched.preempt_thresh did not seem to help for
> my workload. My code is a classic master-slave OpenMPI
> application where the master runs on one node and all
> cpu-bound slaves are sent to a second node. If I send
> ncpu+1 jobs to the 2nd node with ncpu's, then
> ncpu-1 jobs are assigned to the 1st ncpu-1 cpus. The
> last two jobs are assigned to the ncpu'th cpu, and
> these ping-pong on this cpu. AFAICT, it is a cpu
> affinity issue, where ULE is trying to keep each job
> associated with its initially assigned cpu.
>
> While one might suggest that starting ncpu+1 jobs
> is not prudent, my example is just that. It is an
> example showing that ULE has performance issues.
> So, I now can start only ncpu jobs on each node
> in the cluster and send emails to all other users
> to not use those nodes, or use 4BSD and not worry
> about loading issues.

This is a case where 4BSD's naive algorithm will spread out the load more
evenly because all the threads are on a single, shared queue and each CPU
just grabs the head of the queue when it finishes a timeslice.
ULE always assigns threads to a single CPU (even if they aren't pinned to a
single CPU using cpuset, etc.) and then tries to balance the load across
cores later, but I believe in this case its rebalancer won't really have
anything to do, as no matter what it does with the N+1 job it's going to be
sharing a CPU with another job.

-- 
John Baldwin
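
The difference between the two queueing models described above can be illustrated with a toy simulation. This is only a rough sketch under strong assumptions, not scheduler code from either 4BSD or ULE: every job is purely CPU-bound and needs the same number of timeslices, there are no priorities or sleeps, and the per-CPU model never migrates a job (i.e. the rebalancer is assumed to do nothing, as suggested above). The function names shared_queue and per_cpu_queues are hypothetical.

```python
from collections import deque


def shared_queue(ncpu, njobs, work):
    """Toy 4BSD-like model: one global run queue, every CPU takes the
    head of the queue at each timeslice. Returns finish tick per job."""
    remaining = [work] * njobs
    queue = deque(range(njobs))
    finish = [0] * njobs
    tick = 0
    while queue:
        tick += 1
        # Each CPU pulls one runnable job from the head of the shared queue.
        running = [queue.popleft() for _ in range(min(ncpu, len(queue)))]
        for job in running:
            remaining[job] -= 1
            if remaining[job] == 0:
                finish[job] = tick
            else:
                queue.append(job)  # unfinished jobs go back to the tail
    return finish


def per_cpu_queues(ncpu, njobs, work):
    """Toy ULE-like model: each job is assigned to one CPU's queue up
    front and (by assumption) never migrated afterwards."""
    remaining = [work] * njobs
    queues = [deque() for _ in range(ncpu)]
    for job in range(njobs):
        queues[job % ncpu].append(job)  # simple round-robin placement
    finish = [0] * njobs
    tick = 0
    while any(queues):
        tick += 1
        for q in queues:
            if not q:
                continue  # this CPU is idle for the timeslice
            job = q.popleft()
            remaining[job] -= 1
            if remaining[job] == 0:
                finish[job] = tick
            else:
                q.append(job)
    return finish


if __name__ == "__main__":
    ncpu, work = 4, 100
    print("shared :", shared_queue(ncpu, ncpu + 1, work))
    print("per-cpu:", per_cpu_queues(ncpu, ncpu + 1, work))
```

With 4 CPUs and 5 jobs of 100 timeslices each, the shared-queue model finishes every job at about tick 125, while the per-CPU model finishes three jobs at tick 100 and leaves the two jobs that share a CPU running until about tick 200. That matches the turnaround-time complaint in the quoted message, though a real scheduler has many more moving parts than this model.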