Date: Mon, 12 Dec 2011 09:06:04 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Cran <bruce@cran.org.uk>
Cc: "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>, Current FreeBSD <freebsd-current@freebsd.org>, freebsd-stable@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: SCHED_ULE should not be the default
Message-ID: <20111212170604.GA74044@troutmask.apl.washington.edu>
In-Reply-To: <4EE6295B.3020308@cran.org.uk>
References: <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com> <4EE6060D.5060201@mail.zedat.fu-berlin.de> <20111212155159.GB73597@troutmask.apl.washington.edu> <4EE6295B.3020308@cran.org.uk>

On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
> On 12/12/2011 15:51, Steve Kargl wrote:
> >This comes up every 9 months or so, and must be approaching FAQ
> >status. In a HPC environment, I recommend 4BSD. Depending on the
> >workload, ULE can cause a severe increase in turn around time when
> >doing already long computations. If you have an MPI application,
> >simply launching greater than ncpu+1 jobs can show the problem. PS:
> >search the list archives for "kargl and ULE".
>
> This isn't something that can be fixed by tuning ULE? For example for
> desktop applications kern.sched.preempt_thresh should be set to 224 from
> its default. I'm wondering if the installer should ask people what the
> typical use will be, and tune the scheduler appropriately.
>

Tuning kern.sched.preempt_thresh did not seem to help for
my workload. My code is a classic master-slave OpenMPI
application where the master runs on one node and all
cpu-bound slaves are sent to a second node. If I send
ncpu+1 jobs to the 2nd node, which has ncpu cpus, then
ncpu-1 jobs are assigned to the 1st ncpu-1 cpus. The last
two jobs are assigned to the ncpu'th cpu, and these ping-pong
on this cpu. AFAICT, it is a cpu affinity issue, where ULE
is trying to keep each job associated with its initially
assigned cpu.

While one might suggest that starting ncpu+1 jobs is not
prudent, my example is just that. It is an example showing
that ULE has performance issues. So, I can now either start
only ncpu jobs on each node in the cluster and send emails to
all other users asking them not to use those nodes, or use
4BSD and not worry about loading issues.

-- 
Steve
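
The arithmetic behind the ncpu+1 example above can be sketched with a
toy model. This is only an illustration under stated assumptions: every
job needs the same amount of cpu time, jobs sharing a cpu split it
evenly, and "balanced" means an idealized scheduler that spreads all
cpus evenly over all jobs. The function names are hypothetical, and
neither function reproduces the actual ULE or 4BSD algorithms.

```python
def pinned_finish_times(ncpu, njobs, work):
    """Finish time of each job when every job stays on its first cpu.

    Assumption: job j is placed on cpu j, and any surplus jobs pile up
    on the last cpu, as in the ncpu+1 case described in the message.
    Jobs sharing a cpu split it evenly, so k equal jobs on one cpu all
    finish at k * work.
    """
    counts = [0] * ncpu
    cpu_of = []
    for j in range(njobs):
        cpu = min(j, ncpu - 1)  # surplus jobs land on the last cpu
        counts[cpu] += 1
        cpu_of.append(cpu)
    return [counts[cpu] * work for cpu in cpu_of]


def balanced_finish_time(ncpu, njobs, work):
    """Finish time when all cpus are shared evenly by all jobs.

    Assumption: an idealized scheduler with free migration, so each job
    runs at rate ncpu/njobs when njobs > ncpu and at full speed otherwise.
    """
    return work * max(njobs, ncpu) / ncpu


if __name__ == "__main__":
    ncpu, work = 8, 3600  # hypothetical node: 8 cpus, one cpu-hour per job
    print("pinned:  ", pinned_finish_times(ncpu, ncpu + 1, work))
    print("balanced:", balanced_finish_time(ncpu, ncpu + 1, work))
```

In this model, with 8 cpus and 9 one-hour jobs, the two jobs stuck on
the last cpu take 7200 s while the other seven finish in 3600 s; with
even sharing all nine would finish in 4050 s. For a master that waits on
all slaves, the turnaround time is set by the slowest job.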