Date: Tue, 13 Dec 2011 00:48:38 +0100
From: "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: Bruce Cran <bruce@cran.org.uk>, "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>, Current FreeBSD <freebsd-current@freebsd.org>, freebsd-stable@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: SCHED_ULE should not be the default
Message-ID: <4EE692D6.5010208@zedat.fu-berlin.de>
In-Reply-To: <20111212170604.GA74044@troutmask.apl.washington.edu>
References: <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com>
 <4EE6060D.5060201@mail.zedat.fu-berlin.de>
 <20111212155159.GB73597@troutmask.apl.washington.edu>
 <4EE6295B.3020308@cran.org.uk>
 <20111212170604.GA74044@troutmask.apl.washington.edu>

On 12/12/11 18:06, Steve Kargl wrote:
> On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
>> On 12/12/2011 15:51, Steve Kargl wrote:
>>> This comes up every 9 months or so, and must be approaching FAQ
>>> status. In a HPC environment, I recommend 4BSD. Depending on the
>>> workload, ULE can cause a severe increase in turn around time when
>>> doing already long computations. If you have an MPI application,
>>> simply launching greater than ncpu+1 jobs can show the problem. PS:
>>> search the list archives for "kargl and ULE".
>>
>> This isn't something that can be fixed by tuning ULE? For example for
>> desktop applications kern.sched.preempt_thresh should be set to 224 from
>> its default. I'm wondering if the installer should ask people what the
>> typical use will be, and tune the scheduler appropriately.
>>

Is the tuning of kern.sched.preempt_thresh, and a proper method of
estimating its correct value for the intended workload, documented in
the manpages, maybe tuning(7)? I find it hard to crawl through the pros
and cons scattered across the mailing lists to work out a correct value
for this seemingly important tunable.

>
> Tuning kern.sched.preempt_thresh did not seem to help for
> my workload. My code is a classic master-slave OpenMPI
> application where the master runs on one node and all
> cpu-bound slaves are sent to a second node. If I
> send ncpu+1 jobs to the 2nd node with ncpu's, then
> ncpu-1 jobs are assigned to the 1st ncpu-1 cpus. The
> last two jobs are assigned to the ncpu'th cpu, and
> these ping-pong on this cpu. AFAICT, it is a cpu
> affinity issue, where ULE is trying to keep each job
> associated with its initially assigned cpu.
>
> While one might suggest that starting ncpu+1 jobs
> is not prudent, my example is just that. It is an
> example showing that ULE has performance issues.
> So, I now can start only ncpu jobs on each node
> in the cluster and send emails to all other users
> to not use those nodes, or use 4BSD and not worry
> about loading issues.
>
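
To put rough numbers on the ncpu+1 case quoted above, here is a minimal
back-of-the-envelope sketch. It is an idealized model, not a measurement
and not how either scheduler is implemented: it assumes identical
CPU-bound jobs, perfect time slicing and no migration cost. The function
names and the example values (8 CPUs, 100 seconds of CPU time per job)
are made up for illustration.

```python
from fractions import Fraction


def pinned_finish_times(ncpu, work):
    """Finish times when every job stays on its initially assigned CPU.

    Models the placement described in the quoted text: ncpu + 1 jobs,
    the first ncpu - 1 jobs each own a CPU, the last two share one CPU.
    k equal jobs time-slicing a single CPU all finish at k * work.
    """
    jobs_per_cpu = [1] * (ncpu - 1) + [2]
    times = []
    for k in jobs_per_cpu:
        times.extend([k * work] * k)
    return times


def shared_finish_time(ncpu, work):
    """Finish time if all ncpu + 1 jobs share all CPUs evenly (idealized).

    Total demand is (ncpu + 1) * work CPU-seconds spread over ncpu CPUs.
    work must be an integer so the result stays an exact fraction.
    """
    return Fraction((ncpu + 1) * work, ncpu)


if __name__ == "__main__":
    ncpu, work = 8, 100  # hypothetical node: 8 CPUs, 100 s of CPU per job
    pinned = pinned_finish_times(ncpu, work)
    shared = shared_finish_time(ncpu, work)
    print("pinned, slowest job:", max(pinned), "s")
    print("evenly shared, every job:", float(shared), "s")
```

Under these assumptions the two jobs stuck on the same CPU take twice as
long as the others, while even sharing would cost every job only a
factor of (ncpu+1)/ncpu. For an MPI run that waits on its slowest rank,
the first figure is the one that sets the turnaround time.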