Date: Wed, 4 Apr 2018 15:19:31 +0200
From: Stefan Esser <se@freebsd.org>
To: "M. Warner Losh" <imp@freebsd.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Is kern.sched.preempt_thresh=0 a sensible default? (was: Re: Extremely low disk throughput under high compute load)
Message-ID: <49fa8de4-e164-0642-4e01-a6188992c32e@freebsd.org>
In-Reply-To: <1d188cb0-ebc8-075f-ed51-57641ede1fd6@freebsd.org>
References: <dc8d0285-1916-6581-2b2d-e8320ec3d894@freebsd.org>
 <CANCZdfoieekesqKa5RmOp=z2vycsVqnVss7ROnO87YTV-qBUzA@mail.gmail.com>
 <1d188cb0-ebc8-075f-ed51-57641ede1fd6@freebsd.org>
On 02.04.18 at 00:18, Stefan Esser wrote:
> On 01.04.18 at 18:33, Warner Losh wrote:
>> On Sun, Apr 1, 2018 at 9:18 AM, Stefan Esser <se@freebsd.org
>> <mailto:se@freebsd.org>> wrote:
>>
>> My i7-2600K based system with 24 GB RAM was in the midst of a buildworld -j8
>> (starting from a clean state), which caused a load average of 12 for more than
>> 1 hour, when I decided to move a directory structure holding some 10 GB to its
>> own ZFS file system. File sizes varied, but were mostly in the range of 500 KB.
>>
>> I had just thrown away /usr/obj, but /usr/src was cached in the ARC, and thus
>> there was nearly no disk activity caused by the buildworld.
>>
>> The copying proceeded at a rate of at most 10 MB/s, but most of the time less
>> than 100 KB/s was transferred. The "cp" process had a PRIO of 20 and thus a
>> much better priority than the compute-bound compiler processes, but it got
>> just 0.2% to 0.5% of 1 CPU core. Apparently, the copy process was scheduled
>> at such a low rate that it only managed to issue a few controller writes per
>> second.
>>
>> The system is healthy and does not show any problems or anomalies under
>> normal use (e.g., file copies are fast without the high compute load).
>>
>> This was with SCHED_ULE on a -CURRENT without WITNESS or malloc debugging.
>>
>> Is this a regression in -CURRENT?
>>
>> Does 'sync' push a lot of I/O to the disk?
>
> Each sync takes 0.7 to 1.5 seconds to complete, but since reading is so
> slow, not much is written.
> Normal gstat output for the 3 drives the RAIDZ1 consists of:
>
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      2      2     84   39.1      0      0    0.0    7.8| ada0
>     0      4      4     92   66.6      0      0    0.0   26.6| ada1
>     0      6      6    259   66.9      0      0    0.0   36.2| ada3
> dT: 1.058s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      1      1     60   70.6      0      0    0.0    6.7| ada0
>     0      3      3     68   71.3      0      0    0.0   20.2| ada1
>     0      6      6    242   65.5      0      0    0.0   28.8| ada3
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      5      5    192   44.8      0      0    0.0   22.4| ada0
>     0      6      6    160   61.9      0      0    0.0   26.5| ada1
>     0      6      6    172   43.7      0      0    0.0   26.2| ada3
>
> This includes the copy process and the reads caused by "make -j 8 world"
> (but I assume that all the source files are already cached in the ARC).

I have identified the cause of the extremely low I/O performance (2 to 6
read operations scheduled per second).

The default value of kern.sched.preempt_thresh=0 does not give any CPU
time to the I/O-bound process unless a (long) time slice expires
(kern.sched.quantum=94488 on my system with HZ=1000) or one of the
CPU-bound processes voluntarily gives up the CPU (or exits).

Any non-zero value of preempt_thresh lets the system again perform I/O
in parallel with the CPU-bound processes.

I'm not sure about the bias relative to the PRI values displayed by top,
but to me a process with a PRI above 72 (in top) should be eligible for
preemption. What value of preempt_thresh should I use to get that
behavior?

And, more importantly: Is preempt_thresh=0 a reasonable default??? It
prevents I/O-bound processes from making reasonable progress whenever
all CPU cores/threads are busy. In my case, performance dropped from
more than 10 MB/s to just a few hundred KB per second, i.e. by a factor
of 30. (The %busy values in my previous mail are misleading: at 10 MB/s
the disk was about 70% busy ...)

Should preempt_thresh be set to some (possibly high, to only preempt
long-running processes) value?

Regards, Stefan
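[Editor's note: the preemption decision discussed above can be sketched as a simplified model. This is a hedged approximation of the logic of sched_shouldpreempt() in sys/kern/sched_ule.c, not the exact kernel code; the function name, the idle-priority constant, and the exact order of checks here are illustrative assumptions. In FreeBSD, a lower numeric priority means a more important thread.]

```python
# Simplified model (assumption) of SCHED_ULE's preemption decision.
# Lower numeric priority = more important thread.

PRI_MIN_IDLE = 224  # illustrative: threads at/above this only run when idle


def should_preempt(incoming_pri: int, running_pri: int,
                   preempt_thresh: int) -> bool:
    """Decide whether an awakened thread (e.g. one whose disk I/O just
    completed) should immediately preempt the currently running thread."""
    if incoming_pri >= running_pri:
        return False   # not more important than what is already running
    if preempt_thresh == 0:
        return False   # preemption disabled: must wait for the quantum
    if incoming_pri <= preempt_thresh:
        return True    # important enough to preempt right away
    if running_pri >= PRI_MIN_IDLE:
        return True    # always displace idle-priority threads
    return False


# With preempt_thresh=0, even a better-priority I/O-bound thread cannot
# preempt a CPU-bound one; with a non-zero threshold it can:
print(should_preempt(72, 100, preempt_thresh=0))    # False
print(should_preempt(72, 100, preempt_thresh=80))   # True
```

This models the behavior reported above: with the threshold at 0, the cp process only runs when a compiler process blocks or exhausts its quantum, so only a handful of reads are issued per second.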