Date: Thu, 29 Oct 2015 00:00:57 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-bugs@freebsd.org
Subject: Re: [Bug 204049] vmtotal() loading is high when memory utilization is high
Message-ID: <20151028225118.L1832@besplex.bde.org>
In-Reply-To: <bug-204049-8-MlbJYA75Pk@https.bugs.freebsd.org/bugzilla/>
References: <bug-204049-8@https.bugs.freebsd.org/bugzilla/> <bug-204049-8-MlbJYA75Pk@https.bugs.freebsd.org/bugzilla/>

On Tue, 27 Oct 2015 bugzilla-noreply@freebsd.org wrote:

Bugzilla doesn't want replies, so I shouldn't reply.  I didn't reply to it.

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204049
>
> Dmitry Sivachenko <demon@FreeBSD.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |demon@FreeBSD.org
>
> --- Comment #1 from Dmitry Sivachenko <demon@FreeBSD.org> ---
> I noticed that if a program calls clock() frequently (clock() in turn calls
> getrusage()), the system itself responds slow. For example, we run word2vec
> program
> (http://word2vec.googlecode.com/svn/trunk/word2vec.c) in 32 threads (on 32-core
> machine) and during that all other programs (even single-threaded) run an order
> of magnitude slower compared with the time they use without word2vec.
>
> I wonder if the reason in the same.

It might be just lock contention for both.  Only a stupid program would
call these functions often, but vmtotal() is much heavier weight and the
lock contention for it is more obvious.  It holds lots of global locks
throughout loops.  getrusage() only holds locks for the process.

Only a stupid program would call clock() a lot, but clock() is badly
implemented.  Its units were suitable in 1980, but became wrong when
microtime() started working in 1990-1995.  Its units are not even stathz
ticks, but are hard-coded 1/128 second ticks for compatibility with the
1980 interface (except that probably had 1/60 second ticks).  The timing
part of getrusage() takes the (very accurate) process runtime that was
recorded using microtime() in 1990-1995 and is now recorded less
accurately using cpu_ticks(), and splits it up into user+sys+intr; this
only reduces its accuracy slightly.  Then clock() reduces its accuracy
significantly by discarding the intr part and rounding user+sys to a
multiple of 1/128 seconds.  clock() also wastes time by getting full
rusage and discarding everything except the times.
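
A rough sketch of the clock() behaviour described above, for illustration
only.  It is not the libc source.  The names sketch_clock(),
sketch_usec_to_ticks() and SKETCH_TICKS_PER_SEC are made up here, and the
128 is an assumption taken from the 1/128 second units mentioned above.
It fetches the full rusage, keeps only the user and sys times, and
truncates their sum to whole ticks:

```c
#include <sys/time.h>
#include <sys/resource.h>
#include <stdint.h>

/* Hypothetical tick rate: the hard-coded 1/128 second units (assumption). */
#define SKETCH_TICKS_PER_SEC 128

/* Convert microseconds to ticks, discarding any fraction of a tick. */
static int64_t
sketch_usec_to_ticks(int64_t usec)
{
	return (usec * SKETCH_TICKS_PER_SEC / 1000000);
}

/*
 * Illustrative stand-in for clock(): get the whole rusage structure,
 * throw away everything except ru_utime and ru_stime, and reduce
 * user+sys to a multiple of 1/128 seconds.  Returns -1 on failure.
 */
int64_t
sketch_clock(void)
{
	struct rusage ru;
	int64_t usec;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return (-1);
	usec = (int64_t)ru.ru_utime.tv_sec * 1000000 + ru.ru_utime.tv_usec;
	usec += (int64_t)ru.ru_stime.tv_sec * 1000000 + ru.ru_stime.tv_usec;
	return (sketch_usec_to_ticks(usec));
}
```

With these units, any interval shorter than about 7.8 ms of CPU time can
read as zero ticks, which is the loss of accuracy complained about above.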

clock() can be implemented better using clock_gettime() on a suitable
clock id.  This method has only been available for 10-15 years.  The
following clock ids are suitable:
- CLOCK_PROF.  This gives the same result as clock() would (user+sys),
  not dumbed down except to convert it to timespec units, and without
  wasting time for full getrusage().  But the decomposition part is the
  slowest.
- CLOCK_PROCESS_CPUTIME_ID.  This returns the runtime of the current
  process, not dumbed down except to convert it to timespec units.  This
  unfortunately requires considerable proc locking to add up times for
  all threads in the process.
- certain magic clock ids generalize the previous id to an arbitrary
  process.

The following clock ids are related:
- CLOCK_VIRTUAL.  This returns the 'user' part of the user+sys+intr
  decomposition of the runtime.  It has the same slownesses as
  CLOCK_PROF.
- CLOCK_THREAD_CPUTIME_ID.  This is like CLOCK_PROCESS_CPUTIME_ID except
  it only returns the runtime of the current thread.  This doesn't use
  any locking except a critical section.
- certain magic clock ids generalize the previous id to an arbitrary
  thread.

There are many bugs in the implementation of the clock_gettime() family.
Some of the related ones are:
- none of the above unportable clock ids is documented
- CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID are bogusly named.
  The '_ID' in their name is redundant, and is not used for the name of
  any other clock id.  It would be useful for them to operate on a
  general pid or tid, but they don't.
- the undocumented magic clock ids do act on a general pid or tid.  The
  thread case is an implementation detail for pthread_getcpuclockid(3)
  which is documented.

Bruce