Date: Wed, 13 Jun 2012 01:01:41 +0400
From: Andrey Zonov <andrey@zonov.org>
To: freebsd-hackers@freebsd.org
Subject: Re: usertime stale at about 371k seconds
Message-ID: <4FD7AE35.2090003@zonov.org>
In-Reply-To: <4FC71F13.6040008@zonov.org>
References: <4FC6748B.5030708@zonov.org> <4FC71F13.6040008@zonov.org>
On 5/31/12 11:34 AM, Andrey Zonov wrote:
> On 5/30/12 11:27 PM, Andrey Zonov wrote:
>> Hi,
>>
>> I have a long-running process for which `ps -o usertime -p $pid' always
>> shows the same time, 6190:07.65, while `ps -o cputime -p $pid' for the
>> same process continues to grow and is now 21538:53.61. It looks like an
>> overflow in the resource usage code or something similar.
>>
>
> I reproduced the problem with the attached program. I ran it with 23
> threads on a machine with 24 CPUs, and after a night I see this:
>
> $ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134
>   USERTIME        TIME
> 6351:24.74 14977:35.19
>   USERTIME        TIME
> 6351:24.74 15000:34.53
>
> Per-thread user time is counted correctly:
>
> $ ps -H -o usertime,time -p 24134
>  USERTIME      TIME
>   0:00.00   0:00.00
> 652:35.84 652:38.59
> 652:34.75 652:37.97
> 652:50.46 652:51.97
> 652:38.93 652:43.08
> 652:39.73 652:43.36
> 652:44.09 652:47.36
> 652:56.49 652:57.94
> 652:51.84 652:54.41
> 652:37.48 652:41.57
> 652:36.61 652:40.90
> 652:39.41 652:42.52
> 653:03.72 653:06.72
> 652:49.96 652:53.25
> 652:45.92 652:49.03
> 652:40.33 652:42.05
> 652:46.53 652:49.31
> 652:44.77 652:47.33
> 653:00.54 653:02.24
> 652:33.31 652:36.13
> 652:51.03 652:52.91
> 652:50.73 652:52.71
> 652:41.32 652:44.64
> 652:59.86 653:03.25
>
> (kgdb) p $my->p_rux
> $14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093,
>   rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784,
>   rux_su = 65773652, rux_tu = 904571706136}
> (kgdb) p $my->p_rux
> $15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087,
>   rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784,
>   rux_su = 66458587, rux_tu = 913099969825}
>
> As you can see, rux_uu is stale, but rux_uticks keeps ticking. I think
> the problem is in calcru1(). This expression
>
>     uu = (tu * ut) / tt
>
> overflows.
>
> I applied the following patch:
>

I've done some exploring and found that the expression
'(uint64_t)a*(uint64_t)b/(uint64_t)c' can be replaced with
'(a/c)*b + (a%c)*(b/c) + (a%c)*(b%c)/c', which works for 0 < c < 2^32.
Since 'c' is the sum of ticks, overflow then occurs only after
2^32/128(stathz)/60(sec)/60(min)/24(hours) = 388 days! Or after 16 days
on a machine with 24 cores. That's better than what we have now.

In userland I can use (__uint128_t)a*b/c for this purpose, but the kernel
isn't built with it. If you know a good algorithm for calculating
'(uint64_t)a*(uint64_t)b/(uint64_t)c' for c >= 2^32, please let me know.

-- 
Andrey Zonov

[Attachment: mul_div.patch]

Index: sys/kern/kern_resource.c
===================================================================
--- sys/kern/kern_resource.c	(revision 234600)
+++ sys/kern/kern_resource.c	(working copy)
@@ -880,6 +880,8 @@ rufetchtd(struct thread *td, struct rusage *ru)
 	calcru1(p, &td->td_rux, &ru->ru_utime, &ru->ru_stime);
 }
 
+#define	mul_div(a, b, c)	(a/c)*b + (a%c)*(b/c) + (a%c)*(b%c)/c
+
 static void
 calcru1(struct proc *p, struct rusage_ext *ruxp, struct timeval *up,
     struct timeval *sp)
@@ -909,10 +911,10 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
 		 * The normal case, time increased.
 		 * Enforce monotonicity of bucketed numbers.
 		 */
-		uu = (tu * ut) / tt;
+		uu = mul_div(tu, ut, tt);
 		if (uu < ruxp->rux_uu)
 			uu = ruxp->rux_uu;
-		su = (tu * st) / tt;
+		su = mul_div(tu, st, tt);
 		if (su < ruxp->rux_su)
 			su = ruxp->rux_su;
 	} else if (tu + 3 > ruxp->rux_tu || 101 * tu > 100 * ruxp->rux_tu) {
@@ -941,8 +943,8 @@ calcru1(struct proc *p, struct rusage_ext *ruxp, s
 		    "to %ju usec for pid %d (%s)\n",
 		    (uintmax_t)ruxp->rux_tu, (uintmax_t)tu,
 		    p->p_pid, p->p_comm);
-		uu = (tu * ut) / tt;
-		su = (tu * st) / tt;
+		uu = mul_div(tu, ut, tt);
+		su = mul_div(tu, st, tt);
 	}
 
 	ruxp->rux_uu = uu;
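For illustration, here is a minimal userland sketch of the arithmetic in this thread. It is C, and it assumes GCC or Clang on a 64-bit target because it uses __uint128_t as a reference.

The sketch takes the $15 counters from the kgdb dump quoted above and treats rux_tu as the current tu. That is an assumption: calcru1() actually derives tu from rux_runtime. With those values, tu * ut exceeds 2^64, so the wrapped quotient falls below the stored rux_uu. The monotonicity check then keeps returning the old value, which would explain why USERTIME freezes.

As a rough cross-check, suppose nearly all ticks are user ticks and stathz is 128. Then T seconds of process CPU time give tu of about T * 10^6 and ut of about 128 * T. Their product is about 1.28e8 * T^2, which passes 2^64 (about 1.84e19) near T = 380,000 s. That is close to the 6351:24.74 (about 381,000 s) at which USERTIME froze.

The sketch also checks the proposed decomposition against the 128-bit result and reproduces the 388-day and 16-day estimates. All helper names (uu_current, mul_div_u64, mul_div_ref, days_until_tick_overflow) are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Values from the "(kgdb) p $my->p_rux" ($15) dump. Using rux_tu as the
 * current tu is an illustrative assumption; calcru1() derives tu from
 * rux_runtime.
 */
#define	EX_TU		UINT64_C(913099969825)	/* total run time, usec */
#define	EX_UT		UINT64_C(115966087)	/* user ticks */
#define	EX_ST		UINT64_C(8444)		/* system ticks */
#define	EX_TT		(EX_UT + EX_ST)		/* interrupt ticks are 0 */
#define	EX_PREV_UU	UINT64_C(381084736784)	/* the stale rux_uu */

/* True if tu * ut does not fit in 64 bits. */
static bool
product_overflows(uint64_t tu, uint64_t ut)
{
	return (ut != 0 && tu > UINT64_MAX / ut);
}

/*
 * Mimics the current calcru1() path: a wrapping multiply followed by the
 * monotonicity clamp. Once the product wraps, the quotient is smaller than
 * the previous value, so the previous value keeps being returned.
 */
static uint64_t
uu_current(uint64_t tu, uint64_t ut, uint64_t tt, uint64_t prev_uu)
{
	uint64_t uu = (tu * ut) / tt;	/* unsigned multiply wraps mod 2^64 */

	if (uu < prev_uu)
		uu = prev_uu;
	return (uu);
}

/* Userland reference using a 128-bit intermediate (GCC/Clang extension). */
static uint64_t
mul_div_ref(uint64_t a, uint64_t b, uint64_t c)
{
	return ((uint64_t)(((__uint128_t)a * b) / c));
}

/*
 * The decomposition from the message, as a function so each argument is
 * evaluated once:
 *   a*b/c == (a/c)*b + (a%c)*(b/c) + (a%c)*(b%c)/c
 * (a%c)*(b%c) < c*c, so that product cannot overflow while c < 2^32.
 * Every term is at most the final result, so the sum is safe whenever the
 * result itself fits in 64 bits (it does when b <= c, as with tick counts).
 */
static uint64_t
mul_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t q, r;

	assert(c != 0 && c <= UINT32_MAX);
	q = a / c;
	r = a % c;
	return (q * b + r * (b / c) + r * (b % c) / c);
}

/*
 * Days until a 32-bit tick sum wraps, assuming all ncpu CPUs stay busy
 * so ticks accumulate ncpu times faster (hypothetical helper).
 */
static uint64_t
days_until_tick_overflow(uint64_t stathz, uint64_t ncpu)
{
	return ((UINT64_C(1) << 32) / stathz / 86400 / ncpu);
}
```

In the kernel, the patch calls its mul_div macro directly. Writing the macro as a function, or fully parenthesizing its arguments and body, avoids surprises if it is ever called with expressions rather than plain variables.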