Date:      Wed, 13 Jun 2012 01:01:41 +0400
From:      Andrey Zonov <andrey@zonov.org>
To:        freebsd-hackers@freebsd.org
Subject:   Re: usertime stale at about 371k seconds
Message-ID:  <4FD7AE35.2090003@zonov.org>
In-Reply-To: <4FC71F13.6040008@zonov.org>
References:  <4FC6748B.5030708@zonov.org> <4FC71F13.6040008@zonov.org>

This is a multi-part message in MIME format.
--------------010109090108090405030906
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 5/31/12 11:34 AM, Andrey Zonov wrote:
> On 5/30/12 11:27 PM, Andrey Zonov wrote:
>> Hi,
>>
>> I have a long-running process for which `ps -o usertime -p $pid' always
>> shows the same time, 6190:07.65, while `ps -o cputime -p $pid' for the
>> same process continues to grow and is now 21538:53.61. It looks like an
>> overflow in the resource usage code or something.
>>
>
> I reproduced the problem with the attached program. I ran it with 23
> threads on a machine with 24 CPUs, and after a night I see this:
>
> $ ps -o usertime,time -p 24134 && sleep 60 && ps -o usertime,time -p 24134
> USERTIME TIME
> 6351:24.74 14977:35.19
> USERTIME TIME
> 6351:24.74 15000:34.53
>
> Per-thread user time is counted correctly:
>
> $ ps -H -o usertime,time -p 24134
> USERTIME TIME
> 0:00.00 0:00.00
> 652:35.84 652:38.59
> 652:34.75 652:37.97
> 652:50.46 652:51.97
> 652:38.93 652:43.08
> 652:39.73 652:43.36
> 652:44.09 652:47.36
> 652:56.49 652:57.94
> 652:51.84 652:54.41
> 652:37.48 652:41.57
> 652:36.61 652:40.90
> 652:39.41 652:42.52
> 653:03.72 653:06.72
> 652:49.96 652:53.25
> 652:45.92 652:49.03
> 652:40.33 652:42.05
> 652:46.53 652:49.31
> 652:44.77 652:47.33
> 653:00.54 653:02.24
> 652:33.31 652:36.13
> 652:51.03 652:52.91
> 652:50.73 652:52.71
> 652:41.32 652:44.64
> 652:59.86 653:03.25
>
> (kgdb) p $my->p_rux
> $14 = {rux_runtime = 2171421985692826, rux_uticks = 114886093,
> rux_sticks = 8353, rux_iticks = 0, rux_uu = 381084736784, rux_su =
> 65773652, rux_tu = 904571706136}
> (kgdb) p $my->p_rux
> $15 = {rux_runtime = 2191831516209186, rux_uticks = 115966087,
> rux_sticks = 8444, rux_iticks = 0, rux_uu = 381084736784, rux_su =
> 66458587, rux_tu = 913099969825}
>
> As you can see, rux_uu is stale, but rux_uticks still ticks. I think the
> problem is in calcru1(). This expression
>
> uu = (tu * ut) / tt
>
> overflows.
>
> I applied the following patch:
>

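To make the overflow in the quoted expression concrete, here is a small
user-land sketch. It assumes tt is the sum of the tick counters and uses
the stored rux_tu from the first kgdb dump as a stand-in for tu; the
function names are made up for illustration and are not in the kernel.
The 128-bit reference relies on the GCC/Clang __uint128_t extension.

```c
#include <stdint.h>

/* Naive form: the product tu * ut is silently truncated to 64 bits. */
static uint64_t
scale_naive(uint64_t tu, uint64_t ut, uint64_t tt)
{
	return ((tu * ut) / tt);
}

/* Reference with a 128-bit intermediate (user-land only, compiler extension). */
static uint64_t
scale_wide(uint64_t tu, uint64_t ut, uint64_t tt)
{
	return ((uint64_t)(((__uint128_t)tu * ut) / tt));
}

/* Returns 1 if tu * ut does not fit in 64 bits, 0 otherwise. */
static int
product_overflows(uint64_t tu, uint64_t ut)
{
	return (ut != 0 && tu > UINT64_MAX / ut);
}
```

With tu = 904571706136, ut = 114886093 and tt = 114886093 + 8353, the
product no longer fits in 64 bits.  The wrapped result can never exceed
2^64/tt, which is already below the stored rux_uu of 381084736784, so
the monotonicity check in calcru1() would keep returning the old value.
That seems to match the stale usertime.
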
I did some exploring and found that the expression 
'(uint64_t)a*(uint64_t)b/(uint64_t)c' can be replaced with '(a/c)*b 
+ (a%c)*(b/c) + (a%c)*(b%c)/c', which gives the exact result for 
0<c<2^32.  Since 'c' is the sum of ticks, overflow would occur after 
2^32/128(stathz)/60(sec)/60(min)/24(hours) = 388 days, or after 16 days 
on a machine with 24 cores.  That's better than what we have now.
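
The decomposition can be sketched as an inline function rather than a
macro, so that each argument is evaluated once and precedence is not an
issue.  This is only an illustration of the identity, assuming a = q*c + r
with r < c; then a*b/c = q*b + r*(b/c) + r*(b%c)/c, and with c < 2^32
both r*(b/c) (at most b) and r*(b%c) (below c^2) fit in 64 bits.

```c
#include <stdint.h>

/*
 * Compute a * b / c without forming the full product a * b.
 * Exact whenever 0 < c < 2^32 and the true quotient fits in 64 bits.
 */
static inline uint64_t
mul_div(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t q = a / c;	/* a = q * c + r */
	uint64_t r = a % c;	/* r < c < 2^32 */

	/* r * (b / c) <= b and r * (b % c) < c * c < 2^64: no overflow. */
	return (q * b + r * (b / c) + (r * (b % c)) / c);
}
```

For the values from the kgdb output, mul_div(tu, ut, tt) with
tt = rux_uticks + rux_sticks + rux_iticks agrees with the 128-bit
computation, where the plain (tu * ut) / tt does not.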

In user-land I can use (__uint128_t)a*b/c for this purpose, but the 
kernel isn't built with it.  If you know a good algorithm to calculate 
'(uint64_t)a*(uint64_t)b/(uint64_t)c' for 'c > 2^32', please let me know.
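
For checking in user-land, a rough harness along these lines compares
the decomposition against the __uint128_t result.  Everything here
(function names, the xorshift generator, the seed) is made up for the
test and assumes a compiler that provides __uint128_t, such as GCC or
Clang.

```c
#include <stdint.h>

/* Decomposed a * b / c, same formula as in the patch. */
static uint64_t
mul_div(uint64_t a, uint64_t b, uint64_t c)
{
	return ((a / c) * b + (a % c) * (b / c) + ((a % c) * (b % c)) / c);
}

/* Returns 1 if mul_div() agrees with the 128-bit result (modulo 2^64). */
static int
matches_wide(uint64_t a, uint64_t b, uint64_t c)
{
	return (mul_div(a, b, c) == (uint64_t)(((__uint128_t)a * b) / c));
}

/* Simple xorshift64 generator; state must be non-zero. */
static uint64_t
xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return (x);
}

/* Count disagreements over n pseudo-random triples with 0 < c < 2^32. */
static unsigned long
count_mismatches(unsigned long n)
{
	uint64_t state = 0x9E3779B97F4A7C15ULL;	/* arbitrary seed */
	unsigned long i, bad = 0;

	for (i = 0; i < n; i++) {
		uint64_t a = xorshift64(&state);
		uint64_t b = xorshift64(&state);
		uint64_t c = xorshift64(&state) % 0xFFFFFFFFULL + 1;

		if (!matches_wide(a, b, c))
			bad++;
	}
	return (bad);
}
```

count_mismatches() should stay at zero for any n.  Beyond the limit the
formula breaks: with a = b = 2^33 - 1 and c = 2^33 the term 
(a%c)*(b%c) overflows and matches_wide() returns 0, which is the case I 
don't have a kernel-friendly answer for.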

-- 
Andrey Zonov

--------------010109090108090405030906
Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0";
	name="mul_div.patch"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="mul_div.patch"

SW5kZXg6IHN5cy9rZXJuL2tlcm5fcmVzb3VyY2UuYwo9PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzeXMv
a2Vybi9rZXJuX3Jlc291cmNlLmMJKHJldmlzaW9uIDIzNDYwMCkKKysrIHN5cy9rZXJuL2tl
cm5fcmVzb3VyY2UuYwkod29ya2luZyBjb3B5KQpAQCAtODgwLDYgKzg4MCw4IEBAIHJ1ZmV0
Y2h0ZChzdHJ1Y3QgdGhyZWFkICp0ZCwgc3RydWN0IHJ1c2FnZSAqcnUpCiAJY2FsY3J1MShw
LCAmdGQtPnRkX3J1eCwgJnJ1LT5ydV91dGltZSwgJnJ1LT5ydV9zdGltZSk7CiB9CiAKKyNk
ZWZpbmUJbXVsX2RpdihhLCBiLCBjKQkoYS9jKSpiICsgKGElYykqKGIvYykgKyAoYSVjKSoo
YiVjKS9jCisKIHN0YXRpYyB2b2lkCiBjYWxjcnUxKHN0cnVjdCBwcm9jICpwLCBzdHJ1Y3Qg
cnVzYWdlX2V4dCAqcnV4cCwgc3RydWN0IHRpbWV2YWwgKnVwLAogICAgIHN0cnVjdCB0aW1l
dmFsICpzcCkKQEAgLTkwOSwxMCArOTExLDEwIEBAIGNhbGNydTEoc3RydWN0IHByb2MgKnAs
IHN0cnVjdCBydXNhZ2VfZXh0ICpydXhwLCBzCiAJCSAqIFRoZSBub3JtYWwgY2FzZSwgdGlt
ZSBpbmNyZWFzZWQuCiAJCSAqIEVuZm9yY2UgbW9ub3RvbmljaXR5IG9mIGJ1Y2tldGVkIG51
bWJlcnMuCiAJCSAqLwotCQl1dSA9ICh0dSAqIHV0KSAvIHR0OworCQl1dSA9IG11bF9kaXYo
dHUsIHV0LCB0dCk7CiAJCWlmICh1dSA8IHJ1eHAtPnJ1eF91dSkKIAkJCXV1ID0gcnV4cC0+
cnV4X3V1OwotCQlzdSA9ICh0dSAqIHN0KSAvIHR0OworCQlzdSA9IG11bF9kaXYodHUsIHN0
LCB0dCk7CiAJCWlmIChzdSA8IHJ1eHAtPnJ1eF9zdSkKIAkJCXN1ID0gcnV4cC0+cnV4X3N1
OwogCX0gZWxzZSBpZiAodHUgKyAzID4gcnV4cC0+cnV4X3R1IHx8IDEwMSAqIHR1ID4gMTAw
ICogcnV4cC0+cnV4X3R1KSB7CkBAIC05NDEsOCArOTQzLDggQEAgY2FsY3J1MShzdHJ1Y3Qg
cHJvYyAqcCwgc3RydWN0IHJ1c2FnZV9leHQgKnJ1eHAsIHMKIAkJICAgICJ0byAlanUgdXNl
YyBmb3IgcGlkICVkICglcylcbiIsCiAJCSAgICAodWludG1heF90KXJ1eHAtPnJ1eF90dSwg
KHVpbnRtYXhfdCl0dSwKIAkJICAgIHAtPnBfcGlkLCBwLT5wX2NvbW0pOwotCQl1dSA9ICh0
dSAqIHV0KSAvIHR0OwotCQlzdSA9ICh0dSAqIHN0KSAvIHR0OworCQl1dSA9IG11bF9kaXYo
dHUsIHV0LCB0dCk7CisJCXN1ID0gbXVsX2Rpdih0dSwgc3QsIHR0KTsKIAl9CiAKIAlydXhw
LT5ydXhfdXUgPSB1dTsK
--------------010109090108090405030906--


