Date: Mon, 21 Feb 2011 17:24:05 +0100 From: Jerome Flesch <jerome.flesch@netasq.com> To: freebsd-current@freebsd.org Subject: Process timing issue Message-ID: <4D6291A5.4050206@netasq.com>
next in thread | raw e-mail | index | archive | help
This is a cryptographically signed message in MIME format. --------------ms090401030701030108000803 Content-Type: multipart/mixed; boundary="------------070706020906040708070509" This is a multi-part message in MIME format. --------------070706020906040708070509 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable Hello, While investigating a timing issue with one of our program, we found out = something weird: We've written a small test program that just calls=20 clock_gettime() a lot of times and checks that the time difference=20 between calls makes sense. In the end, it seems it doesn't always do. Calling twice in a row clock_gettime() takes usually less than 1ms. But=20 with an average load, about 1 time in 200000, more than 10ms are spent=20 between both calls for no apparent reason. According to our tests, when=20 it happens, the time between both calls can go from few milliseconds to=20 many seconds (our best score so far is 10 seconds :). Same goes for=20 gettimeofday(). To reproduce this issue, we use the small test program joined to this=20 mail and openssl speed test commands (usually 'openssl speed -elapsed=20 dsa2048'). It only appears if one of these openssl speed test run on the = same core than our test progam, so we have to start as many openssl=20 instances as there are cores on the test computer. For instance: - FreeBSD 7.3 + Core 2 duo - FreeBSD 7.4/STABLE + Core 2 duo - FreeBSD 8.1 + Core 2 duo - FreeBSD 8.2-RC3 + Core 2 duo (clean setup) - FreeBSD 9.0/CURRENT + Core 2 duo (clean setup, checked out last Friday)= On all these setups, with 2 instances of openssl speed tests, on=20 average, for 2000000*2 calls to clock_gettime(), we got about 10 pauses=20 of between 100 and 300ms - FreeBSD 8.2-RC1 + Xeon W3680 (6 cores): With 12 instances of openssl speed tests, for 10000000*2 calls to=20 clock_gettime(), we got pauses of between 300ms and 10s. - FreeBSD 6.3 + VIA C3: With 1 instance of openssl, we got pauses of between 70ms and 300ms= - FreeBSD 7.3 + VIA C7: With 1 instance of openssl, we got pauses of between 150ms and 1s=20 (Note: this computer may be slightly more loaded then the previous one) For comparison purpose, we also tried on 2 Linux systems: - Linux 2.6.32.15 (Via nano, with a bunch of services running on it): - 1 openssl speed tests + 2000000 iterations: max 10ms - 10 openssl speed tests + 2000000 iterations: max 44ms - Linux 2.6 (Core i7): - 10 openssl speed tests + 2000000: max 1ms We tried setting the test program to the highest priority possible=20 (rtprio(REALTIME, RTP_PRIO_MAX)) and it doesn't seem to change a thing. Does anyone know if there is a reason for this behavior ? Is there=20 something that can be done to improve things ? --------------070706020906040708070509 Content-Type: text/plain; name="timecheck.c" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="timecheck.c" #include <sys/limits.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <time.h> #include <err.h> #include <sysexits.h> int main(int argc, char *argv[]) { struct timespec pre, cur; useconds_t d, dmin, dmax, dsum; int dweird =3D 0; int i, loop; if (argc !=3D 2) { fprintf(stderr, "timecheck <loop>\n"); return EXIT_FAILURE; } loop =3D atoi(argv[1]); dmax =3D dsum =3D 0; dmin =3D UINT_MAX; fprintf(stdout, "looping %d times...\n", loop); for (i =3D 0; i < loop; i++) { if (clock_gettime(CLOCK_MONOTONIC, &pre) < 0) err(EX_OSERR, NULL); if (clock_gettime(CLOCK_MONOTONIC, &cur) < 0) err(EX_OSERR, NULL); d =3D (cur.tv_sec - pre.tv_sec) * 1000000; d =3D d + (cur.tv_nsec / 1000) - (pre.tv_nsec / 1000); dsum +=3D d; if (d < dmin) dmin =3D d; if (d > dmax) dmax =3D d; if (d > 10*1000) /* 10 ms */ dweird++; cur =3D pre; } fprintf(stdout, "dmin =3D %dus\ndmax =3D %dus\ndmean =3D %dus\ndweird = =3D %d\n", dmin, dmax, dsum / loop, dweird); return 0; } --------------070706020906040708070509-- --------------ms090401030701030108000803--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4D6291A5.4050206>