From owner-freebsd-current@FreeBSD.ORG Mon Feb 21 16:43:29 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C34E1065672 for ; Mon, 21 Feb 2011 16:43:29 +0000 (UTC) (envelope-from jerome.flesch@netasq.com) Received: from work.netasq.com (mars.netasq.com [91.212.116.3]) by mx1.freebsd.org (Postfix) with ESMTP id 0BD208FC14 for ; Mon, 21 Feb 2011 16:43:28 +0000 (UTC) Received: from pc_jf.netasq.com (unknown [10.2.200.254]) by work.netasq.com (Postfix) with ESMTPSA id 427AC740001; Mon, 21 Feb 2011 17:26:17 +0100 (CET) Message-ID: <4D6291A5.4050206@netasq.com> Date: Mon, 21 Feb 2011 17:24:05 +0100 From: Jerome Flesch User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.1.16) Gecko/20110106 Thunderbird/3.0.11 MIME-Version: 1.0 To: freebsd-current@freebsd.org Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha1; boundary="------------ms090401030701030108000803" X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Process timing issue X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Feb 2011 16:43:29 -0000 This is a cryptographically signed message in MIME format. --------------ms090401030701030108000803 Content-Type: multipart/mixed; boundary="------------070706020906040708070509" This is a multi-part message in MIME format. --------------070706020906040708070509 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable Hello, While investigating a timing issue with one of our program, we found out = something weird: We've written a small test program that just calls=20 clock_gettime() a lot of times and checks that the time difference=20 between calls makes sense. In the end, it seems it doesn't always do. Calling twice in a row clock_gettime() takes usually less than 1ms. But=20 with an average load, about 1 time in 200000, more than 10ms are spent=20 between both calls for no apparent reason. According to our tests, when=20 it happens, the time between both calls can go from few milliseconds to=20 many seconds (our best score so far is 10 seconds :). Same goes for=20 gettimeofday(). To reproduce this issue, we use the small test program joined to this=20 mail and openssl speed test commands (usually 'openssl speed -elapsed=20 dsa2048'). It only appears if one of these openssl speed test run on the = same core than our test progam, so we have to start as many openssl=20 instances as there are cores on the test computer. For instance: - FreeBSD 7.3 + Core 2 duo - FreeBSD 7.4/STABLE + Core 2 duo - FreeBSD 8.1 + Core 2 duo - FreeBSD 8.2-RC3 + Core 2 duo (clean setup) - FreeBSD 9.0/CURRENT + Core 2 duo (clean setup, checked out last Friday)= On all these setups, with 2 instances of openssl speed tests, on=20 average, for 2000000*2 calls to clock_gettime(), we got about 10 pauses=20 of between 100 and 300ms - FreeBSD 8.2-RC1 + Xeon W3680 (6 cores): With 12 instances of openssl speed tests, for 10000000*2 calls to=20 clock_gettime(), we got pauses of between 300ms and 10s. - FreeBSD 6.3 + VIA C3: With 1 instance of openssl, we got pauses of between 70ms and 300ms= - FreeBSD 7.3 + VIA C7: With 1 instance of openssl, we got pauses of between 150ms and 1s=20 (Note: this computer may be slightly more loaded then the previous one) For comparison purpose, we also tried on 2 Linux systems: - Linux 2.6.32.15 (Via nano, with a bunch of services running on it): - 1 openssl speed tests + 2000000 iterations: max 10ms - 10 openssl speed tests + 2000000 iterations: max 44ms - Linux 2.6 (Core i7): - 10 openssl speed tests + 2000000: max 1ms We tried setting the test program to the highest priority possible=20 (rtprio(REALTIME, RTP_PRIO_MAX)) and it doesn't seem to change a thing. Does anyone know if there is a reason for this behavior ? Is there=20 something that can be done to improve things ? --------------070706020906040708070509 Content-Type: text/plain; name="timecheck.c" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="timecheck.c" #include #include #include #include #include #include #include #include int main(int argc, char *argv[]) { struct timespec pre, cur; useconds_t d, dmin, dmax, dsum; int dweird =3D 0; int i, loop; if (argc !=3D 2) { fprintf(stderr, "timecheck \n"); return EXIT_FAILURE; } loop =3D atoi(argv[1]); dmax =3D dsum =3D 0; dmin =3D UINT_MAX; fprintf(stdout, "looping %d times...\n", loop); for (i =3D 0; i < loop; i++) { if (clock_gettime(CLOCK_MONOTONIC, &pre) < 0) err(EX_OSERR, NULL); if (clock_gettime(CLOCK_MONOTONIC, &cur) < 0) err(EX_OSERR, NULL); d =3D (cur.tv_sec - pre.tv_sec) * 1000000; d =3D d + (cur.tv_nsec / 1000) - (pre.tv_nsec / 1000); dsum +=3D d; if (d < dmin) dmin =3D d; if (d > dmax) dmax =3D d; if (d > 10*1000) /* 10 ms */ dweird++; cur =3D pre; } fprintf(stdout, "dmin =3D %dus\ndmax =3D %dus\ndmean =3D %dus\ndweird = =3D %d\n", dmin, dmax, dsum / loop, dweird); return 0; } --------------070706020906040708070509-- --------------ms090401030701030108000803--