From: Luigi Rizzo
To: arch@freebsd.org
Date: Wed, 29 Feb 2012 20:40:42 +0100
Subject: select/poll/usleep precision on FreeBSD vs Linux vs OSX
Message-ID: <20120229194042.GA10921@onelab2.iet.unipi.it>

I have always been annoyed by the fact that FreeBSD rounds timeouts in select/usleep/poll in very conservative ways, so I decided to check how other systems behave in this respect. Attached is a simple program that you should be able to compile and run on various OSes to see what happens.
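The attachment is not reproduced in this copy of the message. As a rough stand-in, here is a minimal sketch of how such a measurement could be written. It is an assumption about the approach, not the original program, and the helper names (now_us, sleep_select, sleep_poll, sleep_usleep, measure) are made up for illustration.

```c
#define _DEFAULT_SOURCE   /* glibc: expose usleep() under strict -std= modes */
#include <poll.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in microseconds. */
int64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* select() with no descriptors acts as a pure timeout. */
void sleep_select(long us)
{
    struct timeval tv;

    tv.tv_sec = us / 1000000;
    tv.tv_usec = us % 1000000;
    select(0, NULL, NULL, NULL, &tv);
}

/* poll() only takes milliseconds, so the request is truncated here. */
void sleep_poll(long us)
{
    poll(NULL, 0, (int)(us / 1000));
}

void sleep_usleep(long us)
{
    usleep((useconds_t)us);
}

/* Average wall-clock duration, in microseconds, of `rounds` calls to fn(us). */
int64_t measure(void (*fn)(long), long us, int rounds)
{
    int64_t start = now_us();
    int i;

    for (i = 0; i < rounds; i++)
        fn(us);
    return (now_us() - start) / rounds;
}
```

For example, measure(sleep_select, 1000, 100) returns the average duration in microseconds of a select() call asked to wait 1000 usec. Printing that value for each requested timeout and each helper gives one row of a table like the one in this message.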
Here are the results (HZ=1000 on the system under test; FreeBSD has behaved the same way since at least 4.11):

        | Actual timeout (usec)                    |
        | select                 | poll   | usleep |
timeout | FBSD   | Linux | OSX   | FBSD   | FBSD   |
usec    | 9.0    | Vbox  | 10.6  | 9.0    | 9.0    |
--------+--------+-------+-------+--------+--------+
      1 |   2000 |    99 |     6 |      0 |   2000 |
     10 |   2000 |   109 |    15 |      0 |   2000 |
     50 |   2000 |   149 |    66 |      0 |   2000 |
    100 |   2000 |   196 |   133 |      0 |   2000 |
    500 |   2000 |   597 |   617 |      0 |   2000 |
   1000 |   2000 |  1103 |  1136 |   2000 |   2000 |
   1001 |   3000 |  1103 |  1136 |   2000 |   3000 | <---
   1500 |   3000 |  1608 |  1631 |   2000 |   3000 | <---
   2000 |   3000 |  2096 |  2127 |   3000 |   3000 |
   2001 |   4000 |       |       |   3000 |   4000 | <---
   3001 |   5000 |       |       |   4000 |   5000 | <---

Note how the rounding (poll takes its timeout in milliseconds) affects the actual timeouts once you are past a multiple of 1/HZ.

I know that until we have some hi-res interrupt source there is no hope of getting better than 1/HZ granularity. However, we are doing much worse by adding up to 2 extra ticks. This makes apps less responsive than they could be, and gives us no way to "yield until the next tick".

So what I would like to do is add a sysctl (disabled by default) that enables a better approximation of the desired delay.

I see in the kernel that all three syscalls loop around a blocking function (tsleep or seltdwait), and check the "actual" elapsed time by calling getmicrouptime() or getnanouptime() around the sleeping function. So the actual timeout passed to tsleep does not really matter (as long as it is greater than 0).

The only concern is that getmicrouptime()/getnanouptime() are documented as "less precise, but faster to obtain". The question is how precise "less precise" is: do we have some way to get an upper bound for the precision of the timers used in get*time(), so we can use that value in the equation instead of the extra 1/HZ that tvtohz() puts in after computing floor(timeout*HZ)?
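To make the pattern in the FBSD columns of the table concrete, here is a small model of the rounding. It is a sketch inferred from the measured numbers, not a copy of the kernel's tvtohz(): it assumes each sleep starts just after a tick boundary, rounds the request up to a whole tick, and then adds one more tick. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model, NOT the kernel's tvtohz(): round the request up to
 * a whole number of ticks, then add one extra tick.
 */
int64_t model_ticks(int64_t timeout_us, int hz)
{
    int64_t tick_us = 1000000 / hz;     /* length of one tick in usec */

    return (timeout_us + tick_us - 1) / tick_us + 1;
}

/* Sleep time predicted by the model for select/usleep, in usec. */
int64_t model_sleep_us(int64_t timeout_us, int hz)
{
    return model_ticks(timeout_us, hz) * (1000000 / hz);
}

/*
 * poll() takes milliseconds: the request is truncated first, and a
 * zero timeout returns immediately.
 */
int64_t model_poll_us(int64_t timeout_us, int hz)
{
    int64_t ms = timeout_us / 1000;

    if (ms == 0)
        return 0;
    return model_sleep_us(ms * 1000, hz);
}
```

With hz = 1000 this model gives 2000 for a 1 usec request, 3000 for 1001 usec, and 4000 for a 3001 usec poll, which matches the measured FBSD values. The gap between the request and the result is the "up to 2 extra ticks" discussed above.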
For reference, below is the core of usleep and select/poll (from kern_time.c and sys_generic.c):

usleep:
    getnanouptime(now)
    end = now + timeout;
    for (;;) {
        getnanouptime(now);
        delta = end - now;
        if (delta <= 0)
            break;
        tsleep(..., tvtohz(delta))
    }

select/poll:
    itimerfix(timeout)      // force at least 1/HZ
    getmicrouptime(now)
    end = now + timeout;
    for (;;) {
        delta = end - now;
        seltdwait(..., tvtohz(delta))
        getmicrouptime(now);
        if (some_fd_is_ready() || now >= end)
            break;
    }

cheers
luigi