From owner-freebsd-current  Mon Apr  7 15:01:02 1997
Return-Path: <owner-current>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id PAA16419
          for current-outgoing; Mon, 7 Apr 1997 15:01:02 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.50])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id PAA16411
          for <freebsd-current@freebsd.org>; Mon, 7 Apr 1997 15:00:57 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA02263; Mon, 7 Apr 1997 14:41:59 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199704072141.OAA02263@phaeton.artisoft.com>
Subject: Re: POLL & the Single FreeBSD'r
To: peter@spinner.DIALix.COM (Peter Wemm)
Date: Mon, 7 Apr 1997 14:41:58 -0700 (MST)
Cc: terry@lambert.org, freebsd-current@freebsd.org
In-Reply-To: <199704072137.FAA03511@spinner.DIALix.COM> from "Peter Wemm" at Apr 8, 97 05:37:40 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> > How does implementing select on top of poll hooks impact the ability
> > to specify a 1uS valued timeval struct for the select timeout?  Does
> > it round to the 10ms granularity of poll, or does it work as expected
> > (and as documented in the select() man page)?
> 
> It's got nothing to do with that.  What I'm talking about is that the
> select functions in the device/file ops etc switch tables have a select
> backend that tests for FREAD, FWRITE or 0 (== exception).  If you implement
> poll() over the top of that backend, you can't emulate the priority band
> stuff (urgent data).  If, on the other hand, the backends are updated to
> scan for poll type events (read, write, urgent readable data, hangup, etc),
> you can implement poll() fully (and unambiguously), without hurting or
> penalising the select top end.  Besides, all the timeout stuff is done in 
> the top level code, select() or poll() in sys_generic.  The fact that both 
> call pollscan() or selscan() is irrelevant, because the scan routines are 
> instant and do not sleep or timeout.  The timeout is all up to the individual 
> syscall handler.

OK.  That's cool, then.  It's a good idea.
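
For the record, here is a minimal sketch of that layering as I read it:
a select() front end asking a poll-style backend for events.  This is
not the sys_generic code.  The enum and both function names are
hypothetical; only the POLL* constants come from <poll.h>.  Counting
hangup and error as "readable" is an assumption about what select()
callers expect.

```c
#include <poll.h>

/* Hypothetical names for the three select() fd sets. */
enum sel_kind { SEL_READ, SEL_WRITE, SEL_EXCEPT };

/* Events the select() front end would request from a poll-style backend. */
short sel_to_poll_events(enum sel_kind kind)
{
    switch (kind) {
    case SEL_READ:   return POLLIN;
    case SEL_WRITE:  return POLLOUT;
    case SEL_EXCEPT: return POLLPRI;   /* urgent / priority data */
    }
    return 0;
}

/* Nonzero if the backend's returned events satisfy the given select() set. */
int sel_ready(enum sel_kind kind, short revents)
{
    switch (kind) {
    case SEL_READ:   return (revents & (POLLIN | POLLHUP | POLLERR)) != 0;
    case SEL_WRITE:  return (revents & (POLLOUT | POLLERR)) != 0;
    case SEL_EXCEPT: return (revents & POLLPRI) != 0;
    }
    return 0;
}
```

The point is that the mapping only goes one way: the three select()
sets fit inside the poll events, but the poll events do not fit inside
FREAD, FWRITE, and 0.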


> What you want is a high-resolution timer/sleep/schedule system, which we
> don't have, and nobody has offered to implement yet, so it's pretty 
> unlikely that we'll see it in the near future.  (This doesn't mean that it 
> cannot be done, just that nobody has wanted it badly enough to do it.  
> Messing with timers and a more precise sleep queue that can deal with the 
> next event in microseconds for the timer programming might be enough, 
> especially when combined with the RT schedule options)

Yes; kernel preemption on timer events before process quantum expiration
is probably 90% of the way to real RT support...

I don't necessarily want something with high-resolution timing right
now, but the select() code *will* operate sub-quantum if there's nothing
else in the run queue without "real" high resolution support.  SunOS 4.x
has historically worked that way (down to 4uS on a select/timeout buzz
loop on a SPARCStation 1+, actually... better on faster hardware).
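
A select/timeout buzz loop of that kind can be sketched roughly as
below.  The helper name is hypothetical, and this is only a userland
measurement under the assumption that gettimeofday() is fine-grained
enough to see the difference; the numbers are whatever the hardware and
scheduler give you.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: wait in select() for `usec` microseconds with no
 * descriptors at all, and return how many microseconds really elapsed.
 */
long select_sleep_elapsed_usec(long usec)
{
    struct timeval before, after, timeout;

    /* Rebuild the timeout on every call; some systems modify it. */
    timeout.tv_sec = usec / 1000000L;
    timeout.tv_usec = usec % 1000000L;

    gettimeofday(&before, NULL);
    /* No fd sets: select() acts as a pure timeout. */
    select(0, NULL, NULL, NULL, &timeout);
    gettimeofday(&after, NULL);

    return (long)(after.tv_sec - before.tv_sec) * 1000000L
         + (long)(after.tv_usec - before.tv_usec);
}
```

Calling it repeatedly with usec = 1 and averaging the results gives the
effective timeout granularity on a given, otherwise idle, machine.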

I also don't want to preclude it (or require a rewrite) at some later
time when someone goes to support it.


> Incidentally, the way the man page that you mention is written, it says the
> timeout is "the maximum amount of time to wait".  It seems to me that
> rounding down to the nearest 10ms would make us more compliant with the man
> page, even though we can't control scheduling to guarantee an immediate
> wakeup.  "waiting for the select event" != "waiting for process
> reschedule". So, if we ask for a 1us timeout, we'd be perfectly compliant
> with the man page to return immediately. In fact, we'd be compliant with
> the man page if we returned immediately  no matter how long was asked for
> (0 seconds is not more than the maximum interval to wait for the selection
> to complete) - this goes to show that what is documented in the man pages
> isn't always good or useful.

Well, I'd prefer that the internal granularity be 1uS, and if the
scheduler can't keep up, then it can't keep up (the part of the man
page I was referring to was the tv_usec reference).  You fire when
you can, and you schedule as soon as you can without stealing someone
else's quantum (unless you have RT scheduling, and the process is
marked RT).  For an unloaded system, that's still way better than 10ms,
though how much better is hardware dependent (as it should be).
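
To put numbers on the rounding question, here is a small sketch of
converting a timeout to clock ticks.  The function name is hypothetical,
and it assumes a tick rate (hz) that divides 1,000,000 evenly, e.g.
hz = 100 for the 10ms granularity discussed above.

```c
/*
 * Hypothetical conversion of a (sec, usec) timeout into clock ticks.
 * Assumes hz divides 1000000 evenly (hz == 100 gives 10000 usec/tick).
 * round_up == 0 truncates to whole ticks; nonzero rounds partial ticks up.
 */
long timeout_to_ticks(long sec, long usec, long hz, int round_up)
{
    long usec_per_tick = 1000000L / hz;
    long ticks = sec * hz + usec / usec_per_tick;

    /* A leftover fraction of a tick costs a whole extra tick. */
    if (round_up && usec % usec_per_tick != 0)
        ticks++;
    return ticks;
}
```

With hz = 100, a 1uS timeout is 0 ticks when rounded down (return
immediately) and 1 tick when rounded up (wait up to 10ms).  Neither is
what the caller asked for, which is the argument for keeping the
internal granularity at 1uS.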


I could argue about SVID III compliance with "system clock frequency"
instead of "system clock update frequency" as distinguished in SVID III
by setitimer(RT) and gettimeofday(RT).  At the very least, it's required
for ABI compatibility with Solaris 2.5 and above, even if the BSD select()
call stays a slug.  8-).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.