From owner-cvs-sys  Sun Feb 16 00:23:31 1997
Return-Path: <owner-cvs-sys>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id AAA19548
          for cvs-sys-outgoing; Sun, 16 Feb 1997 00:23:31 -0800 (PST)
Received: from spinner.DIALix.COM (spinner.DIALix.COM [192.203.228.67])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id AAA19518;
          Sun, 16 Feb 1997 00:22:04 -0800 (PST)
Received: from spinner.DIALix.COM (localhost.DIALix.oz.au [127.0.0.1])
          by spinner.DIALix.COM (8.8.4/8.8.4) with ESMTP id QAA24233;
          Sun, 16 Feb 1997 16:21:45 +0800 (WST)
Message-Id: <199702160821.QAA24233@spinner.DIALix.COM>
X-Mailer: exmh version 2.0gamma 1/27/96
To: Mike Pritchard <mpp@freefall.freebsd.org>
cc: bde@zeta.org.au (Bruce Evans), ache@freefall.freebsd.org,
        cvs-all@freefall.freebsd.org, CVS-committers@freefall.freebsd.org,
        cvs-sys@freefall.freebsd.org
Subject: Re: cvs commit: src/sys/sys types.h 
In-reply-to: Your message of "Sat, 15 Feb 1997 23:02:18 PST."
             <199702160702.XAA16636@freefall.freebsd.org> 
Date: Sun, 16 Feb 1997 16:21:44 +0800
From: Peter Wemm <peter@spinner.DIALix.COM>
Sender: owner-cvs-sys@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Mike Pritchard wrote:
> Bruce Evans wrote:
> > 
> > >ache        97/02/15 14:26:30
> > >
> > >  Modified:    sys/sys   types.h
> > >  Log:
> > >  Bump default FD_SETSIZE from 256 to 1024 as many modern
> > >  systems do nowdays (like SunOs 5.5.1 f.e.)
> > >  256 is too small under real network load
> > 
> > Please back it out. The kernel is not read for this.  It always rounds up
> > to a multiple of FD_SETSIZE bits.  This will clobber old applications.
> > It also bzeros a multiple of 6 * FDSETSIZE bits.  This will take a
> > fairly long time with a large FD_SETSIZE.  It already takes too long.
> > All this is because select() is specified to handle `struct fd_set's.
> > The kernel handles whole objects.
>
> Sounds like it is time for us to implement the poll system
> call.  I noticed that OpenBSD has already done so.

So have I..   But then again, you're all probably sick of hearing 
that by now :-]

I note that NetBSD have gone further and removed the file/vnode/devsw
select entries and replaced them with poll entries.  select() and poll()
are both system calls, but they both share the same device driver/etc
hooks.  There is a good reason why NetBSD have done this..  The poll device
driver interface is a superset of select's hooks.  By implementing poll()
in terms of select's kernel/driver backends, you loose a lot of information
(poll does much more than testing for read/write/error).  But implementing
the select syscall with the poll kernel/driver back ends looses no
information in either case and is virtually for free.  The problem is that
it's a device driver interface change.  I'd like to follow NetBSD here, the
major changes in 3.0 (eg: lite2, smp, etc) are probably the best chance
we'll get to do it while containing the chaos in one single jump.  It's not
application visible except that a poll implementation would be much more
accurate and useful.

Since select/poll backends go through the fileops and VOP tables, it'd be
relatively simple to be able to poll() a directory file so that you wake up
when a file is created or deleted in it, or to poll() a file for a size
change etc. This requires the poll backend - the select backend only does
read/write polling and cannot easily be extended.  The select backends are
defined to take a "which" argument of 0 (exception/error), FREAD or FWRITE.
A poll backend takes a "which" argument that is interpreted as a bit mask
so that you can do fancy stuff like select for readable urgent data, not
just "readable" then try to determine whether it was urgent or not.  To 
implement polling on state/size changes on files or directories, 
ufs_poll() would replace ufs_select() (currently returns true), and would 
actually do something useful.

Also, bzero(6*FD_SETSIZE...) etc can probably be fixed without too much 
pain for select() if somebody's prepared to get blood on their hands.  The 
incoming bits do not appear to need to be zeroed (copyin overwrites it 
with a rounded up to byte boundary and counts the bits that it's scanning 
in selscan), and only the number of bytes that are going to be copyout'ed 
need to be zeroed.

Also, aren't you misreading the roundup?

        /* The amount of space we need to allocate */
        ni = howmany(roundup2 (uap->nd, FD_SETSIZE), NFDBITS) *
                sizeof(fd_mask);
        if (ni > p->p_selbits_size) {
		/* realloc p_selbits */
	}
 ...later...
        /* The amount of space we need to copyin/copyout */
        ni = howmany(uap->nd, NFDBITS) * sizeof(fd_mask);

#define getbits(name, x) \
        if (uap->name && \
            (error = copyin((caddr_t)uap->name, (caddr_t)ibits[x], ni))) \
                goto done;
        getbits(in, 0);
        getbits(ou, 1);
        getbits(ex, 2);

Note the reuse of 'ni'.  The application is only exposed to sizeof(fd_mask)
(fd_mask == unsigned long) rounding, not FD_SETSIZE.  The FD_SETSIZE
rounding is only to size the initial workspace to save excess reallocation
calls to the kernel allocator.  Even then it expands by 32 bytes (256
bits), not FD_SETSIZE.

Also, it seems to me that the code is not very optimal.  I'm probably 
about to embarress myself here, but wouldn't it be better something like 
this?:

Change:
        /*
         * This buffer is usually very small therefore it's probably faster
         * to just zero it, rather than calculate what needs to be zeroed.
         */
        bzero (p->p_selbits, p->p_selbits_size * 6);

        /* The amount of space we need to copyin/copyout */
        ni = howmany(uap->nd, NFDBITS) * sizeof(fd_mask);

to something more like this:

        /* The amount of space we need to copyin/copyout */
        ni = howmany(uap->nd, NFDBITS) * sizeof(fd_mask);
	
	if (ni > 8) {	/* 64 bits - arbitary guess at tradeoff */
		/*
		 * Clear the entire return-to-user buffer space
		 */
		bzero (p->p_selbits + (i + 3) * p->p_selbits_size,
			p->p_selbits_size * 3);
	} else {
		for (i = 0; i < 3; i++)
			bzero(obits[i], ni);
	}

The idea was to figure out where the cost of three function calls versus 
unnecessary clearing of chunks of memory pays off.

Urk... Enough rambling from me..

> > Bruce
> > 
> 
> 
> -- 
> Mike Pritchard
> mpp@FreeBSD.org
> "Go that way.  Really fast.  If something gets in your way, turn"

Cheers,
-Peter