From owner-cvs-sys Sun Feb 16 00:23:31 1997 Return-Path: Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id AAA19548 for cvs-sys-outgoing; Sun, 16 Feb 1997 00:23:31 -0800 (PST) Received: from spinner.DIALix.COM (spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id AAA19518; Sun, 16 Feb 1997 00:22:04 -0800 (PST) Received: from spinner.DIALix.COM (localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.8.4/8.8.4) with ESMTP id QAA24233; Sun, 16 Feb 1997 16:21:45 +0800 (WST) Message-Id: <199702160821.QAA24233@spinner.DIALix.COM> X-Mailer: exmh version 2.0gamma 1/27/96 To: Mike Pritchard cc: bde@zeta.org.au (Bruce Evans), ache@freefall.freebsd.org, cvs-all@freefall.freebsd.org, CVS-committers@freefall.freebsd.org, cvs-sys@freefall.freebsd.org Subject: Re: cvs commit: src/sys/sys types.h In-reply-to: Your message of "Sat, 15 Feb 1997 23:02:18 PST." <199702160702.XAA16636@freefall.freebsd.org> Date: Sun, 16 Feb 1997 16:21:44 +0800 From: Peter Wemm Sender: owner-cvs-sys@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Mike Pritchard wrote: > Bruce Evans wrote: > > > > >ache 97/02/15 14:26:30 > > > > > > Modified: sys/sys types.h > > > Log: > > > Bump default FD_SETSIZE from 256 to 1024 as many modern > > > systems do nowdays (like SunOs 5.5.1 f.e.) > > > 256 is too small under real network load > > > > Please back it out. The kernel is not read for this. It always rounds up > > to a multiple of FD_SETSIZE bits. This will clobber old applications. > > It also bzeros a multiple of 6 * FDSETSIZE bits. This will take a > > fairly long time with a large FD_SETSIZE. It already takes too long. > > All this is because select() is specified to handle `struct fd_set's. > > The kernel handles whole objects. > > Sounds like it is time for us to implement the poll system > call. I noticed that OpenBSD has already done so. So have I.. But then again, you're all probably sick of hearing that by now :-] I note that NetBSD have gone further and removed the file/vnode/devsw select entries and replaced them with poll entries. select() and poll() are both system calls, but they both share the same device driver/etc hooks. There is a good reason why NetBSD have done this.. The poll device driver interface is a superset of select's hooks. By implementing poll() in terms of select's kernel/driver backends, you loose a lot of information (poll does much more than testing for read/write/error). But implementing the select syscall with the poll kernel/driver back ends looses no information in either case and is virtually for free. The problem is that it's a device driver interface change. I'd like to follow NetBSD here, the major changes in 3.0 (eg: lite2, smp, etc) are probably the best chance we'll get to do it while containing the chaos in one single jump. It's not application visible except that a poll implementation would be much more accurate and useful. Since select/poll backends go through the fileops and VOP tables, it'd be relatively simple to be able to poll() a directory file so that you wake up when a file is created or deleted in it, or to poll() a file for a size change etc. This requires the poll backend - the select backend only does read/write polling and cannot easily be extended. The select backends are defined to take a "which" argument of 0 (exception/error), FREAD or FWRITE. A poll backend takes a "which" argument that is interpreted as a bit mask so that you can do fancy stuff like select for readable urgent data, not just "readable" then try to determine whether it was urgent or not. To implement polling on state/size changes on files or directories, ufs_poll() would replace ufs_select() (currently returns true), and would actually do something useful. Also, bzero(6*FD_SETSIZE...) etc can probably be fixed without too much pain for select() if somebody's prepared to get blood on their hands. The incoming bits do not appear to need to be zeroed (copyin overwrites it with a rounded up to byte boundary and counts the bits that it's scanning in selscan), and only the number of bytes that are going to be copyout'ed need to be zeroed. Also, aren't you misreading the roundup? /* The amount of space we need to allocate */ ni = howmany(roundup2 (uap->nd, FD_SETSIZE), NFDBITS) * sizeof(fd_mask); if (ni > p->p_selbits_size) { /* realloc p_selbits */ } ...later... /* The amount of space we need to copyin/copyout */ ni = howmany(uap->nd, NFDBITS) * sizeof(fd_mask); #define getbits(name, x) \ if (uap->name && \ (error = copyin((caddr_t)uap->name, (caddr_t)ibits[x], ni))) \ goto done; getbits(in, 0); getbits(ou, 1); getbits(ex, 2); Note the reuse of 'ni'. The application is only exposed to sizeof(fd_mask) (fd_mask == unsigned long) rounding, not FD_SETSIZE. The FD_SETSIZE rounding is only to size the initial workspace to save excess reallocation calls to the kernel allocator. Even then it expands by 32 bytes (256 bits), not FD_SETSIZE. Also, it seems to me that the code is not very optimal. I'm probably about to embarress myself here, but wouldn't it be better something like this?: Change: /* * This buffer is usually very small therefore it's probably faster * to just zero it, rather than calculate what needs to be zeroed. */ bzero (p->p_selbits, p->p_selbits_size * 6); /* The amount of space we need to copyin/copyout */ ni = howmany(uap->nd, NFDBITS) * sizeof(fd_mask); to something more like this: /* The amount of space we need to copyin/copyout */ ni = howmany(uap->nd, NFDBITS) * sizeof(fd_mask); if (ni > 8) { /* 64 bits - arbitary guess at tradeoff */ /* * Clear the entire return-to-user buffer space */ bzero (p->p_selbits + (i + 3) * p->p_selbits_size, p->p_selbits_size * 3); } else { for (i = 0; i < 3; i++) bzero(obits[i], ni); } The idea was to figure out where the cost of three function calls versus unnecessary clearing of chunks of memory pays off. Urk... Enough rambling from me.. > > Bruce > > > > > -- > Mike Pritchard > mpp@FreeBSD.org > "Go that way. Really fast. If something gets in your way, turn" Cheers, -Peter