Message-ID: <3ED72A16.9CACD4C5@mindspring.com>
Date: Fri, 30 May 2003 02:53:26 -0700
From: Terry Lambert
To: Igor Sysoev
cc: arch@freebsd.org
Subject: Re: sendfile(2) SF_NOPUSH flag proposal
List-Id: Discussion related to FreeBSD architecture

Igor Sysoev wrote:
> On Fri, 30 May 2003, Terry Lambert wrote:
> > Or you could just fix sendfile.  8-).
>
> I'm going to fix it as Matthew Dillon suggested if no one else is going to
> do it in the near future.

I'm pretty sure it will deadlock on boundary conditions, but
Matt has confidence it won't; I looked in -current at the code
he pointed to in -stable, and I'm not so sure I agree, but I'm
willing to be wrong.  If it fixes the problem for you, and
doesn't deadlock, then more power to you.
I would ask that you test with file sizes in 1 byte increments,
up to 32769 bytes, with headers of 0 bytes and 300 bytes for
your test cases, so that the boundary that I'm worried about
ends up getting exercised.

> > > By the way what about kqueue(2) ?  Are you not confused that NetBSD
> > > does not support EVFILT_AIO and OpenBSD does not support EVFILT_AIO and
> > > EVFILT_TIMER ?  Does this mean that FreeBSD should not introduce any
> > > new kqueue filters or flags ?
> >
> > These are incredibly trivial to support.  I estimate the work
> > at an hour each, including writing a unit test.  It took me
> > about an hour to write the SystemV IPC Message Queue KNOTE()
> > code for FreeBSD.
>
> Nevertheless there's no support for EVFILT_AIO and EVFILT_TIMER.
> By the way I do not think that EVFILT_AIO is a trivial thing.
> Actually it requires at least the working AIO environment in the kernel.

This is really a tangent again; however, I would point out that
aio can be implemented in the context of scheduler activations
and a spawned AIO kernel thread per request (the alternative is
to implement it entirely in user space, and then implement a
loopback "send" mechanism for the KNOTE()'s).

So implementing aio is probably a 20 hour task (1/2 a man-week).
More work, but still all doable in a week's time or less.

In general, most of the things you are pointing at, including
the sendfile problem, don't take a lot of thinking to fix, only
the grunt-work to actually crank out the code.

> Now we have more portable kqueue() that exists in FreeBSD, NetBSD, and OpenBSD
> (I do not know about Darwin and MacOS X) with the same prototype and
> some unsupported filters.  And we have much less portable sendfile() that
> exists in the most modern unices but with the different prototypes and
> functionality.

This illustrates my thesis that interfaces with the same names
tend to converge over time.
Another example is select(), which Linux initially implemented
as updating the timeout struct with the time which had elapsed;
this was divergent, and broke a lot of code, until they relented
and fixed it to de facto standard behaviour.

I'm confident the same thing will eventually happen with
kqueue/kevent.

The main issue with Linux adoption of kqueue/kevent is that they
claim it's level triggered instead of edge triggered, that they
want events, they don't want conditions raised.

To a small extent, they are right.  But this is trivially
correctable, and needs to be corrected anyway, for EVFILT_PROC
to support a larger number of PIDs.  Right now, the PID is OR'ed
in with the event, and so is limited to 20 bits.  Another
parameter, a void * (in which the PID value can be cast and
recovered) would be enough to provide additional context.

With this context, it's possible to arrange a contract between
the user kn_data that was passed in and the filter routine, in
order to copy out arbitrary data, making the event edge rather
than level triggered.

With this single modification, you fix both the 20 bit PID limit
problem and the Linux objection to the adoption of the kevent
interface.  In other words, you increase convergence.

It's natural over time for visible source bases to converge.

> > It doesn't "read" it, per se: it creates a mapping, and it
> > faults the pages; when they are in core, then they can be
> > sent.
>
> So what do these lines in /sys/kern/uipc_syscalls.c:sendfile():
>
>     if (!pg->valid || !vm_page_is_valid(pg, pgoff, xfsize)) {
>         ....
>         error = VOP_READ(vp, &auio, IO_VMIO | ((MAXBSIZE / bsize) << 16),
>             p->p_ucred);
>         ....
>     }

That's easy: they mean you aren't looking at version 1.147 of
the file; you're looking at RELENG_4 (version 1.65.2.17, or
earlier), and not -CURRENT.  You are 82 HEAD revisions behind
the state of the art.

-- Terry
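
P.S. To make the 20 bit PID limit above concrete, here is a
rough sketch, not the kernel code.  The X_NOTE_* constants are
local copies of what I believe the EVFILT_PROC masks in
<sys/event.h> to be; the X_ names, pack_hint(), hint_event(),
hint_pid(), struct proc_note, make_note() and note_pid() are all
hypothetical and exist only for illustration.  The first half
shows a PID being OR'ed into the event word and truncated; the
second half shows the PID riding in a separate void * instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: these values mirror NOTE_FORK, NOTE_PCTRLMASK and
 * NOTE_PDATAMASK from <sys/event.h>.  They are redefined under
 * hypothetical X_ names so the sketch builds on any system.
 */
#define X_NOTE_FORK      0x40000000u
#define X_NOTE_PCTRLMASK 0xf0000000u
#define X_NOTE_PDATAMASK 0x000fffffu   /* 20 bits of data */

/* OR the PID into the event word; anything above 20 bits is lost. */
static uint32_t
pack_hint(uint32_t event, uint32_t pid)
{
    assert((event & ~X_NOTE_PCTRLMASK) == 0);  /* control bits only */
    return (event | (pid & X_NOTE_PDATAMASK));
}

/* Recover the control bits from a packed hint. */
static uint32_t
hint_event(uint32_t hint)
{
    return (hint & X_NOTE_PCTRLMASK);
}

/* Recover the (possibly truncated) PID from a packed hint. */
static uint32_t
hint_pid(uint32_t hint)
{
    return (hint & X_NOTE_PDATAMASK);
}

/*
 * Hypothetical alternative: keep the event word clean and carry
 * the PID in a separate context pointer, as suggested above.
 */
struct proc_note {
    uint32_t  event;
    void     *ctx;
};

static struct proc_note
make_note(uint32_t event, uintptr_t pid)
{
    struct proc_note n;

    n.event = event;
    /* Integer/pointer round trip; fine on mainstream platforms. */
    n.ctx = (void *)pid;
    return (n);
}

/* Cast the context pointer back to the PID it carries. */
static uintptr_t
note_pid(const struct proc_note *n)
{
    return ((uintptr_t)n->ctx);
}
```

With the packed form, a PID of 0x100001 comes back as 1, since
only the low 20 bits survive; with the separate pointer, the
full value comes back, limited only by pointer width.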