From owner-freebsd-hackers Sun Mar 8 12:32:35 1998
Return-Path:
Received: (from majordom@localhost)
    by hub.freebsd.org (8.8.8/8.8.8) id MAA07974
    for freebsd-hackers-outgoing; Sun, 8 Mar 1998 12:32:35 -0800 (PST)
    (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from scanner.worldgate.com (scanner.worldgate.com [198.161.84.3])
    by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA07962
    for ; Sun, 8 Mar 1998 12:32:25 -0800 (PST)
    (envelope-from marcs@znep.com)
Received: from znep.com (uucp@localhost)
    by scanner.worldgate.com (8.8.7/8.8.7) with UUCP id NAA28534;
    Sun, 8 Mar 1998 13:32:13 -0700 (MST)
Received: from localhost (marcs@localhost)
    by alive.znep.com (8.7.5/8.7.3) with SMTP id NAA07474;
    Sun, 8 Mar 1998 13:30:25 -0700 (MST)
Date: Sun, 8 Mar 1998 13:30:25 -0700 (MST)
From: Marc Slemko
To: Chris Csanady
cc: hackers@FreeBSD.ORG
Subject: Re: kernel wishlist for web server performance
In-Reply-To: <199803082017.OAA03298@friley585.res.iastate.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, 8 Mar 1998, Chris Csanady wrote:

>
> >
> >>On Sat, 7 Mar 1998, Julian Elischer wrote:
> >>
> >>> >
> >>> > Let me put it this way: how else do you propose to do copy
> >>> > avoidance to avoid an extra copy going into the mbufs?  The data
> >>> > must go from the buffer cache to the network without any copy
> >>> > other than to the network card itself.  Why is your other method
> >>> > of doing this any less of a hack?
> >>> [...]
> >>> I would like to add here that in FreeBSD's unified VM/Buffer cache,
> >>> a mmapped file IS the buffer cache
> >>> so that a send() from an mmapped file IS copying direct from the
> >>> buffer cache. There is ONE copy.. that from the buffer cache, into
> >>> the mbuf.
> >>
> >>But the point is that you still have to copy it into the mbuf, you
> >>still have to use the memory for the mbuf, etc.
> >>This uses more CPU and memory bandwidth, increases memory use, and
> >>means you may have to chop things up smaller to avoid using too much
> >>memory for mbufs.
> >
> >And unless you introduce a special hack for each case that comes up,
>
> Ugh.. Major slip of the mouse here, let me finish this up.
>
> I don't believe that this can be done with no extra copies without the
> sendfile().  Network cards impose way too many restrictions on memory,

It is difficult, yes.  It is, however, possible.  For example, mmap and
write on Solaris 2.6 with their OC-12 ATM card (I think that's the one)
will do zero copy if you have things aligned right and do things in the
right sized chunks.  HPUX has been able to do it for a long time with
one of their FDDI cards, and possibly other NICs by now.

Using sendfile() doesn't make it magically possible either, just makes
it a bit easier and cleaner, both for the application and the kernel.
There are still ugly parts to deal with.

However, this _is_ a worthwhile thing to be able to do.  It does make a
significant difference in performance on other systems that I have seen
benchmarked, and I see no reason to think FreeBSD shouldn't be the same
because it does things the same way.  Yes, it is dependent on the NIC.

There may not be the demand for this in FreeBSD required to make it
happen.  Unless some company really needs it and pays to have it
implemented, I'm not expecting it to really happen.  It is, however,
one of the useful features that I do desire in a kernel for
highest-performance web serving and which Apache 2.0 will almost
certainly be able to take advantage of.

> and alignment for this to be possible otherwise.  Unless the buffer
> cache knows in advance which net interface the buffer is leaving
> through, this would not be possible.  But then, you would make this
> data useless for much else.  I do not see a generic way to accomplish
> this short of tangling all sorts of things, and using hacks such as a
> sendfile().
> :(
>
> Not that I think this should be done or not.. I'll leave that to
> someone else. :)
>
> Chris
>
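
For reference, the two application-side approaches compared in this
thread (mmap the file and write the mapping to the socket, versus a
single sendfile() call) can be sketched as follows. This is only an
illustration, not code from the thread: Python is used for brevity, the
function names send_via_mmap and send_via_sendfile and the 64 KB chunk
size are made up for the example, and os.sendfile is called in the
four-argument form that Python documents for Linux and also accepts on
FreeBSD. Neither call guarantees zero copy; as the message says, that
depends on the kernel and on the NIC.

```python
import mmap
import os
import socket


def send_via_mmap(sock: socket.socket, path: str, chunk: int = 64 * 1024) -> int:
    """Map the file and write the mapping to the socket in chunks.

    The application never copies the data into its own buffer, but the
    kernel may still copy from the page cache into its network buffers.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mapped
    sent = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            with memoryview(mapping) as view:
                while sent < size:
                    # send() may accept fewer bytes than offered
                    sent += sock.send(view[sent:sent + chunk])
    return sent


def send_via_sendfile(sock: socket.socket, path: str) -> int:
    """Ask the kernel to move the file to the socket directly."""
    size = os.path.getsize(path)
    sent = 0
    with open(path, "rb") as f:
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break  # unexpected end of file
            sent += n
    return sent
```

Both functions return the number of bytes handed to the socket, so a
caller can swap one for the other and compare CPU use on the system of
interest.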