Date:      Sat, 7 Mar 1998 23:12:52 -0700 (MST)
From:      Marc Slemko <marcs@znep.com>
To:        Mike Smith <mike@smith.net.au>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: kernel wishlist for web server performance 
Message-ID:  <Pine.BSF.3.95.980307225453.2799O-100000@alive.znep.com>
In-Reply-To: <199803080554.VAA08633@dingo.cdrom.com>

On Sat, 7 Mar 1998, Mike Smith wrote:

> > On Sat, 7 Mar 1998, Mike Smith wrote:
> > 
> > > > A sendfile() (eg. HPUX 11.x) or TransmitFile (eg. WinNT) system call. 
> ...
> > > How does this differ from mmap/AIOwrite in terms of what it actually 
> > > achieves?
> > 
> > Because you can more easily do copy avoidance.  The idea is that it will copy
> > data out of the buffer cache to the network without any data copies in the
> > middle, ie. to mbufs.  Trying to implement this using traditional calls is
> > possible, but can get ugly and involves some tradeoffs.  With a
> > sendfile(), the memory for the socket buffer goes away and the kernel can
> > essentially copy data directly from the buffer cache.
> > 
> > Combine that with a file handle cache of frequently accessed files in the
> > server, and you no longer have to open and close files, and sending the
> > response is a single syscall that goes from the buffer cache to the
> > network. 
> 
> This smacks of being a grubby hack to avoid dealing with performance 
> problems in the host operating system.  

I don't think so.  Trying to do anything else is an ugly hack.  See the HP
paper I mentioned for some of the details on why.

Let me put it this way: how else do you propose to do copy avoidance to
avoid an extra copy going into the mbufs?  The data must go from the
buffer cache to the network without any copy other than to the network
card itself.  Why is your other method of doing this any less of a hack? 

> > While you can easily make a wrapper that has the same external view in
> > user space there is no point because performance sucks.  Also, many
> > systems have troubles with large write()s so you have to split them up.
> 
> Well, FreeBSD obviously isn't "most systems".  I'm curious as to 
> whether performance in that case "sucks" too - what you describe is 
> basically what an FTP server does, and wcarchive should give you some 
> idea as to how well that works already.

Just because it sucks doesn't mean it can't be fast.  Efficiency is
relative.  I don't know how David's ftp server does its stuff or how many
copies it ends up going through.
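
For reference, a minimal sketch of the kind of user-space wrapper being
discussed: mmap the file, then push it out with write() in bounded chunks
so that no single write() is huge.  The names user_sendfile, write_all and
SEND_CHUNK are made up for illustration, and the 64 KB chunk size is an
arbitrary assumption, not a tuned value.  Note that each write() still
copies the data from the mapped pages into the socket buffer, which is
exactly the copy an in-kernel sendfile() is meant to avoid.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size: an arbitrary choice, not a measured optimum. */
#define SEND_CHUNK (64 * 1024)

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical user-space stand-in for a sendfile()-style call.
 * Sends the whole of in_fd to out_fd; returns bytes sent, or -1 on error.
 * The data is still copied once per write(), from the mapping into the
 * kernel's buffer for out_fd.
 */
static off_t user_sendfile(int out_fd, int in_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;               /* nothing to map or send */

    size_t total = (size_t)st.st_size;
    char *map = mmap(NULL, total, PROT_READ, MAP_PRIVATE, in_fd, 0);
    if (map == MAP_FAILED)
        return -1;

    size_t sent = 0;
    while (sent < total) {
        /* Split the transfer so no single write() exceeds SEND_CHUNK. */
        size_t chunk = total - sent;
        if (chunk > SEND_CHUNK)
            chunk = SEND_CHUNK;
        if (write_all(out_fd, map + sent, chunk) < 0) {
            munmap(map, total);
            return -1;
        }
        sent += chunk;
    }

    munmap(map, total);
    return (off_t)sent;
}
```

A server would call this as user_sendfile(client_socket, file_fd) after
writing the response headers.  On a blocking socket the loop simply blocks
until the data has been queued; a non-blocking caller would also need to
handle EAGAIN, which this sketch does not.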

> I appreciate the conceptual niceness of what you're describing, but I 
> guess I'm not convinced that something like that would be worth the 
> cruft and effort involved.
> 
> To my mind, the biggest win is in not having to do anything about the 
> write until it has completed (or failed), and for this AIO is adequate.

Yes, although I have heard somewhat convincing arguments that on hardware
with good enough support for context switching between threads, you don't
actually gain that much (if anything) from AIO over just creating more
threads.  I'm not really convinced though, and it is highly dependent on
how the OS implements that stuff.

> Perhaps that would be better worded differently: Do you have specific
> test results that indicate that, on FreeBSD, it is unacceptably
> inefficient to mmap/AIOwrite?   Or is it the case that on some 
> platforms it is, and you want to keep Apache as simple as possible in 
> this regard (ie. have specific HTTP-transmit-file system calls 
> everywhere)?

But it isn't HTTP specific.  It is really quite generic.  In fact, the way
I hope the Apache 2.0 process and IO model comes together, it will provide
a good framework for any high performance network server.

mmap+AIO will probably be done anyway, since not all OSes support a
sendfile() type thing, so it isn't a matter of trying to avoid anything.
Any advanced IO of this form is horribly platform specific anyway.

Well, implement sendfile() and I will give you some results.  <g>  No, I
have no actual results, just the comments at the start of this message
about how else you would do copy avoidance.

> 
> > > > An efficient poll().
> > > 
> > > What's inefficient about the current poll()?
> > 
> > I have no idea; haven't looked.  All I mean is a real poll that doesn't
> > just hack on top of select but is implemented the logical way for poll to
> > be done.
> 
> And what's that?  The poll(2) implementation in 3.0 is based on the 
> NetBSD poll(2), AFAIR.  It certainly has different internal semantics, 
> although the two do achieve basically the same thing.

Just that it doesn't end up going through the FD_SET bitmasks, so it
doesn't have the overhead on sparse fd sets.

While you would be ill-advised to listen to anything they say, some MS    
docs on their TransmitFile are at:

        http://premium.microsoft.com/msdn/library/sdkdoc/wsapiref_3pwy.htm
        http://premium.microsoft.com/msdn/library/conf/html/sa8ff.htm

Note the benchmark results.  They don't mean that the technique is always
faster, since implementation has a lot to do with it, but do provide some
data. 



