Date: Tue, 25 May 1999 22:41:42 -0400 (EDT)
From: Christopher Sedore <cmsedore@mailbox.syr.edu>
To: Mike Smith
Cc: Zhihui Zhang, freebsd-hackers@FreeBSD.ORG
Subject: Re: mmap of a network buffer
In-Reply-To: <199905250643.XAA00833@dingo.cdrom.com>

On Mon, 24 May 1999, Mike Smith wrote:

> > > There's also very little need for this under "real" circumstances; some
> > > simple tests have demonstrated we can sustain about 800Mbps throughput
> > > (UDP), and the bottleneck here seems to be checksum calculations, not
> > > copyin/out.
> >
> > Oddly enough, I was just getting ready to implement something like this.
> > Not because of copyin performance issues, but because async io for
> > sockets could be done better if I didn't have to do a copyin.  copyin
> > has to have curproc==(proc with the buffer from which to copy)
>
> That's basically right.  You have three options:
>
>  - Switch to process context to access process data; this allows you to
>    take page faults in controlled circumstances (eg. copyin).
>  - Wire the process' pages into the kernel so you don't have to fault.
>  - Copy the user data into kernel space in an efficient fashion.

Glad to know that my understanding wasn't too far off-base.

> > which means that I have to do a context switch for every
> > socket-buffer-sized chunk (best case) or every io op (worst case).
>
> It sounds like your buffering is not efficient.

Well, I'd be happy if I could be convinced that that were the problem, but
protocols like HTTP and NNTP, which have short, rapid-fire (sometimes
lock-step) command sequences, don't help the buffering any.  This means
that reading commands off an incoming connection causes many, many context
switches between an aiod and the main process doing async io.

On the outgoing side, in the optimal case of sending large blocks of data,
I don't have control over the buffering--the aiod essentially impersonates
my process, going to sleep in the socket write routines, with a context
switch to it whenever a copyin is necessary.  I could exert more control by
metering my writes so that they fit into the socket buffers and avoid the
switches, but that increases the number of system calls, so I'm not sure
how much of a win it ends up being.  Plus, I hope that I will get some
added advantage out of constructing a zero-copy write (not that I've had
any throughput troubles).
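Roughly what I mean by metering, as an untested userland sketch (the
chunking policy and error handling are only illustrative, and the socket
is assumed to already be non-blocking):

/*
 * Untested sketch: query the socket's send buffer size and never hand
 * write(2) more than that at once, so a full send buffer shows up as
 * EAGAIN instead of putting the process (or an aiod) to sleep.
 * Assumes the socket has been set non-blocking.
 */
#include <sys/types.h>
#include <sys/socket.h>

#include <errno.h>
#include <unistd.h>

static ssize_t
metered_write(int s, const char *buf, size_t len)
{
        int sndbuf;
        socklen_t optlen = sizeof(sndbuf);
        size_t off = 0;
        ssize_t n;

        if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &optlen) == -1)
                return (-1);

        while (off < len) {
                size_t chunk = len - off;

                if (chunk > (size_t)sndbuf)
                        chunk = (size_t)sndbuf;
                n = write(s, buf + off, chunk);
                if (n == -1) {
                        if (errno == EAGAIN)
                                break;  /* send buffer full; retry later */
                        return (-1);
                }
                off += (size_t)n;
        }
        return ((ssize_t)off);
}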
> > My hope was to map the user's buffer into kernel space so that I could
> > do event-driven io on the socket without having to context switch to an
> > aiod for every io operation.  Is this really a bad idea?  I am a little
> > concerned about running out of kernel address space, but I don't think
> > that's an immediate problem.
>
> If you map into the kernel, you still have to context switch unless you
> wire the data down.  Excessive wiring can be expensive.  Have a look at
> how physio() does its thing.

Will do.  There's some of that code in the async io routines now, for
dealing with raw io operations--I hoped to borrow from that to implement
my stuff.

> > Such an implementation would lend itself to doing zero-copy async
> > writes with some frobbing of the send routines.  It would also bypass
> > some of the messing around done for socket buffers--that is, there
> > would not be a limit per se on socket buffering for writes, since they
> > would be mapped from user space.  One might want to put arbitrary
> > limits in place to ensure that an unreasonable amount of memory isn't
> > locked.
> >
> > Thoughts?
>
> Sounds a lot like sendfile.  See if you can't improve on it to do eg.
> sendmem().

Yes.  I'd like mine to be async rather than synchronous, though.  I've
considered creating an async sendfile too.  (Actually, I've been thinking
about extending the async io code to allow calling any syscall async, but
there are other complexities there...)

-Chris
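P.S.  For reference, the synchronous call I'd want to turn async is
FreeBSD's sendfile(2); an untested example of using it looks about like
this (the helper name and its arguments are made up for illustration):

/*
 * Untested example: send an entire file out a connected socket with
 * sendfile(2).  nbytes == 0 means "send to end of file", and no
 * header/trailer iovecs are supplied.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
send_whole_file(const char *path, int s)
{
        off_t sbytes = 0;
        int fd;

        if ((fd = open(path, O_RDONLY)) == -1)
                return (-1);
        if (sendfile(fd, s, 0, 0, NULL, &sbytes, 0) == -1) {
                close(fd);
                return (-1);
        }
        printf("sent %lld bytes\n", (long long)sbytes);
        close(fd);
        return (0);
}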