From owner-freebsd-hackers  Mon May 24 23:46:14 1999
Message-Id: <199905250643.XAA00833@dingo.cdrom.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Christopher Sedore <cmsedore@mailbox.syr.edu>
Cc: Mike Smith <mike@smith.net.au>,
	Zhihui Zhang <zzhang@cs.binghamton.edu>, freebsd-hackers@FreeBSD.ORG
Subject: Re: mmap of a network buffer 
In-reply-to: Your message of "Fri, 21 May 1999 16:46:10 EDT."
             <Pine.SOL.3.95.990521163524.17627E-100000@rodan.syr.edu> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 24 May 1999 23:43:50 -0700
From: Mike Smith <mike@smith.net.au>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > > I really do not know how to describe the problem. But a friend here asks
> > > me how to mmap a network buffer so that there is no need to copy the data
> > > from user space to kernel space. We are not sure whether FreeBSD can
> > > create a device file (mknod) for a network card, and if so, we can use the
> > > mmap() call to do so because mmap() requires a file descriptor.  We assume
> > > that the file descriptor can be acquired by opening the network device.
> > > If this is infeasible, is there another way to accomplish the same goal?
> > 
> > Use sendfile() for zero-copy file transmission; in all other cases it's 
> > necessary to copy data into the kernel.  Memory-mapping a network 
> > buffer makes no sense if you just think about it for a moment...
> > 
> > There's also very little need for this under "real" circumstances; some 
> > simple tests have demonstrated we can sustain about 800Mbps throughput 
> > (UDP), and the bottleneck here seems to be checksum calculations, not 
> > copyin/out.
> > 
> 
> Oddly enough, I was just getting ready to implement something like this. 
> Not because of copyin performance issues, but because async io for sockets
> could be done better if I didn't have to do a copyin.  copyin has to have
> curproc==(proc with the buffer from which to copy)

That's basically right.  You have three options:

 - Switch to process context to access process data; this allows you to 
   take page faults in controlled circumstances (e.g. copyin).
 - Wire the process's pages into the kernel so you don't have to fault.
 - Copy the user data into kernel space in an efficient fashion.

> which means that I have
> to do a context switch for every socket buffer sized chunk (best case) or
> every io op (worst case).

It sounds like your buffering is not efficient.

> My hope was to map the user's buffer into kernel space so that I could do
> event driven io on the socket without having to context switch to an aiod
> for every io operation.  Is this really a bad idea?  I am a little
> concerned about running out of kernel address space, but I don't think
> that's an immediate problem.

If you map into the kernel, you still have to context switch unless you 
wire the data down.  Excessive wiring can be expensive.  Have a look at
how physio() does its thing.
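
For illustration only, here is a minimal userland sketch of what
"wiring" means, using mlock(2) as a stand-in.  It is an analogy, not a
copy of what physio() does inside the kernel.  The helper names
(page_span, wire_buffer, unwire_buffer) are made up for this example.
It shows two things: wiring works on whole pages, and it can fail when
the locked-memory limit is reached, which is one reason excessive
wiring is costly.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: round [buf, buf + len) outward to page
 * boundaries, since pages are wired as whole units.
 */
void
page_span(const void *buf, size_t len, uintptr_t *start, size_t *span)
{
	uintptr_t pagesz = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t lo = (uintptr_t)buf & ~(pagesz - 1);
	uintptr_t hi = ((uintptr_t)buf + len + pagesz - 1) & ~(pagesz - 1);

	*start = lo;
	*span = (size_t)(hi - lo);
}

/*
 * Hypothetical helper: wire the pages backing buf so that touching
 * them cannot page-fault.  Returns 0 on success, -1 with errno set
 * (for example when the locked-memory limit would be exceeded).
 */
int
wire_buffer(const void *buf, size_t len)
{
	uintptr_t start;
	size_t span;

	page_span(buf, len, &start, &span);
	return mlock((const void *)start, span);
}

/* Hypothetical helper: undo wire_buffer() once the I/O has finished. */
int
unwire_buffer(const void *buf, size_t len)
{
	uintptr_t start;
	size_t span;

	page_span(buf, len, &start, &span);
	return munlock((const void *)start, span);
}
```

A caller would wire the buffer before starting the I/O and unwire it
on completion.  Every wired page is unavailable to the pager for that
whole interval, so a limit on the total wired amount is needed.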

> Such an implementation would lend itself to doing zero-copy async
> writes with some frobbing of the send routines.  It would also bypass some
> of the messing around done to do socket buffers--that is, there would not
> be a limit per se on socket buffering for writes since they would be
> mapped user space.   One might want to put arbitrary limits in place to
> ensure that an unreasonable amount of memory isn't locked.
> 
> Thoughts? 

Sounds a lot like sendfile.  See if you can't improve on it to do e.g. 
sendmem().
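
For reference, a minimal sketch of calling sendfile() from userland.
The wrapper name send_file_range is hypothetical.  The FreeBSD branch
follows the argument order in the sendfile(2) manual page; the other
branch uses the Linux form of the call so the sketch also builds
there.  sendmem() does not exist, and nothing here implements it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/uio.h>
#else
#include <sys/sendfile.h>	/* Linux */
#endif

/*
 * Hypothetical wrapper: send nbytes (must be > 0) of file_fd, starting
 * at offset, over the stream socket sock_fd without copying the data
 * through a user buffer.  Returns the number of bytes sent, or -1 if
 * nothing was sent and an error occurred.
 */
ssize_t
send_file_range(int sock_fd, int file_fd, off_t offset, size_t nbytes)
{
#if defined(__FreeBSD__)
	off_t sent = 0;

	/* FreeBSD: file first, socket second; byte count comes back in sent. */
	if (sendfile(file_fd, sock_fd, offset, nbytes, NULL, &sent, 0) == -1 &&
	    sent == 0)
		return -1;
	return (ssize_t)sent;
#else
	size_t total = 0;

	/* Linux: socket first, file second; may send less than requested. */
	while (total < nbytes) {
		ssize_t n = sendfile(sock_fd, file_fd, &offset, nbytes - total);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			return total > 0 ? (ssize_t)total : -1;
		}
		if (n == 0)
			break;		/* reached end of file */
		total += (size_t)n;
	}
	return (ssize_t)total;
#endif
}
```

The source is always a file descriptor, which is why sending from
process memory would need a new interface along the lines suggested
above.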

-- 
\\  The mind's the standard       \\  Mike Smith
\\  of the man.                   \\  msmith@freebsd.org
\\    -- Joseph Merrick           \\  msmith@cdrom.com

