FreeBSD Mail Archives

Date:      Thu, 15 Jul 1999 20:54:23 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        davids@webmaster.com (David Schwartz)
Cc:        tlambert@primenet.com, Doug@gorean.org, scrappy@hub.org, beyssac@enst.fr, chat@FreeBSD.ORG
Subject:   Re: Known MMAP() race conditions ... ?
Message-ID:  <199907152054.NAA06945@usr07.primenet.com>
In-Reply-To: <001101becef3$7b5056b0$021d85d1@youwant.to> from "David Schwartz" at Jul 15, 99 11:54:35 am

> > The bind 8.x stuff doesn't have licensing issues; it allows multiple
> > concurrent operations.  It is merely that the 4.x resolver is wedged
> > into libc that causes your problem.
> 
> 	The bind license has a 'forced speech' requirement.
> 
>  * 2. Redistributions in binary form must reproduce the above copyright
>  *    notice, this list of conditions and the following disclaimer in the
>  *    documentation and/or other materials provided with the distribution.

So print it in the documentation.  If it offends you that much,
put it on a microdot, and replace the first period in the manual
with the microdot.  No "ugly" fine print...


> > > > Completion ports are no more, and no less, than VMS AST's.  Just
> > > > like aio* in FreeBSD, and much of the POSIX crap that's passing
> > > > for standards these days.
> 
> Completion ports are not about asynchronous I/O. They're not about having
> your I/O routines called automatically. They're about having the optimum
> number of threads running.
> 
> What completion ports do is allow a group of threads to be associated such
> that if one thread blocks, another one is automatically freed. This way, on
> an N processor machine, you can nearly always have N active threads. This is
> extremely difficult to do well on pretty much every UNIX.

Solaris does it, even though I am not particularly fond of the
overhead in their implementation.  It is the right direction, but
not far enough.


> > > > They may make it easier to code, by calling your callbacks, but
> > > > the idea that network buffers should be in user space instead of
> > > > on the kernel side of the protection domain barrier is just
> > > > plain nuts.
> 
> I'm not really sure why you feel that way. If there weren't performance
> issues, I don't see why you wouldn't want the whole network stack to be in
> user space.

Domain crossing is expensive.  Both for buffer copies on the way out,
and the callback trampoline on the way in.


> > I disagree.  I can turn around reads and writes in the kernel without
> > taking the copy overhead.  NFS does this, and a KLD that could do this
> > for SAMBA and AFS, while not trivial, given the state of the FreeBSD
> > VFS interface, the protocol stack incursions for routing of the
> > specific packets, and the mbuf reference to buffer cache objects that
> > would be required, is well within the realm of possibility.  If
> > anything, the ability to use the GPL'ed SAMBA code in the Linux
> > kernel may well be the deciding factor for eventual FreeBSD vs.
> > Linux SMB performance, the same as it was for Linux vs. FreeBSD NFS
> > performance.
> 
> That is true. But it's really not relevant to the current discussion. But I
> still think it's better to pull everything into user space to avoid copies
> rather than push everything into kernel space.

You can have, at absolute best, one copy I/O: DMA from the disk
controller into the ethernet card memory.

Practically, this works out to two copy I/O: DMA from the disk
controller into memory, and DMA from memory into the ethernet
controller.  This is because of numerous issues, including the
locality of reference which makes RAM caching a good idea, and
the fact that packet and page boundaries are not the same thing.

To achieve this, you must be able to make a vm_object_t reference
in an mbuf header, rather than copying to the mbuf data area from
the buffer cache (VM) and passing that (e.g. a three copy I/O).

When you pull data into user space, you require a protection domain
crossing.  If you send it anywhere, you require another.

You can, at some expense, avoid this.  Generally, this is done by
mmap(2)'ing the file so that when the address of data in the file
is passed to the kernel for a write(2)/send(2) call, there is
merely an address translation before the kernel can access the
(its own) data.  The expense is in the mapping and unmapping, and
may be ameliorated by doing lazy unmapping (again, for reasons of
locality of reference).
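
As a rough illustration of the user-space side of this pattern, the
sketch below maps a file read-only and hands the mapped address
directly to write(2).  The helper name send_mapped_file is
hypothetical, not an existing API, and whether the kernel actually
skips a copy on this path depends on the implementation; the code
only shows that user space never touches the data itself.  It unmaps
eagerly; a lazy-unmapping variant would keep recent mappings around.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the contents of `path` to `out_fd` straight
 * from a read-only mapping, with no intermediate user-space buffer.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_mapped_file(int out_fd, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap(2) rejects zero-length maps */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    size_t done = 0;
    while (done < len) {            /* write(2) may accept fewer bytes */
        ssize_t n = write(out_fd, base + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            munmap(base, len);
            return -1;
        }
        done += (size_t)n;
    }

    munmap(base, len);              /* eager unmap; see note above */
    return (ssize_t)done;
}
```

A caller would pass a connected socket or any other writable
descriptor as out_fd, e.g. send_mapped_file(sock, "index.html").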

For an HTTP server, it is likely that a "sendextentvectors" call
would be most useful; this would operate by having an mmap'ed file
into which HTTP headers are written, and another mmap'ed file
that contains the static body that does not represent HTTP
server generated data.  Both would be passed, with length
arguments, to avoid truncation overhead for variant header
sizes.

This approach would avoid all avoidable copies (since in FreeBSD
the VM and buffer cache are unified, there is no need to deal
with VM/buffer cache coherency issues, so long as the initial
size of the VM backing for the header region were always
sufficient).
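
The "sendextentvectors" call described above does not exist.  As an
approximation only, the user-space sketch below uses the standard
writev(2) to send a header extent and a body extent, each with an
explicit length, in a single call.  The struct extent type and the
send_extents name are made up for illustration, and a real in-kernel
version would behave differently with respect to copies.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for one "extent": a start address (typically
 * inside an mmap'ed region) plus the number of valid bytes to send. */
struct extent {
    const void *base;
    size_t      len;
};

/* Gather the header extent and the body extent into one writev(2) call.
 * Returns the number of bytes written (possibly short), or -1 on error. */
ssize_t send_extents(int out_fd, struct extent hdr, struct extent body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr.base;   /* iov_base is not const */
    iov[0].iov_len  = hdr.len;
    iov[1].iov_base = (void *)body.base;
    iov[1].iov_len  = body.len;

    return writev(out_fd, iov, 2);
}
```

In the scenario above, hdr.base would point into the mapping of the
header file and hdr.len would be the number of header bytes written
for this response, so the header file itself never needs truncating.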

A kernel module could easily be written to add this system call
to any modern UNIX derivative or clone, on a per system basis
(I did a shared descriptor table implementation on AIX, SunOS,
Solaris, and UnixWare for a product, so I have historical proof
of concept).


This particular call *might* be useful for SAMBA as well (but not
as useful as turning the reads and writes around in the kernel).


Kernel space implicitly avoids copies if you have to do I/O;
that's what servers do, for the most part.  If you are just
diddling data in user space, then that's a different issue
altogether.  This is why microkernel OS's like CHORUS have
abandoned the MACH protection domain barriers between
single servers which need to do I/O and are trusted (the use
of statistical memory protection can save the kernel from
buggy servers, here).



					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199907152054.NAA06945>
