Date:      Mon, 11 Aug 1997 22:21:18 -0500
From:      Chris Csanady <ccsanady@friley01.res.iastate.edu>
To:        Terry Lambert <terry@lambert.org>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: DISCUSS: interface for raw network driver.. 
Message-ID:  <199708120321.WAA04402@friley01.res.iastate.edu>
In-Reply-To: Your message of "Mon, 11 Aug 1997 10:19:59 PDT." <199708111719.KAA15515@phaeton.artisoft.com> 


>> I am writing a pseudo device that basically sits on top of a network driver,
>> does memory management, and exports the buffers to user-space.
>
>[ ... ]
>
>> A program will use it by opening the special file, and mmapping the
>> whole chunk of device memory.  It will then create an endpoint,
>> specifying the foreign address, port number, queue length
>> restrictions, etc.  Now it can do IO by specifying offset, size,
>> and VCI. (Currently, I am using a series of ioctls for this.  They
>> include allocation, freeing, sending, etc.. of endpoints and buffers)
>
>Basically, the inverse of mmap'ing a file, and then sending an
>address in the mmap'ed region to avoid the copy overhead.
>
>I think the  main difference I see is that this would allow the
>buffers to be non-file buffers, so for generated rather than raw
>file contents, you save a copy, but for files, you are still one
>user/kernel space copy over the minimum possible.

I think so, although I'm not sure how it would be the inverse of mapping a
file.  You are correct that there would be an extra copy involved for files.
I'm hoping to use it primarily for parallel applications, so generated
contents matter more to me.  I'm also not sure how you would make file I/O
any better.
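
For what it's worth, the interface I described looks roughly like the sketch
below right now--the structure and ioctl names here are simplified stand-ins
for illustration, not the actual header:

#include <sys/types.h>
#include <sys/ioccom.h>
#include <netinet/in.h>

/* Sketch only; names and layouts are simplified, not the real header. */
struct rnd_endpoint {
    struct in_addr faddr;       /* foreign address */
    u_short        fport;       /* foreign port */
    int            qlen;        /* queue length restriction */
    int            vci;         /* endpoint handle, returned */
};

struct rnd_iodesc {
    int            vci;         /* which endpoint */
    u_long         offset;      /* offset into the mmap'ed region */
    u_long         len;         /* size of the buffer */
};

#define RNDIOC_EPALLOC  _IOWR('r', 1, struct rnd_endpoint)  /* create endpoint */
#define RNDIOC_EPFREE   _IOW('r', 2, int)                   /* destroy endpoint */
#define RNDIOC_BUFALLOC _IOWR('r', 3, struct rnd_iodesc)    /* allocate a buffer */
#define RNDIOC_BUFFREE  _IOW('r', 4, struct rnd_iodesc)     /* free a buffer */
#define RNDIOC_SEND     _IOW('r', 5, struct rnd_iodesc)     /* queue for transmit */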

>> The obvious drawback is that anyone using this interface will be
>> able to trash each other's buffers, but this seems a reasonable
>> compromise.  With only one page in use, there will be no TLB thrashing,
>> and no overhead for vm mappings.  Ideally, it will be as close as
>> possible to providing a user-level network driver without compromising
>> overall system integrity.  Our initial use will be providing high
>> performance, low latency communication for our cluster.
>
>[ ... ]
>
>> Initially, this architecture will be used for gigabit and fast ethernet,
>> although if there are any glaring problems which would prevent use on
>> other network architectures, I would like to know.  Even with ethernet
>> however, it will allow use of non-standard frame sizes on hardware
>> which supports it, and will be a huge win.
>> 
>> Thoughts?
>
>What about establishing an anonymous memory mapping?
>
>The mmap'ing has long been used to save the copy overhead, by causing
>the uiomove() to take the transfer fault in kernel space (the user
>space reference only exists to get a DT entry of some kind).

DT?

>If you mapped a file into a process (specifically, grabbed some
>anonymous pages from /dev/zero), it seems to me that it would
>provide the same facilities, without the danger that someone else
>could stomp on your memory region which was pinned in the KVA space.

This is a good point, although there are a few issues.  First of all, if I
just grabbed pages from /dev/zero, they wouldn't be physically contiguous.
That makes things more difficult, in that the driver has to use the physical
addresses when DMA'ing in or out--which means scatter/gather to an arbitrary
number of segments.  Also, unless only one process were allowed access to
the device at a time, you couldn't know which segment an arbitrary packet
should be DMA'ed into.  That may be a reasonable restriction, though.
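
To make the problem concrete, with anonymous pages the driver would have to
walk the buffer page by page and build a scatter/gather list, something like
the sketch below (page wiring and error handling omitted; the exact
pmap_extract() prototype depends on the release):

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/pmap.h>

struct sg_seg {
    vm_offset_t paddr;          /* physical start of segment */
    vm_size_t   len;            /* length of segment */
};

/*
 * Walk a (wired) buffer and build a scatter/gather list of physically
 * contiguous segments.  Returns the segment count, or -1 if the buffer
 * is too fragmented to fit in maxseg entries.
 */
static int
build_sglist(pmap_t pmap, vm_offset_t va, vm_size_t len,
             struct sg_seg *sg, int maxseg)
{
    int nseg = 0;

    while (len > 0) {
        vm_offset_t pa = pmap_extract(pmap, va);
        vm_size_t chunk = PAGE_SIZE - (va & PAGE_MASK);

        if (chunk > len)
            chunk = len;
        if (nseg > 0 && sg[nseg - 1].paddr + sg[nseg - 1].len == pa) {
            /* Physically adjacent to the previous segment: coalesce. */
            sg[nseg - 1].len += chunk;
        } else {
            if (nseg == maxseg)
                return (-1);
            sg[nseg].paddr = pa;
            sg[nseg].len = chunk;
            nseg++;
        }
        va += chunk;
        len -= chunk;
    }
    return (nseg);
}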

>Obviously, if the mmap() stuff was fixed, the kernel mapping
>would go away, so this might be a consideration.

The above is mainly why all the buffers come straight from the kernel.
It is the only way to allocate a contiguous chunk of memory upon boot.
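
Concretely, the driver grabs its region once at attach time, roughly like
this (just a sketch--the exact contigmalloc() prototype has varied between
releases, and the names here are mine):

#include <sys/param.h>
#include <sys/errno.h>
#include <sys/malloc.h>
#include <vm/vm.h>

#define RND_BUFSIZE     (4 * 1024 * 1024)       /* one 4MB region */

static caddr_t rnd_buf;         /* kernel virtual address of the region */

/*
 * Called once at attach time, before physical memory gets fragmented.
 * Asks for a physically contiguous, page-aligned region anywhere in
 * the first 4GB.
 */
static int
rnd_allocbuf(void)
{
    rnd_buf = contigmalloc(RND_BUFSIZE, M_DEVBUF, M_NOWAIT,
        0ul, 0xfffffffful, PAGE_SIZE, 0ul);
    if (rnd_buf == NULL)
        return (ENOMEM);
    return (0);
}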

>On the other hand, the per-process descriptor table must be present
>for your process to be running anyway, and it's unlikely to be
>LRU'ed out -- especially if, on I/O request, you pin the pages
>and have the completion unpin the pages?

The general idea is to have a dual mapping throughout the entire time
that you are using the device.
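
That is, the physical pages stay mapped in the kernel for the driver, and
the process picks up its second mapping through the device's mmap entry
point, roughly as below (a sketch; the d_mmap prototype depends on the
release, and rnd_buf/RND_BUFSIZE are from the earlier sketch):

/*
 * Device mmap entry point: return the page frame number backing each
 * offset, so the process gets a user mapping of the same physical
 * pages the driver reaches through rnd_buf.
 */
static int
rnd_mmap(dev_t dev, int offset, int nprot)
{
    if (offset < 0 || offset >= RND_BUFSIZE)
        return (-1);
    return (atop(vtophys(rnd_buf) + offset));
}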

>I think the remaining issue is the number of DT entries which need
>to be rewritten on context switch.  With your method, you add one,
>but this is an artifact of the 4M page size -- you can get the
>same effect by using a small region, and/or by making mmap() use
>4M pages as well (a good optimization in any case).

Well, I don't really understand how you would use 4MB pages at the same time
as 4K ones, so I can't comment.  It sounds nice, though. :)

-Chris

>Is there anything I've missed?
>
>I'm not sure it's necessary to go to the driver level and push the
>mapping up from the kernel instead of down from user space... ?
>
>
>					Regards,
>					Terry Lambert
>					terry@lambert.org
>---
>Any opinions in this posting are my own and not those of my present
>or previous employers.





