Date: Mon, 9 Mar 1998 02:11:58 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: marcs@znep.com
Cc: mike@smith.net.au, hackers@FreeBSD.ORG
Subject: Re: kernel wishlist for web server performance
Message-ID: <199803090211.TAA13723@usr08.primenet.com>
In-Reply-To: <Pine.BSF.3.95.980307225453.2799O-100000@alive.znep.com> from "Marc Slemko" at Mar 7, 98 11:12:52 pm
> I don't think so.  Trying to do anything else is an ugly hack.  See
> the HP paper I mentioned for some of the details on why.
>
> Let me put it this way: how else do you propose to do copy avoidance
> to avoid an extra copy going into the mbufs?  The data must go from
> the buffer cache to the network without any copy other than to the
> network card itself.  Why is your other method of doing this any less
> of a hack?

Put the header in a page by itself and butt it up against the data.  We
did this for the Pathworks for VMS (NetWare) product.  The copy to the
card buffer is gathered.

Basically, you avoid the direct copy into the mbuf by using an indirect
reference to the data you want to send in the buffer cache (or in the
VM, more correctly).

I don't see why you couldn't do this in the mmap/write case, frankly,
without needing to use a "TransmitFile" type operation.

In either case, you would need to modify the networking so that buffers
could be accessed by indirect reference, and usage counts are done
appropriately on copy-complete/DMA-complete (depending on how the data
gets gathered to the network card).

The *one* thing you will save is the per-write user-to-kernel address
translation.  To get this, you will add the complication of needing to
(potentially) take a page fault on the backing vp for the file to get
more pages to send, if the whole thing isn't in core.  You could do this
in the write(2) case by having a faulting uio routine (uiofault instead
of uiomove?).  You would also need a callback function to unlock the
pages after the DMA/copy -- you couldn't simply release things.

> Yes, although I have heard somewhat convincing arguments that on
> hardware with good enough support for context switching between
> threads, you don't actually gain that much (if anything) from AIO over
> just creating more threads.  I'm not really convinced though, and it
> is highly dependent on how the OS implements that stuff.

You would need quantum-thread-group-affinity to beat AIO.  Specifically,
if I made a blocking operation on a kernel thread after using 1/3 of my
quantum, I'd want the remaining 2/3rds to go to another kernel thread
within my process (thread group), in order to save the page mapping
change overhead part of the context switch.

This leads to all sorts of starvation issues, etc., which can be dealt
with, but aren't easy.

You can get similar performance from using user space threads in
combination with AIO, without having to implement the affinity.

For SMP, there are additional L1 cache issues.  You want thread CPU
affinity so you don't blow out your L1 cache accessing thread-local
storage, and you want process CPU affinity so you don't blow it out
accessing heap data.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
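
A minimal userland sketch of the mmap/write shape of the gathered send
described above, assuming a hypothetical send_header_and_file() helper:
the header sits in its own buffer, the file data is mapped from the VM,
and writev(2) lets the kernel gather both pieces.  This shows only the
application-side calling shape, not the in-kernel indirect-reference
machinery the mail argues for.

/*
 * Sketch only: a hypothetical send_header_and_file() showing the
 * header-in-its-own-buffer, gathered-with-the-mapped-file-data idea
 * from the application side.  A real server would loop on short
 * writes instead of treating a single writev() as the whole job.
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int
send_header_and_file(int sock, const char *hdr, const char *path)
{
	struct stat st;
	struct iovec iov[2];
	void *data;
	ssize_t n;
	int fd;

	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1);
	if (fstat(fd, &st) < 0) {
		close(fd);
		return (-1);
	}

	/* Map the file; the pages come straight from the VM/buffer cache. */
	data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED) {
		close(fd);
		return (-1);
	}

	/* Header and file data are separate iovecs; the copy is gathered. */
	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len  = strlen(hdr);
	iov[1].iov_base = data;
	iov[1].iov_len  = (size_t)st.st_size;

	n = writev(sock, iov, 2);

	munmap(data, (size_t)st.st_size);
	close(fd);
	return (n < 0 ? -1 : 0);
}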
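
The usage-count and unlock-callback discipline mentioned for
copy-complete/DMA-complete boils down to a reference count that is taken
per in-flight transmit and dropped as each copy or DMA finishes, with the
"unlock the pages" callback firing only on the last drop.  The toy below
is my own illustration, not kernel code; extbuf and unlock_pages() are
made-up names.

/*
 * Toy refcount-plus-callback illustration of the discipline described
 * above: hold a reference per in-flight use of the borrowed pages,
 * drop it on copy/DMA completion, unlock only when the count hits zero.
 */
#include <stdio.h>

struct extbuf {
	void   *pages;                  /* the borrowed VM pages */
	int     refs;                   /* one per in-flight reference */
	void  (*release)(void *pages);  /* "unlock the pages" callback */
};

static void
extbuf_hold(struct extbuf *b)
{
	b->refs++;
}

static void
extbuf_drop(struct extbuf *b)
{
	if (--b->refs == 0 && b->release != NULL)
		b->release(b->pages);   /* last DMA/copy finished */
}

static void
unlock_pages(void *pages)
{
	printf("pages %p unlocked\n", pages);
}

int
main(void)
{
	struct extbuf b = { (void *)0x1000, 0, unlock_pages };

	extbuf_hold(&b);        /* queued for the first segment */
	extbuf_hold(&b);        /* queued again, e.g. a retransmit */
	extbuf_drop(&b);        /* first DMA complete */
	extbuf_drop(&b);        /* second DMA complete; callback fires */
	return (0);
}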
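
For the "user space threads in combination with AIO" alternative, POSIX
AIO (aio_read(2) and friends) gives the general shape.  The sketch below
uses /etc/motd as an arbitrary example file, and busy-polling stands in
for the other work a user-level scheduler would run while the request is
outstanding.

/*
 * Sketch of issuing an asynchronous read and harvesting the result.
 * A user-level thread scheduler would switch to another runnable
 * thread instead of the usleep() polling loop shown here.
 */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	char buf[4096];
	struct aiocb cb;
	int fd;

	if ((fd = open("/etc/motd", O_RDONLY)) < 0)
		return (1);

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf    = buf;
	cb.aio_nbytes = sizeof(buf);
	cb.aio_offset = 0;

	if (aio_read(&cb) != 0)
		return (1);

	/* A user-space scheduler would run another thread here. */
	while (aio_error(&cb) == EINPROGRESS)
		usleep(1000);

	printf("read %zd bytes asynchronously\n", aio_return(&cb));
	close(fd);
	return (0);
}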
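
On the SMP/L1-cache point, the thread-to-CPU pinning can be expressed
with FreeBSD's much later cpuset_setaffinity(2) interface, which did not
exist when this was written; the sketch below pins the calling thread to
CPU 0 purely to make the idea concrete.

/*
 * Sketch: pin the current thread to CPU 0 so its L1 working set
 * (thread-local storage, heap data) stays warm on one processor.
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <stdio.h>

int
main(void)
{
	cpuset_t mask;

	CPU_ZERO(&mask);
	CPU_SET(0, &mask);      /* keep this thread on CPU 0 */

	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(mask), &mask) != 0) {
		perror("cpuset_setaffinity");
		return (1);
	}
	printf("pinned to CPU 0\n");
	return (0);
}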