From owner-freebsd-hackers Wed Mar 12 20:50:52 1997
Return-Path:
Received: (from root@localhost)
	by freefall.freebsd.org (8.8.5/8.8.5) id UAA04103
	for hackers-outgoing; Wed, 12 Mar 1997 20:50:52 -0800 (PST)
Received: from caipfs.rutgers.edu (root@caipfs.rutgers.edu [128.6.37.100])
	by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA04076
	for ; Wed, 12 Mar 1997 20:50:22 -0800 (PST)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id XAA08031;
	Wed, 12 Mar 1997 23:50:11 -0500 (EST)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4) id XAA21796;
	Wed, 12 Mar 1997 23:49:59 -0500
Date: Wed, 12 Mar 1997 23:49:59 -0500
Message-Id: <199703130449.XAA21796@jenolan.caipgeneral>
From: "David S. Miller" <davem@caip.rutgers.edu>
To: ccsanady@nyx.pr.mcs.net
CC: hackers@FreeBSD.ORG
In-reply-to: <199703130435.WAA11627@nyx.pr.mcs.net>
	(message from Chris Csanady on Wed, 12 Mar 1997 22:35:45 -0600)
Subject: Re: Solaris TPC-C benchmarks (with Oracle)
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

   Date: Wed, 12 Mar 1997 22:35:45 -0600
   From: Chris Csanady

   Ok, this is pretty much as I thought.  But is it worth it to do more
   complicated memory management, or just eat the wasted space of fixed
   size buffers?  I mean, it won't waste any more space than mbuf
   clusters do for ethernet.  If you're using ATM or HIPPI, you can
   afford the extra memory. :)

Yes, it does pay off: this allows the driver (the one who "knows" best
how to manage the packets it services) to perform the management in the
most efficient way possible.  Nicely, if you notice (which you
undoubtedly will) that many drivers manage things in an extremely
similar fashion, you can write "pbuf generic" methods which the drivers
share when they don't do anything overly special with buffer management.

   I was curious about this--in his slides, he mentions that sockbufs
   go away. :\  Can you elaborate more on what's going on?

Watch the output control path: it just jams the data out to the driver.
The code in his slides doesn't check for things like congestion
avoidance (checking cong_window/ssthresh).  You get the idea.  It's not
this huge problem, you just have to remember to keep the normal TCP
outgoing flow control code in the path and not eliminate it entirely as
Jacobson seems to have done.
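Roughly, the check that has to stay looks like the sketch below.  This
is only an illustration of the standard send-window logic, not
Jacobson's code and not any particular stack's; the struct, field, and
function names (tcp_state, snd_una, tcp_sendable, and so on) are
invented for the example.  You may put on the wire at most the minimum
of the congestion window and the peer's advertised window, minus what
is already in flight, and cwnd itself opens differently depending on
whether you are below or above ssthresh.

/*
 * Sketch only: the usual TCP send-window / congestion-window checks
 * the output path must keep doing before handing data to the driver.
 * All names here are invented for illustration.
 */
struct tcp_state {
	unsigned long snd_una;      /* oldest unacknowledged byte       */
	unsigned long snd_nxt;      /* next byte we would transmit      */
	unsigned long snd_wnd;      /* window advertised by the peer    */
	unsigned long snd_cwnd;     /* congestion window                */
	unsigned long snd_ssthresh; /* slow start threshold             */
	unsigned long mss;          /* maximum segment size             */
};

/* How many of the 'queued' bytes may we transmit right now? */
static unsigned long
tcp_sendable(struct tcp_state *tp, unsigned long queued)
{
	unsigned long win = tp->snd_wnd < tp->snd_cwnd ?
			    tp->snd_wnd : tp->snd_cwnd;
	unsigned long in_flight = tp->snd_nxt - tp->snd_una;

	if (win <= in_flight)
		return 0;	/* window full: do not just jam it out */
	win -= in_flight;
	return queued < win ? queued : win;
}

/* On each ACK: open cwnd exponentially while below ssthresh (slow
 * start), linearly once above it (congestion avoidance). */
static void
tcp_ack_opens_cwnd(struct tcp_state *tp)
{
	if (tp->snd_cwnd < tp->snd_ssthresh)
		tp->snd_cwnd += tp->mss;
	else
		tp->snd_cwnd += (tp->mss * tp->mss) / tp->snd_cwnd;
}

None of this is expensive (a couple of compares and an add per
segment), which is the point: keeping the normal flow control in a
"fast" output path costs almost nothing.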
   >Secondly, his fast paths for input bank on the fact that you can get
   >right into user context when you detect a header prediction hit.  The
   >only way to do this effectively on a system you'd ever want anyone to
   >actually run is the following:

   I think that the header prediction code is called from a user
   context, so you would already be there.

Ok.  You still have the issue of pbuf loaning: you have to get to user
context somehow, and that can take time.  The real-time priority trick
is necessary to make that delay as small as humanly possible,
decreasing the chance of the driver running out of receive buffers and
dropping packets under high (or normal) load.

   pbufs would essentially be the same as mbufs to the drivers, I would
   think--except less complicated.  Right now, I think that the drivers
   just dma into an mbuf cluster.  I don't see why it can't loan them
   out for a while.

This would be a really nice scenario, and I once thought the world was
as simple as this.  Many people set me straight ;-)  You still have to
allow all the stupid devices to work; for example, I know many ethernet
controllers supported by both Linux and FreeBSD have one-packet fifos
and other strange things.  You have to provide a way to tell the
incoming packet code "this is a stupid device, it has a small amount of
buffering if any, so copy it now and do not optimize".

   >This is all nontrivial to pull off.  One nice effect is that you
   >actually then have a chance of doing real networking page flipping
   >with the device buffer method scheme.

   Does Van Jacobson's kernel do page flipping?

No, but my comments were there to point out that page flipping is
easier to construct into your grand scheme of things with the device
methods there.  It is simply more flexible.  (hint hint: think about
fast routing a la Cisco's with shared memory, direct dma from card to
card to forward the packet, and other tricks; pbufs can allow you to do
it)

   I thought he just did a checksum and copy to a user buffer.  I
   remember John saying something about it being more expensive to do
   this than a copy, although it was in a different context.  (with
   regard to the pipe code I think)..  I don't know.  If it would work,
   it would sure be nice, but my simple pbuf allocator would definitely
   not work..

Every example I have ever measured, either myself or done by someone
else, indicates that on just about any processor you get the checksum
for free when copy/checksum is done as a combined operation.  Now I
would not be surprised if the overhead of walking mbuf chains would
adversely affect the efficiency of this technique, but pbufs would make
that issue disappear. ;-)

   I should have said viability rather than scalability. :)

Yes, that does make sense then ;-)

   I'd like to finish volume 2 before I even think about the timers or
   such.. ;-)

----------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth &  ////
199 usec remote TCP latency over 100Mb/s    ////
ethernet.  Beat that!                      ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><