From owner-freebsd-hackers Mon Jul  3 11:32:41 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.freebsd.org (Postfix) with ESMTP id 3F60737B532
	for ; Mon, 3 Jul 2000 11:32:34 -0700 (PDT)
	(envelope-from bmilekic@dsuper.net)
Received: from modemcable009.62-201-24.mtl.mc.videotron.net ([24.201.62.9])
	by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
	with ESMTP id <0FX4009M6U9XEC@falla.videotron.net>
	for freebsd-hackers@FreeBSD.ORG; Mon, 3 Jul 2000 13:37:10 -0400 (EDT)
Date: Mon, 03 Jul 2000 13:39:13 -0400 (EDT)
From: Bosko Milekic
Subject: Re: mbuf re-write(s), v 0.1
In-reply-to: <200007030820.BAA09516@implode.root.com>
X-Sender: bmilekic@jehovah.technokratis.com
To: David Greenman
Cc: freebsd-hackers@FreeBSD.ORG
Message-id:
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 3 Jul 2000, David Greenman wrote:

> What I'm doing is challenging your assertions that spending CPU cycles to
> save memory in the networking code is the right thing to do. I'm further
> saying that I have direct experience in this area since I'm one of the primary
> people in FreeBSD's history who have spent major amounts of effort in
> improving its performance, especially in the networking area. We (actually
> John Dyson and I) made a conscious decision to waste memory in trade for
> performance, and if we (FreeBSD developers in general) decide to go in the
> opposite direction, then it sure ought to be well thought out and have solid
> reasoning behind it. In our discussions so far, I haven't yet seen any real
> numbers to back up the claims. What is needed is:
> 1) Some numbers that show
> that the memory wastage is significant - and I'm talking about multiple
> megabytes at least. If it's not 'significant' by that definition (and in my
> experience it isn't), then I'd like to hear why you think much smaller numbers
> are significant.

When I posted the initial diff, I provided such data. I'll repeat: a good
example is at http://24.201.62.9/stats/mbuf.html - specifically, look at the
last graph at the bottom. What happened in Weeks 20 and 22 was the result of
(simulated) very high web server and NFS activity, combined with a temporary
DoS attack that occurred at the same time.

On a machine with activity such as that depicted in these statistics, I would
set min_on_avail to about 360. This way, the system will allocate at least
360 mbufs from the map and will not free pages back to the map once it hits
360 mbufs on the free lists.

Note that during Week 22, the system had allocated around 5.5k mbufs, at 256
bytes each a total of 1408000 bytes (~1.4M). If "normal" activity for this
system is ~360 mbufs (it's actually a little less than that), then we're
looking at 92160 bytes. 1408000 - 92160 = 1315840 bytes (~1.3M) of wasted
memory, which is around 322 _wired_ pages on my machine. On a machine such as
one of my NFS and Samba servers, all that is available is 8M of RAM, and this
would leave me with only ~7M to work with. But regardless of the amount of
RAM the machine in question has, note that in this case the system is
actually _WASTING_ ~1427.78% of the memory that it would normally use during
"regular high" activity.
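For the record, here is that arithmetic spelled out as a throwaway program
(I'm assuming the usual 256-byte MSIZE and 4096-byte pages here -- adjust for
your platform; the input figures are just the ones quoted above):

/*
 * Arithmetic behind the figures above.  MSIZE and the page size are
 * assumptions, not measurements.
 */
#include <stdio.h>

int
main(void)
{
        const long msize = 256;         /* bytes per mbuf (assumed MSIZE) */
        const long pagesize = 4096;     /* bytes per page (assumed) */
        long peak = 5500 * msize;       /* ~5.5k mbufs at the Week 22 peak */
        long normal = 360 * msize;      /* ~360 mbufs of "regular high" use */
        long wasted = peak - normal;

        printf("peak   = %ld bytes\n", peak);                   /* 1408000 (~1.4M) */
        printf("normal = %ld bytes\n", normal);                 /* 92160 */
        printf("wasted = %ld bytes, ~%ld pages\n",
            wasted, (wasted + pagesize - 1) / pagesize);        /* 1315840, ~322 */
        printf("ratio  = %.2f%%\n", 100.0 * wasted / normal);   /* ~1427.78% */
        return (0);
}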
That's the way I look at it, and obviously -- I agree with you -- if you
consider that memory is cheap and that, because of that, you are prepared to
literally throw some of it away, then why should you even be considering
these propositions? Well, if you're looking at designing a system that will
scale and give memory back when it no longer requires it, and whose behavior
in doing so can be adjusted at runtime, then the present allocator just
doesn't suffice.

> 2) I'd like to see some more numbers that show that the
> additional CPU wastage is very minimal (say less than 1% of the total amount
> of time spent doing the allocs/frees).

As I also previously mentioned, I have had some trouble getting profiling to
work for me here (and in fact, I'm still having trouble). I can build a
profiling kernel, but it simply won't boot (the system becomes unresponsive
when the "/" appears at boot) [this is on -CURRENT].

Although I still have to post some updated diffs, MGET(), with the
modifications, does the following:

* Check whether the free list is empty. If not (which is usually the case if
  you adjust min_on_avail properly and have allowed the system to stabilize
  itself -- e.g. allocate at least min_on_avail mbufs from the map), it will
  set up a pointer to the mb_map page descriptor structure at the top of the
  list and extract the pointer to that page's chain of free mbufs. It will
  remove the first mbuf on the chain while making sure that the others are
  re-attached properly (this part is essentially what was done with the
  mmbfree pointer manipulation when removing an mbuf from the chain).
  Finally, before completing the allocation, it will simply check whether the
  page descriptor entry from which it allocated has now reached zero free
  mbufs, and if that's the case, it will just move that entry to the "empty"
  list.

So the extra CPU cycles are spent dealing with the two lists that the system
must now manage so that it can easily keep track of which mbufs belong to
which allocated page, and therefore knows when it's time to free a given page
-- if necessary. As you know, MGETHDR() is similar.

As for MFREE(), here's what it does under the suggested proposition:

* If there is external storage, free it (same as always). Place the successor
  into the second provided mbuf (same as always). There is a new pointer
  field in the m_hdr struct that points to the mbuf's corresponding page
  descriptor structure; that pointer is acquired and the free mbuf chain is
  extracted from that structure, to which the freed mbuf is then attached (as
  it always was). I guess the only real addition in CPU cycles here is the
  following: a simple check was added that verifies whether the entry is on
  the "empty" list and, if it is, moves it over to the "free" list. If that's
  not the case, then there is a possibility that the freed mbuf completes a
  page and that the page can be freed, so if that is the case and
  min_on_avail allows it, the page is freed back to the map (notice that this
  behavior is tunable -- again -- with min_on_avail). A rough sketch of both
  the MGET() and MFREE() paths is included at the bottom of this message.

> I'm not trying to 'frown upon evolution', unless the particular form of
> evolution is to make the software worse than it was. I *can* be convinced
> that your proposed changes are a good thing and I'm asking you to step up
> to the plate and prove it.

That sounds fair.

>
> -DG
>
> David Greenman
> Co-founder, The FreeBSD Project - http://www.freebsd.org
> Manufacturer of high-performance Internet servers - http://www.terasolutions.com
> Pave the road of life with opportunities.
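As promised above, here is a rough userland sketch of the two-list
bookkeeping, for illustration only. The structure and function names
(mb_pgdesc, pg_*, m_pgdesc, mb_alloc(), mb_free()) are placeholders I made up
for this message -- they are not the identifiers used in the diff; only
min_on_avail comes from the discussion. The real MGET()/MFREE() changes are
kernel macros and also handle falling back to mb_map when nothing is free,
but the list manipulation is essentially this:

/*
 * Userland sketch of the two-list bookkeeping described above.  The names
 * (mb_pgdesc, pg_*, m_pgdesc, mb_alloc, mb_free) are placeholders, not the
 * identifiers in the actual diff; only min_on_avail is from the discussion.
 */
#include <stddef.h>

/* Per-page descriptor: tracks the free mbufs carved out of one mb_map page. */
struct mb_pgdesc {
        struct mbuf      *pg_mbfree;    /* chain of free mbufs in this page */
        int               pg_nfree;     /* how many of them are free */
        int               pg_total;     /* mbufs per page */
        struct mb_pgdesc *pg_next;      /* linkage on the free/empty list */
};

struct mbuf {
        struct mbuf      *m_next;       /* next mbuf on a free chain */
        struct mb_pgdesc *m_pgdesc;     /* back-pointer to our page (new field) */
        /* ... header and data area omitted ... */
};

static struct mb_pgdesc *mb_freelist;   /* pages with at least one free mbuf */
static struct mb_pgdesc *mb_emptylist;  /* pages with no free mbufs left */
static int mb_navail;                   /* free mbufs across all pages */
static int min_on_avail = 360;          /* tunable low-water mark */

/* Unlink a descriptor from a list it is known to be on. */
static void
pg_unlink(struct mb_pgdesc **head, struct mb_pgdesc *pg)
{
        while (*head != pg)
                head = &(*head)->pg_next;
        *head = pg->pg_next;
}

/* MGET()-like path: take an mbuf from the first page on the free list. */
static struct mbuf *
mb_alloc(void)
{
        struct mb_pgdesc *pg = mb_freelist;
        struct mbuf *m;

        if (pg == NULL)
                return (NULL);          /* would fall back to mb_map here */
        m = pg->pg_mbfree;              /* first free mbuf in this page */
        pg->pg_mbfree = m->m_next;      /* re-attach the rest of the chain */
        pg->pg_nfree--;
        mb_navail--;
        if (pg->pg_nfree == 0) {        /* page exhausted: move it to "empty" */
                mb_freelist = pg->pg_next;
                pg->pg_next = mb_emptylist;
                mb_emptylist = pg;
        }
        return (m);
}

/* MFREE()-like path: give the mbuf back to the page it came from. */
static void
mb_free(struct mbuf *m)
{
        struct mb_pgdesc *pg = m->m_pgdesc;

        if (pg->pg_nfree == 0) {        /* it was on the "empty" list */
                pg_unlink(&mb_emptylist, pg);
                pg->pg_next = mb_freelist;
                mb_freelist = pg;
        }
        m->m_next = pg->pg_mbfree;      /* attach the mbuf to the page's chain */
        pg->pg_mbfree = m;
        pg->pg_nfree++;
        mb_navail++;
        /* Page complete and min_on_avail still satisfied: release the page. */
        if (pg->pg_nfree == pg->pg_total &&
            mb_navail - pg->pg_total >= min_on_avail) {
                pg_unlink(&mb_freelist, pg);
                mb_navail -= pg->pg_total;
                /* the page itself would be handed back to mb_map here */
        }
}

/* Tiny driver: carve one fake "page" of 16 mbufs and exercise both paths. */
int
main(void)
{
        static struct mbuf pool[16];
        static struct mb_pgdesc pg = { NULL, 0, 16, NULL };
        int i;

        for (i = 0; i < 16; i++) {
                pool[i].m_pgdesc = &pg;
                pool[i].m_next = pg.pg_mbfree;
                pg.pg_mbfree = &pool[i];
                pg.pg_nfree++;
                mb_navail++;
        }
        pg.pg_next = mb_freelist;
        mb_freelist = &pg;
        mb_free(mb_alloc());            /* one allocation, one free */
        return (0);
}

The point is that the per-allocation and per-free overhead is a handful of
pointer operations and a counter check; a page only goes back to the map when
it is completely free and min_on_avail permits it.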
--
Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237
bmilekic@technokratis.com * http://www.technokratis.com/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message