From owner-freebsd-hackers Mon Jul  3 11:32:41 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
	by hub.freebsd.org (Postfix) with ESMTP id 3F60737B532
	for ; Mon, 3 Jul 2000 11:32:34 -0700 (PDT)
	(envelope-from bmilekic@dsuper.net)
Received: from modemcable009.62-201-24.mtl.mc.videotron.net ([24.201.62.9])
	by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8)
	with ESMTP id <0FX4009M6U9XEC@falla.videotron.net>
	for freebsd-hackers@FreeBSD.ORG; Mon, 3 Jul 2000 13:37:10 -0400 (EDT)
Date: Mon, 03 Jul 2000 13:39:13 -0400 (EDT)
From: Bosko Milekic
Subject: Re: mbuf re-write(s), v 0.1
In-reply-to: <200007030820.BAA09516@implode.root.com>
X-Sender: bmilekic@jehovah.technokratis.com
To: David Greenman
Cc: freebsd-hackers@FreeBSD.ORG
Message-id:
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 3 Jul 2000, David Greenman wrote:

> What I'm doing is challenging your assertions that spending CPU cycles to
> save memory in the networking code is the right thing to do. I'm further
> saying that I have direct experience in this area since I'm one of the primary
> people in FreeBSD's history who have spent major amounts of effort in
> improving its performance, especially in the networking area. We (actually
> John Dyson and I) made a conscious decision to waste memory in trade for
> performance, and if we (FreeBSD developers in general) decide to go in the
> opposite direction, then it sure ought to be well thought out and have solid
> reasoning behind it. In our discussions so far, I haven't yet seen any real
> numbers to back up the claims. What is needed is:
> 1) Some numbers that show
> that the memory wastage is significant - and I'm talking about multiple
> megabytes at least. If it's not 'significant' by that definition (and in my
> experience it isn't), then I'd like to hear why you think much smaller numbers
> are significant.

When I posted the initial diff, I provided such data. I'll repeat: a good
example is at http://24.201.62.9/stats/mbuf.html - specifically, look at the
last graph at the bottom. What happened in Weeks 20 and 22 was the result of
(simulated) very high web server and NFS activity, combined with a temporary
DoS attack that occurred at the same time.

On a machine with activity such as that depicted in these statistics, I would
set min_on_avail to about 360. This way, the system will allocate at least
360 mbufs from the map and will not free pages back to the map once it hits
360 mbufs on the free lists.

Note that during Week 22, the system had allocated around 5.5k mbufs, at 256
bytes each a total of 1408000 bytes (~1.4M). If "normal" activity for this
system is ~360 mbufs (it's actually a little less than that), then we're
looking at 92160 bytes. 1408000 - 92160 = 1315840 bytes (~1.3M) of wasted
memory, which is around 322 _wired_ pages on my machine. On a machine such as
one of my NFS and Samba servers, all that is available is 8M of RAM, and this
would leave me with only ~7M to work with. But regardless of the amount of
RAM the machine in question has, note that in this case the system is
actually _WASTING_ ~1427.78% of the memory that it would normally use during
"regular high" activity.
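For the record, here is that arithmetic spelled out as a throwaway program
(I'm assuming the usual 256-byte MSIZE and 4096-byte pages here -- adjust for
your platform; the input figures are just the ones quoted above):

/*
 * Arithmetic behind the figures above.  MSIZE and the page size are
 * assumptions, not measurements.
 */
#include <stdio.h>

int
main(void)
{
        const long msize = 256;         /* bytes per mbuf (assumed MSIZE) */
        const long pagesize = 4096;     /* bytes per page (assumed) */
        long peak = 5500 * msize;       /* ~5.5k mbufs at the Week 22 peak */
        long normal = 360 * msize;      /* ~360 mbufs of "regular high" use */
        long wasted = peak - normal;

        printf("peak   = %ld bytes\n", peak);                   /* 1408000 (~1.4M) */
        printf("normal = %ld bytes\n", normal);                 /* 92160 */
        printf("wasted = %ld bytes, ~%ld pages\n",
            wasted, (wasted + pagesize - 1) / pagesize);        /* 1315840, ~322 */
        printf("ratio  = %.2f%%\n", 100.0 * wasted / normal);   /* ~1427.78% */
        return (0);
}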
That's the way I look at it, and obviously -- I agree with you -- if you
consider that memory is cheap and that, because of that, you are prepared to
literally throw some of it away, then why should you even be considering
these propositions? Well, if you're looking at designing a system that will
scale and give memory back when it no longer requires it, and whose behavior
in doing so can be adjusted at runtime, then the present allocator just
doesn't suffice.

> 2) I'd like to see some more numbers that show that the
> additional CPU wastage is very minimal (say less than 1% of the total amount
> of time spent doing the allocs/frees).

As I also previously mentioned, I have had some trouble getting profiling to
work for me here (and in fact, I'm still having trouble). I can build a
profiling kernel, but it simply won't boot (the system becomes unresponsive
when the "/" appears at boot) [this is on -CURRENT].

Although I still have to post some updated diffs, MGET(), with the
modifications, does the following:

* Check whether the free list is empty. If not (which is usually the case if
  you adjust min_on_avail properly and have allowed the system to stabilize
  itself -- e.g. allocate at least min_on_avail mbufs from the map), it will
  set up a pointer to the mb_map page descriptor structure at the top of the
  list and extract the pointer to that page's chain of free mbufs. It will
  remove the first mbuf on the chain while making sure that the others are
  re-attached properly (this part is essentially what was done with the
  mmbfree pointer manipulation when removing an mbuf from the chain).
  Finally, before completing the allocation, it will simply check whether the
  page descriptor entry from which it allocated has now reached zero free
  mbufs, and if that's the case, it will just move that entry to the "empty"
  list.

So the extra CPU cycles are spent dealing with the two lists that the system
must now manage so that it can easily keep track of which mbufs belong to
which allocated page, and therefore knows when it's time to free a given page
-- if necessary. As you know, MGETHDR() is similar.

As for MFREE(), here's what it does under the suggested proposition:

* If there is external storage, free it (same as always). Place the successor
  into the second provided mbuf (same as always). There is a new pointer
  field in the m_hdr struct that points to the mbuf's corresponding page
  descriptor structure; that pointer is acquired and the free mbuf chain is
  extracted from that structure, to which the freed mbuf is then attached (as
  it always was). I guess the only real addition in CPU cycles here is the
  following: a simple check was added that verifies whether the entry is on
  the "empty" list and, if it is, moves it over to the "free" list. If that's
  not the case, then there is a possibility that the freed mbuf completes a
  page and that the page can be freed, so if that is the case and
  min_on_avail allows it, the page is freed back to the map (notice that this
  behavior is tunable -- again -- with min_on_avail). A rough sketch of both
  the MGET() and MFREE() paths is included at the bottom of this message.

> I'm not trying to 'frown upon evolution', unless the particular form of
> evolution is to make the software worse than it was. I *can* be convinced
> that your proposed changes are a good thing and I'm asking you to step up
> to the plate and prove it.

That sounds fair.

>
> -DG
>
> David Greenman
> Co-founder, The FreeBSD Project - http://www.freebsd.org
> Manufacturer of high-performance Internet servers - http://www.terasolutions.com
> Pave the road of life with opportunities.
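As promised above, here is a rough userland sketch of the two-list
bookkeeping, for illustration only. The structure and function names
(mb_pgdesc, pg_*, m_pgdesc, mb_alloc(), mb_free()) are placeholders I made up
for this message -- they are not the identifiers used in the diff; only
min_on_avail comes from the discussion. The real MGET()/MFREE() changes are
kernel macros and also handle falling back to mb_map when nothing is free,
but the list manipulation is essentially this:

/*
 * Userland sketch of the two-list bookkeeping described above.  The names
 * (mb_pgdesc, pg_*, m_pgdesc, mb_alloc, mb_free) are placeholders, not the
 * identifiers in the actual diff; only min_on_avail is from the discussion.
 */
#include <stddef.h>

/* Per-page descriptor: tracks the free mbufs carved out of one mb_map page. */
struct mb_pgdesc {
        struct mbuf      *pg_mbfree;    /* chain of free mbufs in this page */
        int               pg_nfree;     /* how many of them are free */
        int               pg_total;     /* mbufs per page */
        struct mb_pgdesc *pg_next;      /* linkage on the free/empty list */
};

struct mbuf {
        struct mbuf      *m_next;       /* next mbuf on a free chain */
        struct mb_pgdesc *m_pgdesc;     /* back-pointer to our page (new field) */
        /* ... header and data area omitted ... */
};

static struct mb_pgdesc *mb_freelist;   /* pages with at least one free mbuf */
static struct mb_pgdesc *mb_emptylist;  /* pages with no free mbufs left */
static int mb_navail;                   /* free mbufs across all pages */
static int min_on_avail = 360;          /* tunable low-water mark */

/* Unlink a descriptor from a list it is known to be on. */
static void
pg_unlink(struct mb_pgdesc **head, struct mb_pgdesc *pg)
{
        while (*head != pg)
                head = &(*head)->pg_next;
        *head = pg->pg_next;
}

/* MGET()-like path: take an mbuf from the first page on the free list. */
static struct mbuf *
mb_alloc(void)
{
        struct mb_pgdesc *pg = mb_freelist;
        struct mbuf *m;

        if (pg == NULL)
                return (NULL);          /* would fall back to mb_map here */
        m = pg->pg_mbfree;              /* first free mbuf in this page */
        pg->pg_mbfree = m->m_next;      /* re-attach the rest of the chain */
        pg->pg_nfree--;
        mb_navail--;
        if (pg->pg_nfree == 0) {        /* page exhausted: move it to "empty" */
                mb_freelist = pg->pg_next;
                pg->pg_next = mb_emptylist;
                mb_emptylist = pg;
        }
        return (m);
}

/* MFREE()-like path: give the mbuf back to the page it came from. */
static void
mb_free(struct mbuf *m)
{
        struct mb_pgdesc *pg = m->m_pgdesc;

        if (pg->pg_nfree == 0) {        /* it was on the "empty" list */
                pg_unlink(&mb_emptylist, pg);
                pg->pg_next = mb_freelist;
                mb_freelist = pg;
        }
        m->m_next = pg->pg_mbfree;      /* attach the mbuf to the page's chain */
        pg->pg_mbfree = m;
        pg->pg_nfree++;
        mb_navail++;
        /* Page complete and min_on_avail still satisfied: release the page. */
        if (pg->pg_nfree == pg->pg_total &&
            mb_navail - pg->pg_total >= min_on_avail) {
                pg_unlink(&mb_freelist, pg);
                mb_navail -= pg->pg_total;
                /* the page itself would be handed back to mb_map here */
        }
}

/* Tiny driver: carve one fake "page" of 16 mbufs and exercise both paths. */
int
main(void)
{
        static struct mbuf pool[16];
        static struct mb_pgdesc pg = { NULL, 0, 16, NULL };
        int i;

        for (i = 0; i < 16; i++) {
                pool[i].m_pgdesc = &pg;
                pool[i].m_next = pg.pg_mbfree;
                pg.pg_mbfree = &pool[i];
                pg.pg_nfree++;
                mb_navail++;
        }
        pg.pg_next = mb_freelist;
        mb_freelist = &pg;
        mb_free(mb_alloc());            /* one allocation, one free */
        return (0);
}

The point is that the per-allocation and per-free overhead is a handful of
pointer operations and a counter check; a page only goes back to the map when
it is completely free and min_on_avail permits it.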
--
Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237
bmilekic@technokratis.com * http://www.technokratis.com/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message