From owner-freebsd-hackers Thu Jul  6 07:56:53 2000
Date: Thu, 06 Jul 2000 09:21:38 -0400 (EDT)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: mbuf re-write(s), v 0.1
In-reply-to: <8459.962654377@critter.freebsd.dk>
To: Poul-Henning Kamp
Cc: freebsd-hackers@FreeBSD.ORG

On Mon, 3 Jul 2000, Poul-Henning Kamp wrote:

> In message <20000703154040.V18942@jade.chc-chimes.com>, Bill Fumerola writes:
>
> >I'd love to have FreeBSD be able to reclaim memory quicker at the sacrifice
> >of a few CPU cycles. Why? Well, the "add more memory" argument doesn't work
> >well when I get DoS attacks that will eat any memory available because they
> >can connect quicker than I can reclaim the memory.
>
> I have this dream of a global "VM availability flag":
>
> Imagine if the kernel kept a global variable:
>
> enum {VM_PLENTY, VM_TIGHT, VM_NONE, VM_PANIC} vm_state;
> /* VM_PLENTY: No worries */
> /* VM_TIGHT:  Don't make it any worse if you can avoid it */
> /* VM_NONE:   Fail if you must, free some if you can */
> /* VM_PANIC:  "VM, VM, my panic for some VM" */
>
> At least a few pieces of our memory-gobbling code could examine
> and adjust their caching behaviour from that. Take the vfs
> name-cache as an example:
>
> /* Create a new vfs_name-cache entry */
> cache_enter(...)
> 	switch (vm_state) {
> 	case VM_PLENTY:
> 		/* do as today */
> 		break;
> 	case VM_TIGHT:
> 		/* delete at least as many bytes as we add (LRU wise) */
> 		break;
> 	case VM_NONE:
> 		/* delete two entries, don't add the new one */
> 		break;
> 	case VM_PANIC:
> 		/* delete the entire cache */
> 		break;
> 	}
>
> The mbuf allocator could use this to great effect if the various
> MGET() calls were labeled according to their importance.
>
> Respecting such a flag in the various kernel subsystems would provide
> great resistance against DoS.
>
> Userland processes could benefit from this as well: a sysctl would
> allow malloc(3) to inspect this state whenever it was dealing with a
> full page, and if needed it could release all its cached pages,
> possibly even call an optional "GC" callback into the program to
> force a realloc(3) sequence in long-running daemons. (An alternative
> scenario is to have a SIGVMSTATE, defaulting to ignore, which gets
> sent when the variable changes, but that would have thundering-herd
> issues if a lot of processes were paged out.)
>
> If only somebody would add that variable; I don't feel like diving
> into the VM system right now.
>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD coreteam member | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

I've recently had the chance to get some profiling done. I used metrics
obtained from gprof, as well as the (basic block length) * (number of
executions) metric generated by kernbb. The latter reveals an approximate
30% increase in the new code, but does not necessarily imply that
execution time increases by that amount. gprof makes a fair estimate of
execution time, and reveals that the new code is, in the worst case,
about 30% slower and, in the best case, negligibly slower.
Of course, I'm leaving out some details here, because I've decided to
change things a little in order to further improve (and significantly,
at that) the performance of the new code. Note, however, that the
approximate 30% overall increase is not something I would consider
significant, especially since the allocator/free routines don't account
for much %time and are not the bottleneck in any of the call graphs. I
did decide to make drastic changes, however, in order to keep to the
zero-tolerance policy, even if it means somewhat giving up a cleaner
interface and adopting a "kernel process." See below.

As for your suggestion for vm_state: I am not about to implement it
until I finish this work (so if somebody else wants to take it up, go
ahead; just make sure to let all of us know, in case we decide to do it
later). However, the changes I plan to make to the mbuf code this
upcoming weekend will be aimed at fitting in nicely with the vm_state
stuff. Please note that I am not (yet) going to dip into the net code
and insert "good measure" checks that prevent new PCB allocations
depending on vm_state; that would be extremely useful, though, and
should be considered in the near future.

What I'm planning to do, on the other hand, is have the free routine
never explicitly drain pages back to the map. This means there will be
a kernel process which can optionally be awoken when
number_of_allocated_pages far exceeds average_allocated_pages (or, in
the future, when vm_state is in a meaningful state). That process will
be responsible for walking the "free list" and draining all pages
associated with "complete" page descriptor structures until it hits
average_allocated_pages once again. Thus, freeing back to the map will
never be performed by the m_free or MFREE code, which improves
performance.
I am tentatively considering this, mainly because of what you mentioned
about the potential future of vm_state. The other option, of course, is
to have MFREE never free back to the map if (how) == M_DONTWAIT, which
would keep performance high during interrupts. I've actually already
done this, but it's difficult to see its nice effects: I cannot profile
MFREE (since it's a macro), only m_free, which is typically called with
M_WAIT, in which case I don't see the improvement.

It's up to you guys. I would like suggestions. Furthermore, if anybody
out there has a decent -CURRENT machine that's under heavy network load
and would like to help me performance test/tune, please contact me!
(e.g. Yahoo! guys, maybe?)

Cheers,
Bosko.

--
Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237
bmilekic@technokratis.com * http://www.technokratis.com/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message