From owner-freebsd-hackers Thu Jul  6 07:56:53 2000
Date: Thu, 06 Jul 2000 09:21:38 -0400 (EDT)
From: Bosko Milekic <bmilekic@technokratis.com>
Subject: Re: mbuf re-write(s), v 0.1
In-reply-to: <8459.962654377@critter.freebsd.dk>
To: Poul-Henning Kamp
Cc: freebsd-hackers@FreeBSD.ORG

On Mon, 3 Jul 2000, Poul-Henning Kamp wrote:

> In message <20000703154040.V18942@jade.chc-chimes.com>, Bill Fumerola writes:
>
> >I'd love to have FreeBSD be able to reclaim memory quicker at the sacrifice
> >of a few CPU cycles. Why? Well, the "add more memory" argument doesn't work
> >well when I get DoS attacks that will eat any memory available because they
> >can connect quicker than I can reclaim the memory.
>
> I have this dream of a global "VM availability flag":
>
> Imagine if the kernel kept a global variable:
>
> enum {VM_PLENTY, VM_TIGHT, VM_NONE, VM_PANIC} vm_state;
> /* VM_PLENTY: No worries */
> /* VM_TIGHT:  Don't make it any worse if you can avoid it */
> /* VM_NONE:   Fail if you must, free some if you can */
> /* VM_PANIC:  "VM, VM, my panic for some VM" */
>
> At least a few pieces of our memory-gobbling code could examine
> and adjust their caching behaviour from that. Take the vfs
> name-cache as an example:
>
> /* Create a new vfs_name-cache entry */
> cache_enter(...)
> 	switch (vm_state) {
> 	case VM_PLENTY:
> 		/* do as today */
> 		break;
> 	case VM_TIGHT:
> 		/* delete at least as many bytes as we add (LRU wise) */
> 		break;
> 	case VM_NONE:
> 		/* delete two entries, don't add the new one */
> 		break;
> 	case VM_PANIC:
> 		/* delete the entire cache */
> 		break;
> 	}
>
> The mbuf allocator could use this to great effect if the various
> MGET() calls were labeled according to their importance.
>
> Respecting such a flag in the various kernel subsystems would provide
> great resistance against DoS.
>
> Userland processes could benefit from this as well: a sysctl would
> allow malloc(3) to inspect this state whenever it was dealing with a
> full page, and if needed it could release all its cached pages,
> possibly even call an optional "GC" callback into the program to
> force a realloc(3) sequence in long-running daemons. (An alternative
> scenario is to have a SIGVMSTATE, defaulting to ignore, which gets
> sent when the variable changes, but that would have thundering-herd
> issues if a lot of processes were paged out.)
>
> If only somebody would add that variable; I don't feel like diving
> into the VM system right now.
>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD coreteam member | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

I've recently had the chance to get some profiling done. I used metrics
obtained from gprof, as well as the (basic block length) * (number of
executions) metric generated by kernbb. The latter reveals an approximate
30% increase in the new code, but does not necessarily imply that
execution time increases by that amount. gprof makes a fair estimate of
execution time, and reveals that the new code is, in the worst case,
about 30% slower and, in the best case, negligibly slower.
Of course, I'm leaving out some details here, because I've decided to
change things a little in order to further improve (and significantly,
at that) the performance of the new code. Note, however, that the
approximate 30% overall increase is not something I would consider
significant, especially since the allocator/free routines don't account
for much %time and are not the bottleneck in any of the call graphs. I
did decide to make drastic changes, however, in order to keep to the
zero-tolerance policy, even if it means somewhat giving up a cleaner
interface and adopting a "kernel process." See below.

As for your suggestion for vm_state: I am not about to implement it
until I finish this work (so if somebody else wants to take it up, go
ahead; just make sure to let all of us know, in case we decide to do it
later). However, the changes I plan to make to the mbuf code this
upcoming weekend will be aimed at fitting in nicely with the vm_state
stuff. Please note that I am not (yet) going to dip into the net code
and insert "good measure" checks that prevent new PCB allocations
depending on vm_state; that would be extremely useful, though, and
should be considered in the near future.

What I'm planning to do, on the other hand, is have the free routine
never explicitly drain pages back to the map. This means there will be
a kernel process which can optionally be awoken when
number_of_allocated_pages far exceeds average_allocated_pages (or, in
the future, when vm_state is in a meaningful state). That process will
be responsible for walking the "free list" and draining all pages
associated with "complete" page descriptor structures until it hits
average_allocated_pages once again. Thus, freeing back to the map will
never be performed by the m_free or MFREE code, which improves
performance.
I am tentatively considering this, mainly because of what you mentioned
about the potential future of vm_state. The other option, of course, is
to have MFREE never free back to the map if (how) == M_DONTWAIT, which
would keep performance high during interrupts. I've actually already
done this, but it's difficult to see its nice effects: I cannot profile
MFREE (since it's a macro), only m_free, which is typically called with
M_WAIT, in which case I don't see the improvement.

It's up to you guys. I would like suggestions. Furthermore, if anybody
out there has a decent -CURRENT machine that's under heavy network load
and would like to help me performance test/tune, please contact me!
(e.g. Yahoo! guys, maybe?)

Cheers,
Bosko.

--
Bosko Milekic * Voice/Mobile: 514.865.7738 * Pager: 514.921.0237
bmilekic@technokratis.com * http://www.technokratis.com/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message