Date: Tue, 31 Oct 2000 12:41:46 -0500 (EST)
From: Bosko Milekic
Subject: Re: MP: per-CPU mbuf allocation lists
In-reply-to: <20001031002953.O22110@fw.wintelcom.net>
To: Alfred Perlstein
Cc: freebsd-net@FreeBSD.ORG

On Tue, 31 Oct 2000, Alfred Perlstein wrote:

> Actually this might work. :)
>
> Consider: you're at 0 buffers on your freelist, so you allocate a
> chunk from the general pool; if it's bigger than the tunable 'lw',
> the whole chain goes onto your 'lw' list.
>
> As you allocate and free from lw, it will 'spill' onto your 'hw'
> list, because that's where you free to when 'lw' is full.
>
> If you happen to grab something off the freelist smaller than 'lw',
> you just live with that, or possibly grab another chunk; either way
> this algorithm heals the chunks by spilling frees from lw to hw to
> make the cutoff.
>
> Eventually you'll overflow your hw list and free a correctly sized
> chunk (hw - lw buffers) onto the freelist.
>
> If you're just consuming mbufs, well then, someone else will be
> doing the above-described freeing, balancing out the main pool.
>
> Remember, accuracy in balancing the chunks isn't nearly as important
> as speed.

I never said that it wouldn't work.

I just said that it would, if anything, add far too much hysteresis,
and also make certain situations MUCH slower (such as the very
situation in which this sort of move from the global list to the fast
list needs to be done). The purpose of doing this sort of thing, as you
suggest it, would be to supposedly make future allocations from the
same CPU faster (because they won't have to go dip into the global pool
until the next time the fast list runs out). Well, I'm telling you that
it's really not worth the trouble.

First of all, if it were to be done, you'd have to scratch my idea of
how to wake up sleepers, because otherwise you would end up freeing
some INDIVIDUAL mbufs (not belonging to any "chunk") onto mmbfree,
which would further fragment it. The alternative solution would be to
make yet another separate list to be used solely by the waiting
routines, which would involve yet another mutex, etc., etc.

Second of all, dealing with variously sized chunks is enough trouble as
it is. It sounds very nice in theory, but in practice it's not. It
would mean that either you keep track of the last mbuf on your F2 fast
list, which would involve the following every time you free to F2
(sketched below):

  * you're freeing to the head of the list, so you have to make your
    top mbuf (the one you're adding) have its nextpkt point to the
    last mbuf, which you'll probably get from the previously first
    mbuf on F2, after which point you'll set that previously first
    mbuf's nextpkt to NULL in order to avoid confusion on the mmbfree
    list;

or you "find" your last mbuf when it comes time to create the chunk on
mmbfree, by walking the mbufs that are part of the "chunk."
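To make the first option concrete: every free to F2 would have to do
roughly the following just to maintain the tail pointer. This is only
a sketch of the bookkeeping, not actual code; "f2_head" is a made-up
per-CPU list head, and I've left the locking out.

    /*
     * Sketch only: assumes <sys/param.h> and <sys/mbuf.h>.
     * Free one mbuf to the head of the F2 fast list, keeping the
     * head's m_nextpkt pointing at the last mbuf of the "chunk" so
     * a later bulk free to mmbfree can find the tail without
     * walking the list.
     */
    static struct mbuf *f2_head;    /* hypothetical per-CPU fast list */

    static void
    f2_free(struct mbuf *m)
    {
        struct mbuf *old = f2_head;

        m->m_next = old;            /* link into the fast list */
        if (old != NULL) {
            /* Inherit the tail pointer from the old head... */
            m->m_nextpkt = old->m_nextpkt;
            /* ...and clear it there, or mmbfree gets confused later. */
            old->m_nextpkt = NULL;
        } else {
            m->m_nextpkt = m;       /* a lone mbuf is its own tail */
        }
        f2_head = m;
    }

That's extra pointer bookkeeping and a branch on what ought to be the
cheapest, most common path in the whole allocator.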
or that you "find" your last mbuf when it comes time to create the chunk on mmbfree by walking the mbufs that are part of the "chunk." =09Third of all, consider the mayhem when the low watermark is changed several times; the mmbfree list gets so damn fragmented that it becomes complete HELL to combine/split chunks. Speaking of which, splitting chunks, if they are unreasonably large, is very costly, because it would mean walking the present "chunk" list until you get to the middle. =09Fourth=B8 if your mmbfree general list is empty, you have to allocate = a page and split it into PAGE_SIZE / MSIZE mbufs, which may or may not be larger than the chunk you happen to be looking for at the moment. Say it's 50% larger (which is unlikely, but possible), you'll be walking the newly allocated mbufs splitting them into two chunks. Say it's smaller (very likely), you'll either be trying to allocate more pages (more overhead) -- more overhead -- or dealing with the fact that you couldn't grab as many mbufs as you wanted, and re-arranging chunks. =09Finally, think of the MAYHEM involved in possibly invalidating nextpkt pointers when changing chunk sizes. =09On the other hand, simply dropping this chunk idea will likely have you grabbing the mmbfree mutex quite often when you=B4ve got nothing on your fast list(s), but that's only until mbufs are again freed back to your fast lists... In other words, just because you have 0 on your fast list AT THIS VERY INSTANT, doesn't mean that you'll still have 0 in THIS VERY INSTANT + some DELTA_TIME. =09I don't think that anybody is prepared to introduce the hysterics and complexity of this "chunk" system in return for per CPU lists, when just eliminating the "chunk" idea would leave you with a system at WORST 5-10% less performant, but at least 80% less complex. > --=20 > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > "I have the heart of a child; I keep it in a jar on my desk." =09I really liked ALL of the suggestions you made but this one, for more than obvious reasons. If you agree, I can go about implementing the revised design as I had presented it in my last Email, which would leave space for adding this "chunk" idea if for whatever reason you want to see for yourself. Then either you or me or someone else can eventually implement it and profile it before introducing it. Agreed? Cheers, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message