Date: Mon, 21 Jul 2003 09:23:27 +0000
From: Bosko Milekic
To: harti@freebsd.org
cc: current@freebsd.org
Subject: Re: SMP problem with uma_zalloc
Message-ID: <20030721092327.GA45509@technokratis.com>
In-Reply-To: <20030721085426.A48033@beagle.fokus.fraunhofer.de>

On Mon, Jul 21, 2003 at 09:03:00AM +0200, Harti Brandt wrote:
> On Sat, 19 Jul 2003, Bosko Milekic wrote:
>
> BM>
> BM>On Sat, Jul 19, 2003 at 08:31:26PM +0200, Lara & Harti Brandt wrote:
> BM>[...]
> BM>> Well the problem is, that nothing is starved. I have an idle machine and
> BM>> a zone that I have limited to 60 or so items. When allocating the 2nd
> BM>> item I get block on the zone limit. Usually I get unblocked whenever I
> BM>> free an item. This will however not happen, because I have neither
> BM>> reached the limit nor is there memory pressure in the system to which I
> BM>> could react. I simply may be blocked forever.
> BM>
> BM> UMA_ZFLAG_FULL is set on the zone prior to the msleep(). This means
> BM> that the next free will result in your wakeup, as the next free will
> BM> be sent to the zone internally, and not the pcpu cache.
>
> But there is no free to come. To explain where we have the problem:
>
> the HARP ATM code uses a zone in the IP code to allocate control blocks
> for VCCs. The zone is limited to 100 items which evaluates to 1 page.
> When I start an interface, first the signalling vcc=5 is opened. This
> allocates one item from the zone, all the other items go into the CPU
> cache. Next I start ILMI. ILMI tries to open its vcc=16. While this works
> on UP machines (the zone allocator will find a free item in the CPU
> cache), on my 2-proc machine half of the time ILMI gets blocked on the
> zonelimit. And it blocks there forever, because, of course nobody is going
> to free the one and only allocated item. On a four processor machine the
> blocking probability will be 75%.
>
> So in order to be able to get out N items from a zone (given that there is
> no shortage of memory) one has to set the limit to N + nproc *
> items_per_allocation, which one cannot do because he doesn't know
> items_per_allocation.

  It sounds to me like your example is not really the general case.
  You're using a zone capped at 1 page, which in UMA is currently the
  size of a slab. So you have this whole zone (with all its associated
  overhead) in order to serve a maximum of only one slab.
  This defeats most of the assumptions made when the zone is created
  with pcpu caches. The zone maximum exists to prevent more than the
  specified amount of resources from being allocated to the given zone;
  I don't think the intention was "to ensure that if the maximum number
  of items isn't allocated, there will always be one available," even
  though that is the effective behavior on UP.

  The solution to your really small zone problem is either to make the
  zone bigger, or to hack at UMA to export the UMA_ZONE_INTERNAL API
  properly so that you can skip the pcpu caches for all allocations and
  go straight to the zone. I'd suggest that you make the zone bigger,
  unless there's a Really Good reason not to.

  In mb_alloc (for mbufs) I had implemented something that, in this
  sort of scenario, would dip into the other caches and transfer what I
  called a "bucket" over to the current cpu's cache. Although it seems
  like that sort of solution would do what you want here, some more
  thought about its behavior reveals that it in fact pessimizes the
  situation.

  To give you a better idea, let's consider what happens in this
  specific scenario, where a "bucket" would be all of a page. The
  allocator would attempt to allocate from its pcpu cache but would
  find it empty, so it would then attempt to steal a bucket from the
  second cpu's cache. There, it would find the bucket, move it to its
  own cpu's cache, and grab an item from it. However, a thread on the
  second cpu may then attempt to grab an item, and the bucket will just
  ping-pong from pcpu cache to pcpu cache. The problem that the
  allocator was trying to solve for such really small zones was in fact
  still there, because of the general assumptions made in the design
  about the size of most zones that it dealt with; only instead of
  failing the allocation, it was pessimizing it.
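
  The two behaviors discussed above (blocking at the zone limit when
  there is no stealing, and bucket ping-pong when there is) can be
  sketched with a toy userland model. This is a minimal Python sketch,
  not UMA or mb_alloc code: the class TinyZone, the helpers run() and
  blocking_fraction(), and the simplification that the zone's single
  slab lands entirely in the cache of the first CPU to allocate are
  all assumptions made for illustration. blocking_fraction() further
  assumes the second allocation is equally likely to run on any CPU.

```python
# Toy model of a one-slab zone with per-CPU caches. All names here are
# hypothetical; nothing below is taken from the FreeBSD sources.

class TinyZone:
    def __init__(self, ncpu, items_per_slab, steal):
        self.caches = [0] * ncpu       # free items held in each per-CPU cache
        self.slab_used = False         # the zone limit permits exactly one slab
        self.items_per_slab = items_per_slab
        self.steal = steal             # True: move a whole bucket between CPUs
        self.transfers = 0             # number of bucket moves between caches

    def alloc(self, cpu):
        """Return True on success, False where a real allocator would sleep."""
        if self.caches[cpu] == 0:
            if not self.slab_used:
                # First miss: the one permitted slab fills this CPU's cache.
                self.slab_used = True
                self.caches[cpu] = self.items_per_slab
            elif self.steal:
                # Take the entire bucket from the first CPU that holds one.
                donor = next(
                    (c for c, n in enumerate(self.caches) if n > 0), None)
                if donor is None:
                    return False       # the zone is genuinely exhausted
                self.caches[cpu], self.caches[donor] = self.caches[donor], 0
                self.transfers += 1
            else:
                return False           # at the limit and our cache is empty
        self.caches[cpu] -= 1
        return True


def run(ncpu, steal, cpu_sequence, items_per_slab=100):
    """Allocate once per entry in cpu_sequence; return (results, transfers)."""
    zone = TinyZone(ncpu, items_per_slab, steal)
    results = [zone.alloc(cpu) for cpu in cpu_sequence]
    return results, zone.transfers


def blocking_fraction(ncpu):
    """Fraction of CPUs on which a second allocation would block, given that
    the first allocation ran on CPU 0 and stealing is disabled."""
    blocked = sum(not run(ncpu, False, [0, cpu])[0][1] for cpu in range(ncpu))
    return blocked / ncpu
```

  In this model, run(2, False, [0, 1]) fails the second allocation
  even though only one item is in use, and blocking_fraction(4)
  evaluates to 0.75, which matches the 75% figure quoted above. With
  stealing enabled, run(2, True, [0, 1, 0, 1]) succeeds every time but
  records three bucket transfers in four allocations, which is the
  ping-pong cost.
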
> harti

Regards,
--
Bosko Milekic  *  bmilekic@technokratis.com  *  bmilekic@FreeBSD.org
TECHNOkRATIS Consulting Services  *  http://www.technokratis.com/