From owner-freebsd-current@FreeBSD.ORG Thu Dec 23 10:30:06 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0F6A416A4CE
	for ; Thu, 23 Dec 2004 10:30:06 +0000 (GMT)
Received: from relay03.pair.com (relay03.pair.com [209.68.5.17])
	by mx1.FreeBSD.org (Postfix) with SMTP id 4CC5743D46
	for ; Thu, 23 Dec 2004 10:30:05 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 27657 invoked from network); 23 Dec 2004 10:17:29 -0000
Received: from unknown (HELO peter.osted.lan) (unknown)
	by unknown with SMTP; 23 Dec 2004 10:17:29 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.13.1/8.13.1) with ESMTP id iBNAHS1U035023;
	Thu, 23 Dec 2004 11:17:28 +0100 (CET)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.13.1/8.13.1/Submit) id iBNAHSGc035022;
	Thu, 23 Dec 2004 11:17:28 +0100 (CET)
	(envelope-from pho)
Date: Thu, 23 Dec 2004 11:17:27 +0100
From: Peter Holm
To: Bosko Milekic
Message-ID: <20041223101727.GA34943@peter.osted.lan>
References: <20041209144233.GA46928@peter.osted.lan>
	<20041220234103.GA59225@technokratis.com>
	<20041222210553.GA28108@peter.osted.lan>
	<20041222221540.GA70052@technokratis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20041222221540.GA70052@technokratis.com>
User-Agent: Mutt/1.4.2.1i
cc: current@freebsd.org
Subject: Re: panic: uma_zone_slab is looping
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 23 Dec 2004 10:30:06 -0000

On Wed, Dec 22, 2004 at 05:15:40PM -0500, Bosko Milekic wrote:
>
> On Wed, Dec 22, 2004 at 10:05:53PM +0100, Peter Holm wrote:
> > On Mon, Dec 20,
> > 2004 at 06:41:04PM -0500, Bosko Milekic wrote:
> > >
> > > I realize it's been a while.
> > >
> > > Anyway, what I *think* is going on here is that slab_zalloc() is
> > > actually returning NULL even when called with M_WAITOK.  Further
> > > inspection in slab_zalloc() reveals that this could come from several
> > > places.  One of them is kmem_malloc() itself, which I doubt will ever
> > > return NULL if called with M_WAITOK.  If this assumption is indeed
> > > correct, then the NULL must be being returned by slab_zalloc() itself,
> > > or due to a failed uma_zalloc_internal() call.  It is also possible
> > > that slab_zalloc() returns NULL if the init that gets called for the
> > > zone fails.  However, judging from the stack trace you provided, the
> > > init in question is mb_init_pack() (kern_mbuf.c).  This particular
> > > init DOES perform an allocation and CAN in theory fail, but I believe
> > > it should be called with M_WAITOK as well, and so it should also never
> > > fail in theory.
> > >
> > > Have you gotten any further with the analysis of this particular
> > > trace?  If not, I would suggest adding some more printf()s and
> > > analysis into slab_zalloc() itself, to see if that is indeed what is
> > > causing the infinite looping in uma_zone_slab() and, if so, attempt to
> > > figure out what part of slab_zalloc() is returning the NULL.
> >
> > OK, did that: http://www.holm.cc/stress/log/freeze03.html
>
> OK, well, I think I know what's happening.  See if you can confirm
> this with me.
>
> I'll start with your trace and describe the analysis; please bear with
> me because it's long and painful.
>
> Your trace indicates that the NULL allocation failure, despite a call
> with M_WAITOK, is coming from slab_zalloc().  The particular thing
> that should also be mentioned about this trace, and your previous
> one, is that they both show a call path that goes through an init
> which performs an allocation, also with M_WAITOK.
> Currently, only the
> "packet zone" does this.  It looks something like this:
>
> 1. UMA allocation is performed for a "packet."  A "packet" is an mbuf
>    with a pre-attached cluster.
>
> 2. UMA dips into the packet zone and finds it empty.  Additionally, it
>    determines that it is unable to get a bucket to fill up the zone
>    (presumably there is a lot of memory request load).  So it calls
>    uma_zalloc_internal on the packet zone (frame 18).
>
> 3. Perhaps after some blocking, a slab is obtained from the packet
>    zone's backing keg (which coincidentally is the same keg as the
>    mbuf zone's backing keg -- let's call it the MBUF KEG).  So now
>    that an mbuf item is taken from the freshly allocated slab obtained
>    from the MBUF KEG, uma_zalloc_internal() needs to init and ctor it,
>    since it is about to return it to the top (calling) layer.  It
>    calls the initializer on it for the packet zone, mbuf_init_pack().
>    This corresponds to frame 17.
>
> 4. The packet zone's initializer needs to call into UMA again to get
>    and attach an mbuf cluster to the mbuf being allocated, because mbufs
>    residing within the packet zone (or obtained from the packet zone)
>    MUST have clusters attached to them.  It attempts to perform this
>    allocation with M_WAITOK, because that's what the initial caller
>    was calling with.  This is frame 16.
>
> 5. Now the cluster zone is also completely empty and we can't get a
>    bucket (surprise, surprise, the system is under high memory-request
>    load).  UMA calls uma_zalloc_internal() on the cluster zone as well.
>    This is frame 15.
>
> 6. uma_zalloc_internal() calls uma_zone_slab().  Its job is to find a
>    slab from the cluster zone's backing keg (a separate CLUSTER KEG)
>    and return it.  Unfortunately, memory-request load is high, so it's
>    going to have a difficult time.  The uma_zone_slab() call is frame
>    14.
>
> 7.
> uma_zone_slab() can't find a locally cached slab (hardly
>    surprising, due to load) and calls slab_zalloc() to actually go to
>    VM and get one.  Before calling, it increments a special "recurse"
>    flag so that we do not recurse on calling into the VM.  This is
>    because the VM itself might call back into UMA when it attempts to
>    allocate vm_map_entries, which could cause it to recurse on
>    allocating buckets.  This recurse flag is PER zone, and really only
>    exists to protect the bucket zone.  Crazy, crazy shit indeed.
>    Pardon the language.  This is frame 13.
>
> 8. Now slab_zalloc(), called for the CLUSTER zone, determines that the
>    cluster zone (for space efficiency reasons) is in fact an OFFPAGE
>    zone, so it needs to grab a slab header structure from a separate
>    UMA "slab header" zone.  It calls uma_zalloc_internal() from
>    slab_zalloc(), but it calls it on the SLAB HEADER zone.  It passes
>    M_WAITOK down to it, but for some reason IT returns NULL, and the
>    failure is propagated back up, which causes uma_zone_slab() to
>    keep looping.  Go back to step 7.
>
> This is the infinite loop 7 -> 8 -> 7 -> 8 -> ... which you seem to
> have caught.
>
> The question now is why uma_zalloc_internal() fails on the
> SLAB HEADER zone even though it is called with M_WAITOK.
> Unfortunately, your stack trace does not provide enough depth to be
> able to continue an accurate deductive analysis from this point on
> (you would need to sprinkle MORE KASSERTs).
>
> However, here are some hypotheses.
>
> The uma_zalloc_internal() which ends up getting called also ends up
> calling uma_zone_slab(), but uma_zone_slab() eventually fails (this is
> a fact: it is the only reason that the uma_zalloc_internal() could
> in turn fail for the SLAB HEADER zone, which doesn't have an init or a
> ctor).
>
> So why does uma_zone_slab() fail with M_WAITOK on the slab header
> zone?  Possibilities:
>
> 1.
> The recurse flag is at some point determined non-zero FOR THE SLAB
>    HEADER backing keg.  If the VM ends up getting called from the
>    subsequent slab_zalloc() and ends up calling back into UMA for
>    whatever allocations, and "whatever allocations" are also
>    potentially offpage, and a slab header is ALSO required, then we
>    could also be recursing on the slab header zone from VM, so this
>    could cause the failure.
>
>    if (keg->uk_flags & UMA_ZFLAG_INTERNAL && keg->uk_recurse != 0) {
>            /* ADD PRINTF HERE */
>            printf("This zone: %s, forced fail due to recurse non-null",
>                zone->uz_name);
>            return NULL;
>    }
>

The printf didn't really fly.  It seems to be called early in boot:

ck_flags(0,0,c07e8890,882) at _mtx_lock_flags+0x24
uma_zalloc_internal(c09284a0,c0c20c84,2) at uma_zalloc_internal+0x2d
uma_zcreate(c07e8b1e,40,0,0,0,0,3,2000) at uma_zcreate+0x57
uma_startup(c103d000,c103d000,28000,c0c20d78,ff00000) at uma_startup+0x2ae
vm_page_startup(c1065000,c0c20d98,c05ec857,0,c08525d0) at vm_page_startup+0x109
vm_mem_init(0,c08525d0,c1ec00,c1e000,c28000) at vm_mem_init+0x13
mi_startup() at mi_startup+0xb3
begin() at begin+0x2c

so I just sprinkled some more asserts.  I'm trying to see if I can
provoke this problem more consistently, based on your analysis.  It
usually takes me a day or two of testing to get there.

>    If you get the print to trigger right before the panic (last one
>    before the panic), see if it is on the SLAB HEADER zone.  In
>    theory, it should only happen for the BUCKET ZONE.
>
> 2. M_WAITOK really isn't set.  Unlikely.
>
> If (1) is really happening, we'll need to think about it a little more
> before deciding how to fix it.  As you can see, due to the recursive
> nature of UMA/VM, things can get really tough when resources are
> scarce.
>
> Regards,
> --
> Bosko Milekic
> bmilekic@technokratis.com
> bmilekic@FreeBSD.org

-- 
Peter Holm