From owner-freebsd-hackers@freebsd.org Mon Nov 28 23:19:30 2016
From: Warner Losh <wlosh@bsdimp.com>
Date: Mon, 28 Nov 2016 16:19:28 -0700
Subject: Re: FreeBSD 11 i386 disk deadlock (I think) (now with reproduction steps!)
To: David Cross
Cc: Slawa Olhovchenkov, Konstantin Belousov, "freebsd-hackers@freebsd.org", Fabian Keil
References: <20161128041847.GA65249@charmander> <20161128120046.GP54029@kib.kiev.ua> <20161128144135.10f93205@fabiankeil.de> <20161128160311.GQ54029@kib.kiev.ua> <20161128162240.GM99742@zxy.spb.ru>
List-Id: Technical Discussions relating to FreeBSD

On Mon, Nov 28, 2016 at 10:50 AM, David Cross wrote:
> I wouldn't call this a 'workaround', but the right answer. Something in
> the disk I/O path shouldn't be allocating memory out of a pool that can
> cause paging (since any of that could be IN the path for paging). That is
> what I assumed Fabian's proposed patch was.
>
> From looking at the process list on my machine, it seems that geli
> allocates a process per core per provider. Is there a reason not to have
> each of these allocate itself a single sector-sized buffer at startup,
> and just put all operations through that? You're not (realistically)
> going to get more concurrency than that.
> I guess another approach would be to pre-allocate a ring buffer of the
> desired operational depth... but that seems overkill.

I have some code that helps fix this in the GEOM layer. For the swapper,
it will allocate out of a pool of memory that's set aside for that. While
it is still a pool, the only time things are allocated out of it is when
the swapper is swapping stuff out. So if you hit a resource shortage and
have to wait, you know the wait will be bounded unless the disk I/O never
completes. This is already weakly done with UMA, but the guarantees aren't
strong enough that we'll always make progress. There are other places in
the stack that allocate shared resources, but this one bit us at Netflix.

I've not yet cleaned up the patches for upstreaming... I want to let the
recent VM changes settle before tackling this again.

Warner

> On Mon, Nov 28, 2016 at 11:22 AM, Slawa Olhovchenkov wrote:
>
>> On Mon, Nov 28, 2016 at 06:03:11PM +0200, Konstantin Belousov wrote:
>>
>> > On Mon, Nov 28, 2016 at 02:43:30PM +0100, Fabian Keil wrote:
>> > > David Cross wrote:
>> > >
>> > > > This is certainly new behavior, or a new manifestation.
>> > >
>> > > Recently a couple of UMA consumers were changed to share UMA zones
>> > > instead of using dedicated zones. As a result, geli competes with
>> > > more UMA consumers and is more likely to deadlock. The bug isn't
>> > > new; it's just triggered more often now.
>> >
>> > The problem happens at a layer much lower than UMA: it is the whole
>> > reusable page pool that is depleted and cannot be refilled without
>> > allocating more memory. If you think about it, the deadlock is
>> > obviously trivial: the pagedaemon is the main source of free pages,
>> > but if producing a free page requires allocating one, a low-memory
>> > condition is equivalent to a deadlock.
>> >
>> > It was always there, in the sense that for all versions of FreeBSD,
>> > if the file/disk write path requires memory allocation, there is
>> > trouble.
>> > For geom, some special unique measures were taken so that bio
>> > allocations do not cause the issue in typical situations.
>>
>> The typical workaround for this is to pre-allocate some memory for the
>> operation.
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"