From: Warner Losh
Date: Tue, 29 Nov 2016 08:33:24 -0700
Subject: Re: FreeBSD 11 i386 disk deadlock (I think) (now with reproduction steps!)
To: Fabian Keil
Cc: David Cross, "freebsd-hackers@freebsd.org"

On Tue, Nov 29, 2016 at 5:17 AM, Fabian Keil wrote:
> David Cross wrote:
>
>> I wouldn't call this a 'workaround', but the right answer. Something in
>> the disk io path shouldn't be allocating memory out of the pool that can
>> cause paging (since any of that could be IN the path for paging). It was
>> what I assumed Fabian's proposed patch was.
>
> That's indeed what the patch does (for geli).

I took a look at the patch. I think it's the wrong approach in the
detail, though the general idea is good. It seems good enough to work
around the problem.

I think it would be better to have a pre-allocated area for one write
of a certain size. We'd normally not use this at all.
In the write path, we'd try to allocate what we need, and if that
fails, we push down one write with the pre-allocated area. We queue
further writes that fail to allocate the area they need. Once the one
write that's using the pre-allocated area is done, we push down
another one. This allows us to always make progress. Bonus points if
you can do this only for the swapper.

To do that latter bit requires help from the swapper. I've been
working on some back-pressure into the VM layer to replace the current
runningbuf limiter. Part of that work assigns a priority to the I/Os
that's visible down the stack. That could be used to determine whether
to dip into the reserve or not and may produce better results when
we're not in a memory-starved situation. It would be better to know
you need to do this than to guess based on it being a one-time
provider.

Warner
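
For illustration only, the reserve-and-queue idea described above could
be sketched in userspace C roughly as follows. This is not the geli
patch and not kernel code: every name here (write_req, submit_write,
write_done, RESERVE_SIZE, the simulate_oom knob) is hypothetical, "push
down" is modelled as simply handing the request a buffer, and it
assumes no write needs more than RESERVE_SIZE bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_SIZE 4096  /* hypothetical: largest single write we support */
#define MAX_PENDING  16    /* hypothetical: capacity of the deferred queue */

struct write_req {
    size_t len;        /* bytes of temporary buffer this write needs */
    void  *buf;        /* buffer in use; NULL while queued or finished */
    bool   on_reserve; /* true if buf is the pre-allocated area */
};

static char reserve_area[RESERVE_SIZE];  /* allocated up front, normally idle */
static bool reserve_busy;
static struct write_req *pending[MAX_PENDING];  /* FIFO of deferred writes */
static size_t pending_head, pending_count;
static bool simulate_oom;  /* demo knob: make normal allocation fail */

/* Stand-in for a non-sleeping allocation that is allowed to fail. */
static void *try_alloc(size_t len)
{
    return simulate_oom ? NULL : malloc(len);
}

/* Try the normal allocator first, then the reserve if it is free. */
static bool get_buffer(struct write_req *w)
{
    w->on_reserve = false;
    w->buf = try_alloc(w->len);
    if (w->buf != NULL)
        return true;
    if (!reserve_busy) {
        reserve_busy = true;
        w->buf = reserve_area;
        w->on_reserve = true;
        return true;
    }
    return false;
}

/* Returns true if the write was pushed down, false if it was queued. */
static bool submit_write(struct write_req *w)
{
    assert(w->len <= RESERVE_SIZE);  /* assumption stated in the lead-in */
    if (get_buffer(w))
        return true;
    assert(pending_count < MAX_PENDING);
    pending[(pending_head + pending_count) % MAX_PENDING] = w;
    pending_count++;
    return false;
}

/* Completion: release the buffer, then push down queued writes in order. */
static void write_done(struct write_req *w)
{
    if (w->on_reserve)
        reserve_busy = false;
    else
        free(w->buf);
    w->buf = NULL;
    w->on_reserve = false;
    while (pending_count > 0 && get_buffer(pending[pending_head])) {
        pending_head = (pending_head + 1) % MAX_PENDING;
        pending_count--;
    }
}
```

With allocation failing, at most one write is in flight on the reserve
at any time and each completion starts the next queued write, which is
what guarantees forward progress. Restricting the reserve to swapper
I/O would need the priority information mentioned above and is not
modelled here.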