From owner-freebsd-stable@FreeBSD.ORG Wed Jan 30 08:31:47 2013
Message-ID: <5108DA6B.80600@FreeBSD.org>
Date: Wed, 30 Jan 2013 10:31:39 +0200
From: Andriy Gapon
To: Rick Macklem
Cc: kostikbel@gmail.com, alc@FreeBSD.org, stable@FreeBSD.org
Subject: Re: NFS-exported ZFS instability
References: <1716447362.2485682.1359500788054.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <1716447362.2485682.1359500788054.JavaMail.root@erie.cs.uoguelph.ca>

on 30/01/2013 01:06 Rick Macklem said the following:
> Andriy Gapon wrote:
>> on 29/01/2013 23:44 Hiroki Sato said the following:
>>> http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt
>>> http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt
>>
>> I recognize here a ZFS ARC deadlock that should have been
>> prevented by r241773
>> and its MFCs (r242858 for 9, r242859 for 8).
>>
> Unfortunately, pool-20130130-info.txt shows a kernel built from r244417,
> unless I somehow misread it.

You are right.  I slightly misdiagnosed the problem - it is not the same
problem, but a slightly different one.  It has "almost the same" cause, but
r241773 didn't handle this situation.

Basically:
- a thread goes into ARC, acquires some ARC lock and then calls
  malloc(M_WAITOK)
- there is a page shortage, so the thread ends up in VM_WAIT() waiting on
  pagedaemon
- pagedaemon synchronously invokes the lowmem hook
- the ARC hook sleeps waiting on the ARC reclaim thread to make a pass
- the ARC reclaim thread is blocked on the ARC lock held by the original
  thread

My conclusion: the ARC lowmem hook should never wait on the ARC reclaim
thread.  At least as long as the ARC code calls malloc(M_WAITOK) while
holding locks.

Perhaps the root cause here is that we treat both KM_PUSHPAGE and KM_SLEEP
as M_WAITOK.  We do not seem to have an equivalent of KM_PUSHPAGE?
Perhaps a resurrected M_USE_RESERVE could serve this role?

Quote:
A small pool of reserved memory is available to allow the system to progress
toward the goal of freeing additional memory while in a low memory
situation.  The KM_PUSHPAGE flag enables use of this reserved memory pool on
an allocation.  This flag can be used by drivers that implement strategy(9E)
on memory allocations associated with a single I/O operation.  The driver
guarantees that the I/O operation will complete (or timeout) and, on
completion, that the memory will be returned.  The KM_PUSHPAGE flag should
be used only in kmem_cache_alloc() calls.  All allocations from a given
cache should be consistent in their use of the flag.  A driver that adheres
to these restrictions can guarantee progress in a low memory situation
without resorting to complex private allocation and queuing schemes.  If
KM_PUSHPAGE is specified, KM_SLEEP can also be used without causing
deadlock.

But please note that the Solaris API allows KM_PUSHPAGE to be used together
with KM_SLEEP; I am not sure what's going on under the hood in that case.

>> See tid 100153 (arc reclaim thread), tid 100105 (pagedaemon) and tid
>> 100639
>> (nfsd in kmem_back).
>>
>> --
>> Andriy Gapon

--
Andriy Gapon
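
The wait chain listed under "Basically:" can be sketched as a small wait-for
graph.  This is only a toy model, not kernel code: the node names ("nfsd",
"pagedaemon", "arc_reclaim") and the helper find_cycle() are illustrative
assumptions, and each edge simply records "X cannot proceed until Y does".

```python
from typing import Dict, List, Optional

# Hypothetical wait-for edges taken from the five steps in the message:
#   nfsd holds an ARC lock and sleeps in malloc(M_WAITOK) -> waits on pagedaemon
#   pagedaemon runs the ARC lowmem hook synchronously     -> waits on arc_reclaim
#   arc_reclaim needs the ARC lock                        -> waits on nfsd
WAITS_FOR: Dict[str, str] = {
    "nfsd": "pagedaemon",
    "pagedaemon": "arc_reclaim",
    "arc_reclaim": "nfsd",
}

# Same model if the lowmem hook never waits on the reclaim thread
# (the conclusion drawn in the message): the middle edge disappears.
WITHOUT_HOOK_WAIT: Dict[str, str] = {
    "nfsd": "pagedaemon",
    "arc_reclaim": "nfsd",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from start; return the cycle reached, if any."""
    seen: List[str] = []
    node: Optional[str] = start
    while node is not None and node not in seen:
        seen.append(node)
        node = waits_for.get(node)  # None means this thread is not blocked
    if node is None:
        return None  # the chain ends at a runnable thread, so no deadlock
    return seen[seen.index(node):]  # the part of the path that loops
```

In this model find_cycle(WAITS_FOR, "nfsd") returns all three threads, while
find_cycle(WITHOUT_HOOK_WAIT, "nfsd") returns None: once the hook no longer
sleeps, pagedaemon is free to reclaim pages and the chain is broken.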