From owner-freebsd-stable@FreeBSD.ORG Wed Jan 30 15:47:26 2013
Date: Wed, 30 Jan 2013 10:46:16 -0500 (EST)
From: Rick Macklem
To: Andriy Gapon
Cc: kostikbel@gmail.com, alc@FreeBSD.org, stable@FreeBSD.org
Message-ID: <1767444640.2505809.1359560776337.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <5108DA6B.80600@FreeBSD.org>
Subject: Re: NFS-exported ZFS instability
List-Id: Production branch of FreeBSD source code

Andriy Gapon wrote:
> on 30/01/2013 01:06 Rick Macklem said the following:
> > Andriy Gapon wrote:
> >> on 29/01/2013 23:44 Hiroki Sato said the
> >> following:
> >>> http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt
> >>> http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt
> >>
> >> I recognize here a ZFS ARC deadlock that should have been
> >> prevented by r241773 and its MFCs (r242858 for 9, r242859 for 8).
> >>
> > Unfortunately, pool-20130130-info.txt shows a kernel built from
> > r244417, unless I somehow misread it.
>
> You are right. I slightly misdiagnosed the problem - it's not the
> same, but a slightly different problem. So it has "almost the same"
> cause, but r241773 didn't handle this situation.
>
> Basically:
> - a thread goes into ARC, acquires some ARC lock and then calls
>   malloc(M_WAITOK)
> - there is a page shortage, so the thread ends up in VM_WAIT()
>   waiting on pagedaemon
> - pagedaemon synchronously invokes lowmem hook
> - the ARC hook sleeps waiting on ARC reclaim thread to make a pass
> - ARC reclaim thread is blocked on the ARC lock held by the original
>   thread
>
> My conclusion: ARC lowmem hook should never wait on ARC reclaim
> thread. At least as long as the ARC code calls malloc(M_WAITOK)
> while holding locks.
>
> Perhaps the root cause here is that we treat both KM_PUSHPAGE and
> KM_SLEEP as M_WAITOK. We do not seem to have an equivalent of
> KM_PUSHPAGE?
> Perhaps resurrected M_USE_RESERVE could serve this role?
>
Good work figuring this out! Obviously, better folk than I will
have to figure out how to fix this.

Good luck with it, rick
ps: Having some "special" place malloc() can go for critical
    allocations sounds like a good plan to me. Possibly have malloc()
    follow the M_NOWAIT path and then go to this area when M_NOWAIT
    fails to allocate?

> Quote:
> A small pool of reserved memory is available to allow the system to
> progress toward the goal of freeing additional memory while in a low
> memory situation. The KM_PUSHPAGE flag enables use of this reserved
> memory pool on an allocation.
> This flag can be used by drivers that implement strategy(9E) on
> memory allocations associated with a single I/O operation. The
> driver guarantees that the I/O operation will complete (or timeout)
> and, on completion, that the memory will be returned. The
> KM_PUSHPAGE flag should be used only in kmem_cache_alloc() calls.
> All allocations from a given cache should be consistent in their use
> of the flag. A driver that adheres to these restrictions can
> guarantee progress in a low memory situation without resorting to
> complex private allocation and queuing schemes. If KM_PUSHPAGE is
> specified, KM_SLEEP can also be used without causing deadlock.
>
>
> But please note how the Solaris API allows to use KM_PUSHPAGE with
> KM_SLEEP, not sure what's going on under the hood in that case.
>
> >> See tid 100153 (arc reclaim thread), tid 100105 (pagedaemon) and
> >> tid 100639 (nfsd in kmem_back).
> >>
> >> --
> >> Andriy Gapon
>
> --
> Andriy Gapon
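
For readers following the thread, the wait-for chain Andriy lists can be written down as a small graph and checked for a cycle. This is only an illustrative sketch, not kernel code: the node names are hypothetical labels for the three threads named in the thread (nfsd, pagedaemon, ARC reclaim thread), and the lowmem hook is folded into pagedaemon because it is invoked synchronously from it. Dropping the pagedaemon edge models Andriy's conclusion that the hook should not wait on the reclaim thread; it is not an actual patch.

```python
# Wait-for graph for the reported hang: each key is blocked on its value.
# Labels are hypothetical names for the threads mentioned in the thread.
WAITS_FOR = {
    "nfsd": "pagedaemon",         # holds an ARC lock, sleeps in VM_WAIT()
    "pagedaemon": "arc_reclaim",  # lowmem hook sleeps until a reclaim pass
    "arc_reclaim": "nfsd",        # blocked on the ARC lock held by nfsd
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle found, or None."""
    seen = []
    node = start
    while node is not None and node not in seen:
        seen.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # chain ended at a runnable thread: no deadlock
    return seen[seen.index(node):]


if __name__ == "__main__":
    print("as reported:", find_cycle(WAITS_FOR, "nfsd"))

    # Assumption: the lowmem hook no longer waits on the reclaim thread,
    # so pagedaemon has no outgoing wait-for edge.
    no_wait = {k: v for k, v in WAITS_FOR.items() if k != "pagedaemon"}
    print("hook not waiting:", find_cycle(no_wait, "nfsd"))
```

With the hook's wait removed, the chain from nfsd ends at pagedaemon, which is free to keep reclaiming pages, so the cycle disappears in this model.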