Date: Wed, 30 Jan 2013 10:46:16 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: kostikbel@gmail.com, alc@FreeBSD.org, stable@FreeBSD.org
Subject: Re: NFS-exported ZFS instability
Message-ID: <1767444640.2505809.1359560776337.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <5108DA6B.80600@FreeBSD.org>
Andriy Gapon wrote:
> on 30/01/2013 01:06 Rick Macklem said the following:
> > Andriy Gapon wrote:
> >> on 29/01/2013 23:44 Hiroki Sato said the following:
> >>> http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt
> >>> http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt
> >>
> >> I recognize here a ZFS ARC deadlock that should have been prevented by
> >> r241773 and its MFCs (r242858 for 9, r242859 for 8).
> >>
> > Unfortunately, pool-20130130-info.txt shows a kernel built from r244417,
> > unless I somehow misread it.
>
> You are right. I slightly misdiagnosed the problem - it's not the same, but a
> slightly different problem. So it has "almost the same" cause, but r241773
> didn't handle this situation.
>
> Basically:
> - a thread goes into ARC, acquires some ARC lock and then calls
>   malloc(M_WAITOK)
> - there is a page shortage, so the thread ends up in VM_WAIT() waiting on
>   pagedaemon
> - pagedaemon synchronously invokes lowmem hook
> - the ARC hook sleeps waiting on ARC reclaim thread to make a pass
> - ARC reclaim thread is blocked on the ARC lock held by the original thread
>
> My conclusion: ARC lowmem hook should never wait on ARC reclaim thread. At
> least as long as the ARC code calls malloc(M_WAITOK) while holding locks.
>
> Perhaps the root cause here is that we treat both KM_PUSHPAGE and KM_SLEEP as
> M_WAITOK. We do not seem to have an equivalent of KM_PUSHPAGE?
> Perhaps resurrected M_USE_RESERVE could serve this role?
>
Good work figuring this out! Obviously, better folk than I will have to
figure out how to fix this.

Good luck with it, rick
ps: Having some "special" place malloc() can go for critical allocations,
    sounds like a good plan to me. Possibly have malloc() follow the
    M_NOWAIT path and then go to this area when M_NOWAIT fails to allocate?

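The chain Andriy lists can be read as a wait-for graph with a cycle in it. The following is only a toy model in Python, not kernel code: the dictionary encoding and the function name find_cycle are assumptions made for illustration, and "nfsd" stands in for the original thread based on the tids mentioned elsewhere in the thread. It shows the cycle, and that dropping the "lowmem hook waits on the reclaim thread" edge, as Andriy concludes, breaks it.

```python
# Toy wait-for graph for the deadlock described in the quoted mail.
# Thread names follow the mail; the graph encoding is an assumption.
WAITS_FOR = {
    # holds an ARC lock, sleeps in VM_WAIT() after malloc(M_WAITOK)
    "nfsd": "pagedaemon",
    # lowmem hook sleeps until the ARC reclaim thread makes a pass
    "pagedaemon": "arc_reclaim",
    # blocked on the ARC lock held by the original thread
    "arc_reclaim": "nfsd",
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    seen = {}
    node = start
    while node is not None and node not in seen:
        seen[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a thread that is not waiting
    return path[seen[node]:] + [node]


cycle = find_cycle(WAITS_FOR, "nfsd")

# Proposed rule: the ARC lowmem hook never waits on the ARC reclaim thread,
# so pagedaemon no longer has an outgoing wait-for edge.
fixed = dict(WAITS_FOR)
del fixed["pagedaemon"]
no_cycle = find_cycle(fixed, "nfsd")

print(cycle)
print(no_cycle)
```

With the hook edge removed, pagedaemon can finish its pass, the page shortage can clear, and the original thread eventually releases the ARC lock.
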
> Quote:
> A small pool of reserved memory is available to allow the system to progress
> toward the goal of freeing additional memory while in a low memory situation.
> The KM_PUSHPAGE flag enables use of this reserved memory pool on an
> allocation. This flag can be used by drivers that implement strategy(9E) on
> memory allocations associated with a single I/O operation. The driver
> guarantees that the I/O operation will complete (or timeout) and, on
> completion, that the memory will be returned. The KM_PUSHPAGE flag should be
> used only in kmem_cache_alloc() calls. All allocations from a given cache
> should be consistent in their use of the flag. A driver that adheres to these
> restrictions can guarantee progress in a low memory situation without
> resorting to complex private allocation and queuing schemes. If KM_PUSHPAGE
> is specified, KM_SLEEP can also be used without causing deadlock.
>
> But please note how the Solaris API allows to use KM_PUSHPAGE with KM_SLEEP,
> not sure what's going on under the hood in that case.
>
> >> See tid 100153 (arc reclaim thread), tid 100105 (pagedaemon) and tid
> >> 100639 (nfsd in kmem_back).
> >>
> >> --
> >> Andriy Gapon
> >
>
> --
> Andriy Gapon
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
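
The reserved-pool idea in the quoted Solaris text, combined with the suggestion to try the M_NOWAIT path first and fall back to a special area, can be sketched as a toy allocator. This is a Python model under stated assumptions, not FreeBSD or Solaris code: the class ToyAllocator, its methods, and the page counts are all hypothetical, and it says nothing about how a real kernel would size or refill the reserve.

```python
class ToyAllocator:
    """Toy page allocator with a small reserve; all names are hypothetical."""

    def __init__(self, free_pages, reserve_pages):
        self.free_pages = free_pages        # pages any caller may take
        self.reserve_pages = reserve_pages  # pages held back for critical callers

    def alloc_nowait(self, pages):
        """Model of a non-sleeping request: succeed at once or return False."""
        if pages <= self.free_pages:
            self.free_pages -= pages
            return True
        return False

    def alloc_critical(self, pages):
        """Try the non-sleeping path first, then fall back to the reserve."""
        if self.alloc_nowait(pages):
            return "general"
        if pages <= self.reserve_pages:
            self.reserve_pages -= pages
            return "reserve"
        # Reserve exhausted: the model just reports failure. What a real
        # kernel should do at this point is outside the scope of this sketch.
        return None


alloc = ToyAllocator(free_pages=1, reserve_pages=2)
first = alloc.alloc_critical(1)   # served from the general pool
second = alloc.alloc_critical(1)  # general pool empty, served from the reserve
third = alloc.alloc_critical(2)   # larger than what is left in the reserve
print(first, second, third)
```

The point of the model is that a critical caller never sleeps while holding a lock: it either gets memory immediately or gets a failure it can handle, which avoids the wait on pagedaemon described earlier in the thread.
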