From: Andriy Gapon
Date: Thu, 04 Oct 2012 19:14:19 +0300
To: Nikolay Denev, freebsd-fs, Pawel Jakub Dawidek
Subject: Re: nfs + zfs hangs on RELENG_9

on 04/10/2012 15:31 Andriy Gapon said the following:
>
> [restoring cc to fs@]
>
> on 04/10/2012 14:32 Nikolay Denev said the following:
>> I have procstat only for the nfsd threads from the moment of the IO hang.
>> And this is the only one with "arc":
>>
>> 1422 138630 nfsd nfsd: service mi_switch+0x186
>> sleepq_wait+0x42 _sleep+0x390 arc_lowmem+0x77 kmem_malloc+0xc1
>> uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 arc_read_nolock+0x1ec
>> arc_read+0x93 dbuf_read+0x452 dmu_buf_hold_array_by_dnode+0x16b
>> dmu_buf_hold_array+0x67 dmu_read_uio+0x3f zfs_freebsd_read+0x3e8
>> nfsvno_read+0x2e5 nfsrvd_read+0x3ff nfsrvd_dorpc+0x3c0
>
> Oh, very important stack trace.
>
> Earlier Nikolay Denev said the following:
>> PID TID COMM TDNAME KSTACK
>> 7 100192 zfskern arc_reclaim_thre mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x428
>> _sx_xlock+0x51 arc_buf_remove_ref+0x8a dbuf_rele_and_unlock+0x132 dbuf_evict+0x11
>> dbuf_do_evict+0x53 arc_do_user_evicts+0xb4 arc_reclaim_thread+0x263 fork_exit+0x11f
>> fork_trampoline+0xe
>
> To me this looks like a deadlock caused by a FreeBSD add-on to ZFS: the arc_lowmem
> handler.
> I think this is what happens:
> The nfsd thread does a read; arc_read_nolock finds a buffer in a ghost cache and
> calls arc_get_data_buf while holding a hash_lock (one of the buffer hash locks).
> arc_get_data_buf needs to allocate some memory and, as luck would have it, there
> is a memory shortage. Low memory handlers are invoked (directly), and one of them
> is arc_lowmem. arc_lowmem simply kicks arc_reclaim_thread to do its job and then
> loops, sleep-waiting until the memory shortage is less severe. arc_reclaim_thread
> tries to evict some buffers and, as luck would have it again, it attempts to evict
> either the same buffer or, most likely, a different buffer that hashes to the same
> lock.
> So arc_reclaim_thread is blocked on the ARC buffer lock, while the nfsd thread
> holds that lock but waits in arc_lowmem for arc_reclaim_thread to make progress.
>
> Eventually the held lock stalls other threads that attempt to grab it; the stall
> propagates to the txg_sync_thread threads, and all ZFS I/O stops.
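A minimal user-space model of the lock cycle above, using POSIX threads. Every name here (hash_lock, reclaim_thread, lowmem_wait, reader_allocates_under_hash_lock) is a hypothetical stand-in, not a ZFS symbol: the "reader" takes the hash lock (like arc_read_nolock), kicks a reclaimer and sleep-waits for progress (like arc_lowmem), while the reclaimer needs that same hash lock before it can evict anything. The timed wait is an assumption added only so the demo terminates; the wait in arc_lowmem has no such escape.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins; none of these are ZFS symbols. */
static pthread_mutex_t hash_lock   = PTHREAD_MUTEX_INITIALIZER; /* one ARC buffer hash lock */
static pthread_mutex_t reclaim_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reclaim_cv  = PTHREAD_COND_INITIALIZER;
static bool reclaim_done;

/* Models arc_reclaim_thread: it cannot evict without the hash lock. */
static void *reclaim_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&hash_lock);     /* blocks while the reader holds it */
    /* ... evict buffers that hash to this lock ... */
    pthread_mutex_unlock(&hash_lock);

    pthread_mutex_lock(&reclaim_mtx);
    reclaim_done = true;                /* report progress */
    pthread_cond_broadcast(&reclaim_cv);
    pthread_mutex_unlock(&reclaim_mtx);
    return NULL;
}

/* Models arc_lowmem's sleep-wait. The timeout exists only so the demo
 * terminates; returns true if reclaim progress was observed in time. */
static bool lowmem_wait(int timeout_sec)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&reclaim_mtx);
    int rc = 0;
    while (!reclaim_done && rc == 0)    /* stop on timeout or any error */
        rc = pthread_cond_timedwait(&reclaim_cv, &reclaim_mtx, &deadline);
    bool progressed = reclaim_done;
    pthread_mutex_unlock(&reclaim_mtx);
    return progressed;
}

/* Models the nfsd read path: the allocation (and so the lowmem hook)
 * runs while hash_lock is held. Returns false if the wait would hang. */
static bool reader_allocates_under_hash_lock(void)
{
    pthread_t reclaimer;
    pthread_mutex_lock(&hash_lock);                                   /* like arc_read_nolock */
    int rc = pthread_create(&reclaimer, NULL, reclaim_thread, NULL);  /* "kick" the reclaimer */
    assert(rc == 0);
    bool progressed = lowmem_wait(1);                                 /* like arc_lowmem */
    pthread_mutex_unlock(&hash_lock);   /* reached only because of the demo timeout */
    pthread_join(reclaimer, NULL);
    return progressed;
}
```

Calling reader_allocates_under_hash_lock() returns false: the reclaimer cannot report progress until the reader drops hash_lock, and the reader will not drop it until the reclaimer reports progress. Without the timeout, both threads would sleep forever, as in the procstat output.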
>
> BTW, one thing to note here is that the lowmem hook was invoked because of KVA
> space shortage, not because of page shortage.

From a practical point of view, this means that having a sufficiently large KVA
may help avoid this deadlock.
From a programming point of view, I am tempted to let arc_lowmem block only if
curproc == pageproc. That should both handle the case where blocking is most
needed and prevent the deadlock described above.

-- 
Andriy Gapon
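A sketch of the proposed policy in the same user-space model, assuming (as the proposal implies) that the pagedaemon does not hold ARC hash locks when it invokes the lowmem hook. The boolean caller_is_pagedaemon is a hypothetical stand-in for the kernel test curproc == pageproc, and all other names are likewise hypothetical. Other callers only kick the reclaimer and return, so a reader holding a hash lock drops it and the reclaimer can finish; what the reader's allocation does after the hook returns is outside this model.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are ZFS symbols. */
static pthread_mutex_t hash_lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t reclaim_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reclaim_cv  = PTHREAD_COND_INITIALIZER;
static bool reclaim_done;

/* Reclaimer: needs the hash lock, then reports progress. */
static void *reclaim_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&hash_lock);
    /* ... evict buffers ... */
    pthread_mutex_unlock(&hash_lock);
    pthread_mutex_lock(&reclaim_mtx);
    reclaim_done = true;
    pthread_cond_broadcast(&reclaim_cv);
    pthread_mutex_unlock(&reclaim_mtx);
    return NULL;
}

static pthread_t kick_reclaim(void)
{
    pthread_t t;
    pthread_mutex_lock(&reclaim_mtx);
    reclaim_done = false;
    pthread_mutex_unlock(&reclaim_mtx);
    int rc = pthread_create(&t, NULL, reclaim_thread, NULL);
    assert(rc == 0);
    return t;
}

/* Proposed policy: only the pagedaemon sleep-waits. Returns whether it waited.
 * caller_is_pagedaemon stands in for curproc == pageproc (an assumption). */
static bool lowmem_hook(bool caller_is_pagedaemon)
{
    if (!caller_is_pagedaemon)
        return false;               /* caller may hold ARC locks: never block */
    pthread_mutex_lock(&reclaim_mtx);
    while (!reclaim_done)
        pthread_cond_wait(&reclaim_cv, &reclaim_mtx);
    pthread_mutex_unlock(&reclaim_mtx);
    return true;
}

/* Reader holding hash_lock: the hook returns at once, the lock is dropped,
 * and reclaim can finish. */
static bool reader_path(void)
{
    pthread_mutex_lock(&hash_lock);
    pthread_t t = kick_reclaim();
    bool waited = lowmem_hook(false);
    pthread_mutex_unlock(&hash_lock);
    pthread_join(t, NULL);          /* join makes reclaim_done visible */
    return !waited && reclaim_done;
}

/* Pagedaemon holding no ARC lock: blocking until reclaim progresses is safe. */
static bool pagedaemon_path(void)
{
    pthread_t t = kick_reclaim();
    bool waited = lowmem_hook(true);
    pthread_join(t, NULL);
    return waited && reclaim_done;
}
```

Both reader_path() and pagedaemon_path() return true: the reader no longer sleeps while holding the hash lock, and the pagedaemon still gets the blocking behaviour where it is most needed.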