From: Grant Gray
Date: Tue, 03 Sep 2013 20:35:56 +1000
To: Andriy Gapon
Cc: freebsd-fs@FreeBSD.org, Grant Gray
Subject: Re: ZFS livelock / deadlock on pure SSD pool

On 3/09/2013 7:27 PM, Andriy Gapon wrote:
> on 03/09/2013 11:11 Grant Gray said the following:
>> I haven't yet enabled the kernel debugger to get a stack trace/lock status, but
>> procstat -kk -a is here:
>> http://pastebin.com/raw.php?i=SYhmyhGj
> I believe that this is another ARC deadlock triggered by a low-memory condition.
> This time it seems to be FreeBSD-specific too:
>
> 6 100059 zfskern arc_reclaim_thre mi_switch+0x194 sleepq_wait+0x42
> _sx_xlock_hard+0x4d6 _sx_xlock+0x75 arc_buf_remove_ref+0x8a
> dbuf_rele_and_unlock+0x132 dbuf_evict+0x11 dbuf_do_evict+0x53
> arc_do_user_evicts+0xe2 arc_reclaim_thread+0x264 fork_exit+0x11f fork_trampoline+0xe
>
> 5338 102410 vorbisgain - mi_switch+0x194 sleepq_wait+0x42
> _sx_xlock_hard+0x4d6 _sx_xlock+0x75 arc_lowmem+0x38 kmem_malloc+0xb0
> uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0x1f4 arc_read+0x225
> dbuf_read+0x445 dmu_buf_hold_array_by_dnode+0x168 dmu_buf_hold_array+0x67
> dmu_read_uio+0x3f zfs_freebsd_read+0x483 VOP_READ_APV+0x6e vn_read+0xed
> vn_io_fault+0x90
>
> Thread 100059 acquired arc_reclaim_thr_lock before calling arc_do_user_evicts
> and now it wants to take a buf header hash lock.
> Thread 102410 acquired the hash lock in arc_read, then it got into arc_lowmem
> because of a memory allocation problem (and M_WAIT flag) and now it wants to
> take arc_reclaim_thr_lock.
>
> A classic deadlock.

Thanks for the feedback. Do you think it could be triggered when the ARC is
evicting pages because it is full, or only in a genuine low-memory case? The
system has 32 GB of RAM, of which the ARC typically uses about 24 GB (I think).
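The lock-order inversion in the analysis above (reclaim lock then hash lock in one thread, hash lock then reclaim lock in the other) can be modelled in a few lines. This is a user-space Python analogy, not ZFS code: `reclaim_lock` and `hash_lock` are hypothetical stand-ins for arc_reclaim_thr_lock and the buf header hash lock, and a `threading.Barrier` forces the same interleaving seen in the procstat output. A blocking acquire of the second lock in `reader_thread` would reproduce the ABBA deadlock; the sketch instead tries the lock once and backs off, which is one common way to break such a cycle (the other is to enforce a single global lock order). It makes no claim about how the FreeBSD ARC code was actually fixed.

```python
import threading

# Hypothetical stand-ins for the two kernel locks named in the analysis.
reclaim_lock = threading.Lock()          # plays arc_reclaim_thr_lock
hash_lock = threading.Lock()             # plays the buf header hash lock
both_hold_first = threading.Barrier(2)   # forces the interleaving from the trace

result = {"lowmem_backed_off": None}


def reclaim_thread():
    # Like arc_reclaim_thread -> arc_do_user_evicts: reclaim lock first, then hash lock.
    with reclaim_lock:
        both_hold_first.wait()
        with hash_lock:  # waits until the reader drops the hash lock
            pass


def reader_thread():
    # Like arc_read -> arc_get_data_buf -> arc_lowmem: hash lock first, then reclaim lock.
    with hash_lock:
        both_hold_first.wait()
        # A blocking reclaim_lock.acquire() here would be the ABBA deadlock.
        # This sketch tries once and backs off instead of waiting.
        got_it = reclaim_lock.acquire(blocking=False)
        if got_it:
            reclaim_lock.release()
        result["lowmem_backed_off"] = not got_it


threads = [threading.Thread(target=reclaim_thread),
           threading.Thread(target=reader_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

deadlocked = any(t.is_alive() for t in threads)
print("deadlocked:", deadlocked, "| low-memory path backed off:", result["lowmem_backed_off"])
```

Running it prints `deadlocked: False | low-memory path backed off: True`: the reader fails to get the reclaim lock, releases the hash lock, and that lets the reclaim thread finish.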