From: Grant Gray
Date: Tue, 03 Sep 2013 18:11:21 +1000
To: freebsd-fs@freebsd.org
Subject: ZFS livelock / deadlock on pure SSD pool
Message-ID: <522599A9.9070107@grantgray.id.au>

Hello All,

I have been experiencing a ZFS livelock on a 9.1 system since introducing pools containing only SSDs.
The livelock occurs typically every 1-2 days, sometimes as often as twice a day.

ZFS filesystems: http://pastebin.com/raw.php?i=svTZRd7m

The pool configuration is as follows: http://pastebin.com/raw.php?i=KAdSGWu4

/boot/loader.conf: http://pastebin.com/raw.php?i=J1cZNPjS

There were a couple of livelock issues associated with 9.1 (one in ZFS, one in CAM) that prompted an upgrade to 9.2RC2 and then to 9.2RC3; however, the problem persists.

When the system has locked, it can still be pinged and socket connections can be made (SSH begins its handshake, for example, but doesn't get as far as prompting for a password).

Some details:

* Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
* Regular (hourly) cron jobs that traverse at least one filesystem of tens of thousands of files,
* NFS exports of some ZFS filesystems,
* iSCSI exports via istgt of zvols,
* Host controller is LSI 3801E (IT) with latest firmware,
* Storage array is Dell MD1000 with latest firmware,
* Host system is Sun X4200 M2 w/32GB RAM, 2 x dual core Opterons,
* SSDs are 4 x Crucial M500 960GB in two mirrored pools (san1 & san2).

I haven't yet enabled the kernel debugger to get a stack trace/lock status, but the output of procstat -kk -a is here: http://pastebin.com/raw.php?i=SYhmyhGj

Once livelock occurs, any ZFS command hangs, and it appears any command that doesn't happen to be in cache may also hang.

Any suggestions are warmly welcomed!
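
For anyone wading through the procstat -kk -a dump linked above, one rough way to see where the hung threads are piling up is to group them by their leading kernel stack frames. The following is only a minimal Python sketch and has not been run against the affected system. The function name summarize_kstacks, the GENERIC frame set and the depth parameter are hypothetical choices made for illustration, and the only assumption about the input is that frames are printed as symbol+0xoffset.

```python
import re
from collections import Counter

# One kernel stack frame as printed by `procstat -kk`, e.g. "_sleep+0x1c9".
FRAME_RE = re.compile(r"\b([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+\b")

# Scheduler/sleep-queue frames that lead most sleeping threads and say
# little about *why* a thread is blocked. This set is an assumption and
# can be extended as needed.
GENERIC = {
    "mi_switch",
    "sleepq_wait",
    "sleepq_wait_sig",
    "sleepq_timedwait",
    "sleepq_timedwait_sig",
    "sleepq_catch_signals",
}


def summarize_kstacks(text, depth=3):
    """Count threads by the first `depth` non-generic frames of their stack.

    `text` is the raw output of `procstat -kk -a`. Lines without any
    recognisable frame (the header, threads with an empty stack) are skipped.
    """
    counts = Counter()
    for line in text.splitlines():
        frames = [f for f in FRAME_RE.findall(line) if f not in GENERIC]
        if frames:
            counts[tuple(frames[:depth])] += 1
    return counts


def format_summary(counts, top=10):
    """Return the `top` most common stack signatures, one per line."""
    return "\n".join(
        f"{n:5d}  {' <- '.join(sig)}" for sig, n in counts.most_common(top)
    )
```

Passing the saved dump to summarize_kstacks and printing format_summary of the result gives a short list of the most common wait points. If most threads share one signature, that is usually the first place to look.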