From: Grant Gray
Date: Tue, 03 Sep 2013 18:22:39 +1000
To: freebsd-fs@freebsd.org
Subject: Re: ZFS livelock / deadlock on pure SSD pool
Message-ID: <52259C4F.6020705@grantgray.id.au>
In-Reply-To: <522599A9.9070107@grantgray.id.au>

I forgot to mention the device list:

at scbus0 target 0 lun 0 (pass0,cd0)
at scbus3 target 39 lun 0 (pass1,ses0)
at scbus3 target 40 lun 0 (pass2,da0)
at scbus3 target 41 lun 0 (pass3,da1)
at scbus3 target 42 lun 0 (pass4,da2)
at scbus3 target 43 lun 0 (pass5,da3)
at scbus3 target 44 lun 0 (pass6,da4)
at scbus3 target 45 lun 0 (pass7,da5)
at scbus3 target 46 lun 0 (pass8,da6)
at scbus3 target 47 lun 0 (pass9,da7)
at scbus3 target 48 lun 0 (pass10,da8)
at scbus3 target 49 lun 0 (pass11,da9)
at scbus3 target 50 lun 0 (pass12,da10)
at scbus3 target 51 lun 0 (pass13,da11)
at scbus3 target 52 lun 0 (pass14,da12)
at scbus3 target 53 lun 0 (pass15,da13)
at scbus3 target 54 lun 0 (pass16,da14)
at scbus4 target 0 lun 0 (pass17,da15)
at scbus4 target 1 lun 0 (pass18,da16)
at scbus4 target 2 lun 0 (pass19,da17)
at scbus4 target 3 lun 0 (pass20,da18)

On 09/03/2013 06:11 PM, Grant Gray wrote:
> Hello All,
>
> I have been experiencing a ZFS livelock on a 9.1 system since
> introducing pools containing only SSDs. The livelock occurs typically
> every 1-2 days, sometimes as much as twice a day.
>
> ZFS filesystems:
> http://pastebin.com/raw.php?i=svTZRd7m
>
> The pool configuration is as follows:
> http://pastebin.com/raw.php?i=KAdSGWu4
>
> /boot/loader.conf:
> http://pastebin.com/raw.php?i=J1cZNPjS
>
> There were a couple of livelock issues associated with 9.1 (one in
> ZFS, one in CAM) that prompted an upgrade to 9.2RC2 and then to
> 9.2RC3, however the problem persists. When the system has locked, it
> can still be pinged and socket connections can be made (SSH begins
> handshake for example, but doesn't get as far as prompting for password).
>
> Some details:
> * Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
> * Regular (hourly) cron jobs that traverse at least one filesystem of
>   tens of thousands of files,
> * NFS exports of some ZFS filesystems,
> * iSCSI exports via istgt of zvols,
> * Host controller is LSI 3801E (IT) with latest firmware,
> * Storage array is Dell MD1000 with latest firmware,
> * Host system is Sun X4200 M2 w/32GB RAM, 2 x dual core Opterons,
> * SSDs (4 of) are Crucial M500 960GB in two mirrored pools (san1 & san2).
>
>
> I haven't yet enabled the kernel debugger to get a stack trace/lock
> status, but procstat -kk -a is here:
> http://pastebin.com/raw.php?i=SYhmyhGj
>
> Once livelock occurs, any ZFS command hangs, and it appears any
> command that doesn't happen to be in cache may also hang.
>
> Any suggestions are warmly welcomed!
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
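
For anyone who wants to see at a glance which devices in the list at the top of this message sit on which bus, here is a rough Python sketch. It assumes only the "at scbusN target T lun L (passX,daY)" layout shown in this message; the function name devices_by_bus and the shortened sample string are made up for illustration and are not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# Matches one entry of the form "at scbus3 target 40 lun 0 (pass2,da0)".
ENTRY = re.compile(r"at scbus(\d+) target (\d+) lun (\d+) \(([^)]*)\)")


def devices_by_bus(text):
    """Group peripheral names (e.g. da0, ses0) by SCSI bus number.

    Hypothetical helper: it only understands the layout quoted in
    this message and ignores everything else in the input.
    """
    buses = defaultdict(list)
    for bus, _target, _lun, names in ENTRY.findall(text):
        # Drop the passN passthrough names; keep da/cd/ses names.
        periphs = [n for n in names.split(",") if not n.startswith("pass")]
        buses[int(bus)].extend(periphs)
    return dict(buses)


# Shortened sample taken from the device list in this message.
sample = (
    "at scbus0 target 0 lun 0 (pass0,cd0) "
    "at scbus3 target 39 lun 0 (pass1,ses0) "
    "at scbus3 target 40 lun 0 (pass2,da0) "
    "at scbus4 target 0 lun 0 (pass17,da15)"
)

print(devices_by_bus(sample))
```

Fed the full list, this would show the enclosure device and da0 through da14 on scbus3 and da15 through da18 on scbus4, which may help when matching pool members to a controller.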