Date: Tue, 03 Sep 2013 18:22:39 +1000
From: Grant Gray <grant@grantgray.id.au>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS livelock / deadlock on pure SSD pool
Message-ID: <52259C4F.6020705@grantgray.id.au>
In-Reply-To: <522599A9.9070107@grantgray.id.au>
References: <522599A9.9070107@grantgray.id.au>
I forgot to mention the device list:

<TEAC DW-224SL-R 1.0B>           at scbus0 target 0 lun 0 (pass0,cd0)
<DELL MD1000 A.04>               at scbus3 target 39 lun 0 (pass1,ses0)
<ATA Crucial_CT960M50 MU02>      at scbus3 target 40 lun 0 (pass2,da0)
<ATA Crucial_CT960M50 MU02>      at scbus3 target 41 lun 0 (pass3,da1)
<ATA Crucial_CT960M50 MU02>      at scbus3 target 42 lun 0 (pass4,da2)
<ATA Crucial_CT960M50 MU02>      at scbus3 target 43 lun 0 (pass5,da3)
<ATA WDC WD2002FAEX-0 1D05>      at scbus3 target 44 lun 0 (pass6,da4)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 45 lun 0 (pass7,da5)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 46 lun 0 (pass8,da6)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 47 lun 0 (pass9,da7)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 48 lun 0 (pass10,da8)
<ATA WDC WD2002FAEX-0 1D05>      at scbus3 target 49 lun 0 (pass11,da9)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 50 lun 0 (pass12,da10)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 51 lun 0 (pass13,da11)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 52 lun 0 (pass14,da12)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 53 lun 0 (pass15,da13)
<ATA WDC WD20EARX-00P AB51>      at scbus3 target 54 lun 0 (pass16,da14)
<FUJITSU MBB2147RCSUN146G 0505>  at scbus4 target 0 lun 0 (pass17,da15)
<FUJITSU MBB2147RCSUN146G 0505>  at scbus4 target 1 lun 0 (pass18,da16)
<ATA INTEL SSDSC2CW12 400i>      at scbus4 target 2 lun 0 (pass19,da17)
<ATA INTEL SSDSC2CW12 400i>      at scbus4 target 3 lun 0 (pass20,da18)

On 09/03/2013 06:11 PM, Grant Gray wrote:
> Hello All,
>
> I have been experiencing a ZFS livelock on a 9.1 system since
> introducing pools containing only SSDs. The livelock occurs typically
> every 1-2 days, sometimes as much as twice a day.
>
> ZFS filesystems:
> http://pastebin.com/raw.php?i=svTZRd7m
>
> The pool configuration is as follows:
> http://pastebin.com/raw.php?i=KAdSGWu4
>
> /boot/loader.conf:
> http://pastebin.com/raw.php?i=J1cZNPjS
>
> There were a couple of livelock issues associated with 9.1 (one in
> ZFS, one in CAM) that prompted an upgrade to 9.2RC2 and then to
> 9.2RC3, however the problem persists. When the system has locked, it
> can still be pinged and socket connections can be made (SSH begins
> handshake for example, but doesn't get as far as prompting for password).
>
> Some details:
> * Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
> * Regular (hourly) cron jobs that traverse at least one filesystem of
>   tens of thousands of files,
> * NFS exports of some ZFS filesystems,
> * iSCSI exports via istgt of zvols,
> * Host controller is LSI 3801E (IT) with latest firmware,
> * Storage array is Dell MD1000 with latest firmware,
> * Host system is Sun X4200 M2 w/32GB RAM, 2 x dual core Opterons,
> * SSDs (4 of) are Crucial M500 960GB in two mirrored pools (san1 & san2).
>
>
> I haven't yet enabled the kernel debugger to get a stack trace/lock
> status, but procstat -kk -a is here:
> http://pastebin.com/raw.php?i=SYhmyhGj
>
> Once livelock occurs, any ZFS command hangs, and it appears any
> command that doesn't happen to be in cache may also hang.
>
> Any suggestions are warmly welcomed!
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"