Date: Tue, 03 Sep 2013 18:11:21 +1000
From: Grant Gray <grant@grantgray.id.au>
To: freebsd-fs@freebsd.org
Subject: ZFS livelock / deadlock on pure SSD pool
Message-ID: <522599A9.9070107@grantgray.id.au>
Hello All,

I have been experiencing a ZFS livelock on a 9.1 system since introducing pools containing only SSDs. The livelock typically occurs every 1-2 days, sometimes as often as twice a day.

ZFS filesystems:
http://pastebin.com/raw.php?i=svTZRd7m

The pool configuration is as follows:
http://pastebin.com/raw.php?i=KAdSGWu4

/boot/loader.conf:
http://pastebin.com/raw.php?i=J1cZNPjS

There were a couple of livelock issues associated with 9.1 (one in ZFS, one in CAM) that prompted an upgrade to 9.2RC2 and then to 9.2RC3; however, the problem persists.

When the system has locked, it can still be pinged and socket connections can be made (SSH begins its handshake, for example, but doesn't get as far as prompting for a password).

Some details:

* Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
* Regular (hourly) cron jobs that traverse at least one filesystem of tens of thousands of files,
* NFS exports of some ZFS filesystems,
* iSCSI exports via istgt of zvols,
* Host controller is LSI 3801E (IT) with latest firmware,
* Storage array is Dell MD1000 with latest firmware,
* Host system is Sun X4200 M2 w/32GB RAM, 2 x dual core Opterons,
* SSDs (4 of) are Crucial M500 960GB in two mirrored pools (san1 & san2).

I haven't yet enabled the kernel debugger to get a stack trace/lock status, but the output of procstat -kk -a is here:
http://pastebin.com/raw.php?i=SYhmyhGj

Once the livelock occurs, any ZFS command hangs, and it appears that any command that doesn't happen to be in cache may also hang.

Any suggestions are warmly welcomed!