Date: Thu, 30 Apr 2015 15:47:00 +0200
From: Martijn
To: freebsd-fs@freebsd.org
Subject: ZFS stuck on write

Hi,

I've been trying to get an important production machine stable again since yesterday afternoon, but to no avail (so far). It seems ZFS is the problem on this box. The situation is as follows:

- It used to run FreeBSD 8.3. After reading about deadlocks that have since been fixed, I upgraded to 10.1: no change. I then ran zpool upgrade: no change. Then zfs upgrade -a: also no change.
- It's a machine on which each user has a separate ZFS filesystem with refquota set. It also took periodic ZFS snapshots every hour (48), day (14), week (8) and month (24), which is way too many, but when I set it up I thought it couldn't hurt.
- After some usage, the machine gets stuck when trying to write a file. The process just stops and can't be killed. On 8.3 the whole machine eventually became unresponsive; since 10.1 I can still reach it, but processes attempting to write stay stuck forever.
- Nothing scary shows up in dmesg.

What can I do?

The machine has 24 GB of registered ECC RAM (17 GB free). It's a RAID-Z pool with 4 SATA HDDs on an LSI SAS3442E-R (1068 chip) in IT mode.

loader.conf:

    vfs.zfs.arc_max=8G
    vfs.zfs.txg.timeout="5"
    vfs.zfs.prefetch_disable="1"
    vfs.zfs.vdev.min_pending="3"
    vfs.zfs.vdev.max_pending="6"
    vfs.zfs.txg.write_limit_override=1073741824

I've tried copying the most important users to another machine, but that's going to take a lot of time. There are 160 users (websites + mailboxes) on it.

Any help would be much appreciated!

Thanks in advance,
Martijn

-- 
Hostage
Keizersgracht 316
1016 EZ Amsterdam
tel: +31 (0)20 4632 303
http://www.hostage.nl
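For scale, the snapshot schedule described in the first bullets can be tallied with a short Python sketch. It assumes the numbers in parentheses (48, 14, 8, 24) are how many hourly, daily, weekly and monthly snapshots are retained, and that every one of the 160 user filesystems is snapshotted; the constant and function names are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical sketch: count the snapshots the schedule keeps alive.
# ASSUMPTION: the parenthesised numbers are retention counts, and all
# 160 per-user filesystems are included in the snapshot rotation.
RETENTION = {"hourly": 48, "daily": 14, "weekly": 8, "monthly": 24}
USER_FILESYSTEMS = 160


def snapshots_per_filesystem(retention: dict[str, int]) -> int:
    """Snapshots retained per filesystem once every rotation is full."""
    return sum(retention.values())


def total_snapshots(retention: dict[str, int], filesystems: int) -> int:
    """Snapshots retained across all filesystems in the pool."""
    return snapshots_per_filesystem(retention) * filesystems


if __name__ == "__main__":
    print(snapshots_per_filesystem(RETENTION))               # 94
    print(total_snapshots(RETENTION, USER_FILESYSTEMS))      # 15040
```

Under those assumptions the pool would carry about 94 snapshots per user filesystem, or roughly 15,000 in total, which puts "way too many" in concrete terms.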