From: Robert David
To: Martijn
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS stuck on write
Date: Thu, 30 Apr 2015 16:19:02 +0200
Message-ID: <20150430161902.4868094c@robert-notebook>
In-Reply-To: <20150430134659.GA4950@kobol.office.hostage.nl>

Hi Martijn,

There is too little information here to suggest anything yet.

What is the pool size, and how much free space is left? Are any quotas being
exceeded? Which features are enabled (compression, dedup)? Is there a separate
ZIL (log) device or an L2ARC?

Regards,
Robert.

On Thu, 30 Apr 2015 15:47:00 +0200 Martijn wrote:

> Hi,
>
> I've been trying to get an important production machine stable again since
> yesterday afternoon, but to no avail (so far).
>
> It seems ZFS is the problem on this box. The situation is as follows:
>
> - It used to run FreeBSD 8.3. After reading about deadlocks which have been
>   fixed in the meantime, I upgraded to 10.1: no change. I did a zpool upgrade:
>   no change. I did zfs upgrade -a: also no change.
>
> - It's a machine on which each user has a separate ZFS filesystem with
>   refquota set. It also takes periodic ZFS snapshots every hour (48), day
>   (14), week (8), and month (24), which is way too much, but at the time of
>   setting it up I thought it couldn't hurt.
>
> - After some usage the machine gets stuck when trying to write a file. The
>   process just stops and can't be killed.
>   After some time the whole machine used to become unresponsive on 8.3, but
>   since 10.1 I can still reach it, although processes attempting to write
>   get stuck forever.
>
> - Nothing scary shows in dmesg.
>
> What can I do? The machine has 24GB of registered ECC RAM (17GB free); it's a
> RAID-Z pool with 4 SATA HDDs on an LSI SAS3442E-R (1068 chip) in IT mode.
>
> loader.conf:
>
> vfs.zfs.arc_max=8G
> vfs.zfs.txg.timeout="5"
> vfs.zfs.prefetch_disable="1"
> vfs.zfs.vdev.min_pending="3"
> vfs.zfs.vdev.max_pending="6"
> vfs.zfs.txg.write_limit_override=1073741824
>
> I've tried to copy the most important users to another machine, but that's
> going to take a lot of time. There are 160 users (websites + mailboxes) on it.
>
> Any help would be much appreciated!
>
> Thanks in advance,
>
> Martijn
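
For a sense of scale, the retention counts and the user count quoted above can be turned into a rough snapshot total. This is only a back-of-the-envelope sketch in Python: it assumes exactly one ZFS filesystem per user and that every retention tier is completely filled, neither of which the thread confirms. The variable names are made up for illustration.

```python
# Rough upper bound on the snapshot count, using the numbers quoted in the
# thread. Assumptions (not confirmed in the thread): one filesystem per user,
# and every retention tier is completely filled.

retention = {"hourly": 48, "daily": 14, "weekly": 8, "monthly": 24}
users = 160

snapshots_per_fs = sum(retention.values())
total_snapshots = snapshots_per_fs * users

# vfs.zfs.txg.write_limit_override from the quoted loader.conf, in bytes.
write_limit_override = 1073741824
write_limit_gib = write_limit_override / 2**30

print(f"snapshots per filesystem: {snapshots_per_fs}")
print(f"snapshots across {users} filesystems: {total_snapshots}")
print(f"txg write limit override: {write_limit_gib:.0f} GiB")
```

The real figure on the machine may well be lower than this bound; listing the snapshots with `zfs list -t snapshot` and counting the lines would give the actual number.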