From: Michael Ranner
To: Steven Hartland, freebsd-fs@freebsd.org
Subject: Re: Zfs locking up process
Date: Sun, 11 Oct 2015 16:00:02 +0200
Message-ID: <561A6B62.5020905@ranner.eu>
In-Reply-To: <561A68EB.5080202@multiplay.co.uk>
References: <561A2A4B.4080704@ranner.eu> <561A5445.40603@multiplay.co.uk>
 <561A660E.1020905@ranner.eu> <561A68EB.5080202@multiplay.co.uk>

On 11.10.15 at 15:49, Steven Hartland wrote:
>
> On 11/10/2015 14:37, Michael Ranner wrote:
>> On 11.10.15 at 14:21, Steven Hartland wrote:
>>>
>>>
>>> On 11/10/2015 10:22, Michael Ranner wrote:
>>>> On 07.10.15 at 17:11, Rajil Saraswat wrote:
>>>>> Hello
>>>>>
>>>>> I have a server running FreeNAS 9.3 with a few jails. The machine
>>>>> has two new disks set up in a mirror. I have a dataset
>>>>> (/mnt/tank/media) which is shared in two jails.
>>>>>
>>>>> Unfortunately, sometimes when I do an ls in a jail in the shared
>>>>> directory I see that the process just hangs.
>>>>>
>>>>> Today in the jail I did an 'su' and the process just hung. On the
>>>>> host, if I do ls /mnt/tank/media it also hangs.
>>>>>
>>>>> The su process (pid 77477) is taking up 100% CPU in the jail. It
>>>>> seems that ZFS is holding up the process. Any idea what could be
>>>>> wrong?
>>>>>
>>>>>
>>>> It is a known problem with ZFS and nullfs. I had no problems under
>>>> FreeBSD 8 with such a setup, but since FreeBSD 9 it is very
>>>> unstable to mount_nullfs on ZFS. I experienced the same behaviour
>>>> with Apache jails and PHP, mostly PHP running with 100% CPU inside
>>>> the jail.
>>> I'd have to disagree with this; we have hundreds of machines on 10.1
>>> which use nullfs every day and we've never seen a lockup.
>>>
>>> Given that, do you have more information about this, e.g. a PR?
>> There are some posts to freebsd-fs in 2014 like this:
>>
>> https://lists.freebsd.org/pipermail/freebsd-fs/2014-November/020482.html
>>
>> And an in-depth analysis by Andriy Gapon:
>>
>> https://lists.freebsd.org/pipermail/freebsd-fs/2014-September/020072.html
>>
>> The problem becomes more frequent with heavy snapshot usage on
>> the underlying ZFS datasets.
>
> There's been lots of movement in ZFS since 9.x so it would be good to
> confirm what FreeBSD version the original poster is using. I suspect
> it is 9.3 and if so the first action would be to check on latest 10
> release i.e.
> 10.2 at this time, to ensure it's not already been addressed.
>
> Looking at the FreeNAS site, FreeNAS-10 ALPHA is out, which is based
> off 10.2, so it would be worth testing that; given the announcement
> post, though, this should be done with caution.
>
> Regards
> Steve

I have this problem (exactly the same stack trace) in all 9.* and 10.1
releases, but it does not occur on all servers. Some systems run stably
for hundreds of days, while specific systems trigger the problem
several times in one week. It seems that some workloads trigger it,
especially when snapshot operations are also in progress.

Since I removed all nullfs mounts for my web datasets (Apache 2.2/2.4,
PHP 5.*), the systems have been rock stable.

But FYI: I have no problems with nullfs mounts of the ports tree into
my jails and building ports. I have also had no problems on my
poudriere build system, which has been running poudriere since its
early days.

-- 
Kind regards

Ing. Michael Ranner

GSM: +43 676 4155044
Mail: michael@ranner.eu
WWW: http://www.azedo.at/