From: Daniel Andersen <dja@caida.org>
Date: Mon, 18 Aug 2014 12:29:06 -0700
To: freebsd-fs@freebsd.org
Subject: Process enters unkillable state and somewhat wedges zfs

We are currently experiencing a strange problem that effectively locks up one
of our ZFS pools. This is on a FreeBSD 10 machine. Let me give a rough layout
of our system to better describe what is happening.

We have two pools, tank and work, mounted as /tank and /work respectively.
Within these pools we have a variety of datasets. Each of these datasets is
then nullfs-mounted into other locations, in an effort to present a common
directory structure to users. For example:

    /tank/a    -> /data/a
    /tank/a    -> /export/a
    /tank/b    -> /data/b
    /work/home -> /home

and so on.

Occasionally, something goes horribly wrong with a process (or sometimes one
thread within a process). It enters a state where it is running, pegging a
CPU at 100%, and is unkillable. As I understand it, the process is accessing
data on both pools, but only through the nullfs mounts.

When a process enters this state, the /tank pool becomes inaccessible: any
process attempting to touch it enters the traditional 'D' state and itself
becomes unkillable. However, all of the data is *still* accessible through
the nullfs mounts. So while 'ls /tank/a' wedges, 'ls /data/a' works fine.

Typically this happens when the machine is under high load, with ARC memory
usage often at 140+ GB (out of 192 GB total). It has happened under low load
once, but I suspect there was still substantial I/O load at the time: we had
been running many benchmarks trying to trigger this problem, and the cache
was likely still flushing to disk.

Initially we thought this was triggered by a process attempting to dump core,
since all of the processes that originally wedged were doing so. However,
after disabling core dumps, we just had a case where 'sudo -u user su - user'
wedged. This has me baffled.

If anyone has any hints as to where to even start debugging this, I'd
appreciate it greatly. For now I've tuned down the ARC's maximum memory
usage, just as a guess: I saw some bugs mentioned in another thread here
about FreeBSD not giving memory back from the ARC correctly when needed. I
don't know whether that could cause a process to enter this state, but it
seemed worth a try.

Many thanks,
Daniel Andersen
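
P.S. In case the mount setup matters: the nullfs mounts are just ordinary
ones, set up via /etc/fstab entries along these lines (the paths are
illustrative, not our exact layout):

    /tank/a     /data/a    nullfs  rw  0  0
    /tank/a     /export/a  nullfs  rw  0  0
    /tank/b     /data/b    nullfs  rw  0  0
    /work/home  /home      nullfs  rw  0  0

or, equivalently, by hand with 'mount_nullfs /tank/a /data/a'. Nothing
exotic: no union mounts, no nullfs mount options beyond rw.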
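
The next time it wedges, I plan to grab kernel stack traces of the spinning
thread and of one of the 'D'-state processes, since I assume that is the most
useful thing to post here. Roughly (the PID is made up):

    # find stuck processes and what they are sleeping on
    ps -axO wchan,state | grep -w D

    # kernel stack traces for every thread of a stuck process
    procstat -kk 1234

If there is something more useful to capture than procstat -kk output, please
let me know.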
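
For reference, the ARC cap I mentioned was set as a loader tunable in
/boot/loader.conf; the value below is just my guess, not a recommendation
(it can also be given in plain bytes):

    # /boot/loader.conf
    vfs.zfs.arc_max="128G"

and I have been watching the current ARC size with:

    sysctl kstat.zfs.misc.arcstats.size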