From owner-freebsd-fs@FreeBSD.ORG  Fri Mar 20 00:16:25 2015
From: Garrett Wollman <wollman@khavrinen.csail.mit.edu>
To: freebsd-fs@freebsd.org
Date: Thu, 19 Mar 2015 20:16:23 -0400 (EDT)
Subject: Low-vnode deadlock
Message-ID: <21771.26327.65535.250135@khavrinen.csail.mit.edu>

As I've previously posted, I've been doing some testing with the SPEC
SFS 2014 benchmark.  One of the workloads, SWBUILD, is intended to be
"metadata intensive".  While watching it in operation the other day, I
noticed that the vnlru kthread ends up taking a large amount of CPU,
indicating that the system is recycling vnodes at a very high rate.
In previous benchmark runs, I've also found that this workload tends
to deadlock the machine, although I haven't identified exactly how.
Usually this deadlock occurs around a load value ("business metric")
of 40 to 50 in the benchmark, and even when there is no deadlock, the
benchmark run is counted as a failure because the system can't
maintain the required op rate.

As a test, I increased kern.maxvnodes to 20 million.  While vnlru
still gets substantial CPU, and the system is thrashing like crazy,
it's still able to complete benchmark runs without either deadlocking
or missing the iops target, at least up to a load value of 65.  I'm
still trying to find the point at which it falls over under this
configuration.  The system's 5-minute load average peaks over 100
while the benchmark is running.  (There are 5 benchmark processes for
each unit of load, but they sleep to maintain the desired operation
rate.)

I will be interested to see how much of an effect this has when I
move from benchmarking the server itself to running benchmarks over
NFS, and the benchmark processes are no longer competing with the
rest of the system for main memory.

Ultimately these results will be published in some forum, but I
haven't figured out exactly where yet.

-GAWollman
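
P.S.  For anyone who wants to poke at the same tunable from a test
harness rather than with sysctl(8), here is a minimal sketch using the
standard sysctlbyname(3) interface.  It is only an illustration, not
part of the benchmark setup: the 20-million figure is simply the value
I used above, the width handling is a little-endian shortcut, and the
set step needs root.

	#include <sys/types.h>
	#include <sys/sysctl.h>
	#include <err.h>
	#include <stdio.h>

	int
	main(void)
	{
		unsigned long cur = 0, want = 20000000UL;
		size_t len = sizeof(cur);

		/*
		 * Read the current value; the kernel updates len to the
		 * tunable's actual width, which has differed across releases.
		 */
		if (sysctlbyname("kern.maxvnodes", &cur, &len, NULL, 0) == -1)
			err(1, "sysctlbyname(kern.maxvnodes) read");
		printf("kern.maxvnodes is currently %lu\n", cur);

		/*
		 * Write the new value back using the same width the kernel
		 * reported.  Passing the low bytes of a wider variable like
		 * this only works on little-endian machines (e.g. amd64);
		 * it keeps the sketch short.
		 */
		if (sysctlbyname("kern.maxvnodes", NULL, NULL, &want, len) == -1)
			err(1, "sysctlbyname(kern.maxvnodes) write");
		printf("kern.maxvnodes raised to %lu\n", want);

		return (0);
	}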