Date: Wed, 21 Dec 2016 01:24:16 -0500
From: Garrett Wollman <wollman@hergotha.csail.mit.edu>
To: freebsd-fs@freebsd.org
Cc: freebsd-stable@freebsd.org, rmacklem@freebsd.org
Subject: NFS server hang with backing store on ZFS and quota nearly exhausted
Message-ID: <22618.8208.273426.702678@hergotha.csail.mit.edu>

I've opened a bug about this before, which I can't cite by number
because bugzilla appears to be down at the moment.  But I had this
problem recur tonight under otherwise idle conditions, so I was able
to get a set of kernel stacks without any confounding RPC activity
going on.  This is on 10.2; we're not scheduled to take these servers
to 10.3 until next week.

Here's the "procstat -kk" output:

  PID    TID COMM          TDNAME         KSTACK
 1055 101965 nfsd          -              mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x357 Xfast_syscall+0xfb
 1058 101012 nfsd          nfsd: service  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a svc_run_internal+0x8be svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe

[Threads with the stack trace above are simply idle and waiting for
incoming requests, and I've deleted the other 5 of them.]

 1058 101688 nfsd          nfsd: service  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_timedwait_sig+0x10 _cv_timedwait_sig_sbt+0x18b svc_run_internal+0x4bd svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe

[Not sure what these threads are doing: obviously they are waiting for
a condvar, but at a different spot in svc_run_internal().  I've
deleted the other 7 of them.]
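As an aside for anyone reading along: the threads in the first group
appear to be parked on the pool's condition variable until a request
is queued, and my guess is that the second group is doing much the
same thing but with a timeout.  A crude userland analogue, purely for
illustration (the names below are invented; this is not the actual
sys/rpc/svc.c code):

	#include <pthread.h>
	#include <stdbool.h>
	#include <time.h>

	struct pool {
		pthread_mutex_t	lock;
		pthread_cond_t	cond;		/* the condvar in the stacks */
		bool		work_pending;
	};

	/* First group: sleep on the condvar until a request shows up
	 * (roughly the _cv_wait_sig frames). */
	static void
	idle_wait(struct pool *p)
	{
		pthread_mutex_lock(&p->lock);
		while (!p->work_pending)
			pthread_cond_wait(&p->cond, &p->lock);
		p->work_pending = false;
		pthread_mutex_unlock(&p->lock);
		/* ...would go on to dequeue and service the RPC... */
	}

	/* Second group: same idea, but wake up periodically even if no
	 * request arrives (roughly the _cv_timedwait_sig_sbt frames). */
	static void
	idle_timedwait(struct pool *p)
	{
		struct timespec ts;

		pthread_mutex_lock(&p->lock);
		while (!p->work_pending) {
			clock_gettime(CLOCK_REALTIME, &ts);
			ts.tv_sec += 5;		/* arbitrary timeout */
			if (pthread_cond_timedwait(&p->cond, &p->lock, &ts) != 0)
				break;		/* timed out; recheck or exit */
		}
		p->work_pending = false;
		pthread_mutex_unlock(&p->lock);
	}

And then there are the two interesting threads: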
 1058 101720 nfsd          nfsd: service  mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d txg_wait_open+0x85 dmu_tx_wait+0x2ac dmu_tx_assign+0x48 zfs_freebsd_write+0x544 VOP_WRITE_APV+0x149 nfsvno_write+0x13e nfsrvd_write+0x496 nfsrvd_dorpc+0x6f1 nfssvc_program+0x54e svc_run_internal+0xd7b svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe
 1058 102015 nfsd          nfsd: master   mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d txg_wait_open+0x85 dmu_tx_wait+0x2ac dmu_tx_assign+0x48 zfs_freebsd_write+0x544 VOP_WRITE_APV+0x149 nfsvno_write+0x13e nfsrvd_write+0x496 nfsrvd_dorpc+0x6f1 nfssvc_program+0x54e svc_run_internal+0xd7b svc_run+0x1de nfsrvd_nfsd+0x242 nfssvc_nfsd+0x107 sys_nfssvc+0x9c amd64_syscall+0x357

These two threads are both servicing WRITE RPCs, and both are sleeping
deep inside the ZFS code.  Note that one of them is the "master" krpc
thread in this service pool; I don't know whether that accounts for
the fact that requests are not getting processed even though plenty of
idle threads exist.  (Note that zfs_write() does not appear in the
stack due to tail-call optimization.)

I don't know the ZFS code well enough to understand what running out
of quota has to do with this situation (you'd think it would just
return immediately with [EDQUOT]), but perhaps it matters that the
clients are not well-behaved and that the filesystem is often almost
at quota but not quite there yet.

-GAWollman
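P.S.  Going from memory of the ZFS sources rather than our exact 10.2
tree (so take the details with a grain of salt), the write path these
threads are stuck in is a retry loop of roughly the following shape:
when dmu_tx_assign() can't satisfy the space accounting for the write
it returns ERESTART, and the thread then sits in dmu_tx_wait(), hence
txg_wait_open(), for the currently open txg before trying again, which
looks like exactly where our two threads are parked.  This is a
paraphrase, not the literal code:

	/* Paraphrased sketch of the zfs_write() transaction loop. */
	again:
		tx = dmu_tx_create(zfsvfs->z_os);
		dmu_tx_hold_write(tx, zp->z_id, woff, nbytes);
		error = dmu_tx_assign(tx, TXG_NOWAIT);
		if (error != 0) {
			if (error == ERESTART) {
				/*
				 * The DMU couldn't commit to the space for
				 * this write right now; wait for the open
				 * txg and retry.  (The dmu_tx_wait ->
				 * txg_wait_open path in the stacks above.)
				 */
				dmu_tx_wait(tx);
				dmu_tx_abort(tx);
				goto again;
			}
			dmu_tx_abort(tx);	/* hard failure, e.g. EDQUOT */
			return (error);
		}
		/* ...copy the uio into the DMU, then dmu_tx_commit(tx)... */

If that reading is right, a dataset hovering just under its quota
might keep hitting the ERESTART-and-wait path rather than failing
cleanly with EDQUOT, which would at least be consistent with what
we're seeing.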