From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org
Date: Thu, 9 Jul 2015 16:12:11 -0400 (EDT)
Message-ID: <689709398.6876771.1436472731160.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <21918.48686.157217.979707@khavrinen.csail.mit.edu>
References: <21918.48686.157217.979707@khavrinen.csail.mit.edu>
Subject: Re: How does NFS respond when a VFS operation gives ERESTART?
List-Id: Filesystems

Garrett Wollman wrote:
> When networked filesystems are not involved, the special error code
> [ERESTART] can be returned by the implementation of any system call,
> with the effect of causing the system call to be restarted when
> execution hits the kernel-user boundary, rather than returning to
> userland. This is used to allow certain system calls to be restarted
> after being interrupted by a signal. However, this normally only
> applies to system calls which might potentially sleep for a long time
> -- such as write() to a socket or a tty -- and not to disk I/O, which
> is normally uninterruptible.
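The mechanism described above follows the classic interruptible-sleep
pattern. Here is a minimal sketch of that pattern, not the actual
FreeBSD implementation; resource_ready(), do_the_write(), and
example_chan are invented names for illustration:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>

    int
    example_syscall(struct thread *td, void *uap)
    {
            int error;

            while (!resource_ready()) {
                    /*
                     * Interruptible sleep: with PCATCH, tsleep()
                     * returns EINTR or ERESTART when a signal arrives,
                     * depending on whether the signal's disposition
                     * permits restarting the system call.
                     */
                    error = tsleep(&example_chan, PCATCH, "exwait", 0);
                    if (error != 0) {
                            /*
                             * On ERESTART, the syscall-return path
                             * backs up the program counter so the
                             * syscall is re-executed, instead of
                             * failing back to userland with EINTR.
                             */
                            return (error);
                    }
            }
            return (do_the_write(td, uap));
    }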
> In investigating an issue reported by our users, it appears to me
> from an inspection of the code that ZFS can sometimes give an
> [ERESTART] condition, specifically when writing to a dataset that has
> reached its quota, AND there are pending block free operations that
> would reduce usage below the quota. But I don't see any code in the
> NFS (or kernel RPC) implementation that would actually handle this
> case, and of course the NFS server doesn't normally hit the
> user-kernel boundary at all. So does anyone have a theory about what
> actually happens in this case, and what *should* happen? It doesn't
> seem useful to just spin on the one operation over and over again
> until the blocks are freed (which I think might take a full ZFS
> transaction sync interval).
>
Well, I'll admit I'm not sure I really understand the situation, but...

My best guess would be to have the NFS server reply NFSERR_DELAY to the
client. (NFSERR_DELAY doesn't exist for NFSv2, but I suspect you don't
care about NFSv2?)

NFSERR_DELAY - Tells the client to wait a while (the RFCs don't define
how long) and then try the RPC again.

Does this sound like it would work? If it sounds reasonable, I think
patching the server to do this shouldn't be too hard. (A rough sketch
of what such a mapping might look like is appended at the end of this
message.)

rick

> The actual symptom which I'm investigating is that sometimes --
> despite my fixes to the throttling code -- the server is still
> getting throttled, with thousands of requests enqueued for the same
> file. (The FHA code does a nice job of directing them all to the
> appropriate set of service threads, but that doesn't help the other
> clients get anything done because of the global throttle.) These seem
> not to make any progress for a long time, but the condition
> ultimately clears by itself -- what I'm trying to figure out is why
> so many requests get queued and don't make progress, and so far this
> seems to be related to hitting the quota on the filesystem. So
> [ERESTART] may be a total red herring, but it was something that
> stuck out at me when I was reviewing the code paths that could set
> [EDQUOT].
>
> -GAWollman
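For concreteness, the mapping suggested above might look something like
the following sketch. The helper name and its call site are assumptions
(the actual FreeBSD NFS server error path isn't shown in this thread);
the idea is simply to translate an ERESTART leaking up from VFS/ZFS
into NFSERR_DELAY (NFS3ERR_JUKEBOX for v3, NFS4ERR_DELAY for v4) before
the reply is built:

    /*
     * Hypothetical helper, not actual FreeBSD code: convert ERESTART
     * from the underlying file system into a "try again later" reply.
     */
    static int
    nfsrv_maperestart(int error, int nfsvers)
    {
            if (error != ERESTART)
                    return (error);
            if (nfsvers == NFS_VER2)
                    return (EIO);      /* NFSv2 has no delay/jukebox error */
            return (NFSERR_DELAY);     /* client waits, then retries the RPC */
    }

A client that receives NFSERR_DELAY re-sends the same RPC after a
client-chosen interval, so the server never has to spin on the quota
condition itself.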