From: Garrett Wollman
To: freebsd-fs@freebsd.org
Cc: rmacklem@freebsd.org
Date: Thu, 9 Jul 2015 14:32:14 -0400
Message-ID: <21918.48686.157217.979707@khavrinen.csail.mit.edu>
Subject: How does NFS respond when a VFS operation gives ERESTART?

When networked filesystems are not involved, the special error code [ERESTART] can be returned by the implementation of any system call, with the effect of causing the system call to be restarted when execution hits the kernel-user boundary, rather than returning to userland. This is used to allow certain system calls to be restarted after being interrupted by a signal. However, this normally only applies to system calls which might potentially sleep for a long time -- such as write() to a socket or a tty -- and not to disk I/O, which is normally uninterruptible.

In investigating an issue reported by our users, it appears to me from an inspection of the code that ZFS can sometimes give an [ERESTART] condition, specifically when writing to a dataset that has reached its quota, AND there are pending block free operations that would reduce usage below the quota. But I don't see any code in the NFS (or kernel RPC) implementation that would actually handle this case, and of course the NFS server doesn't normally hit the user-kernel boundary at all.

So does anyone have a theory about what actually happens in this case, and what *should* happen? It doesn't seem useful to just spin on the one operation over and over again until the blocks are freed (which I think might take a full ZFS transaction sync interval).

The actual symptom which I'm investigating is that sometimes -- despite my fixes to the throttling code -- the server is still getting throttled, with thousands of requests enqueued for the same file.
(The FHA code does a nice job of directing them all to the appropriate set of service threads, but that doesn't help the other clients get anything done because of the global throttle.) These seem not to make any progress for a long time, but the condition ultimately clears by itself -- what I'm trying to figure out is why so many requests get queued and don't make progress, and so far this seems to be related to hitting the quota on the filesystem.

So [ERESTART] may be a total red herring, but it was something that stuck out at me when I was reviewing the code paths that could set [EDQUOT].

-GAWollman
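
To make the question concrete, here is a minimal user-space sketch of the two situations described above: a caller that restarts an operation at a "boundary" whenever it sees [ERESTART], and a server-style caller that has no such boundary and has to pick a policy of its own. Everything in it is an assumption for illustration only: SIM_ERESTART, struct sim_dataset, sim_write(), syscall_boundary(), and server_dispatch() are hypothetical names, the "one pending free completes per call" model is a simplification, and the bounded-retry-then-[EDQUOT] policy is just one possible answer to the "what *should* happen" question, not what the FreeBSD NFS server or ZFS actually does.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel-internal ERESTART pseudo-error. */
#define SIM_ERESTART (-1)

/* Simulated dataset at its quota with block frees still pending. */
struct sim_dataset {
    int pending_frees;  /* calls remaining until usage drops below quota */
};

/*
 * Simulated write: asks to be restarted while frees are pending.
 * Simplification: each call pretends one pending free has completed.
 */
int sim_write(struct sim_dataset *ds)
{
    if (ds->pending_frees > 0) {
        ds->pending_frees--;
        return SIM_ERESTART;
    }
    return 0;
}

/*
 * Syscall-style caller: the operation is simply re-issued at the
 * boundary for as long as it keeps asking to be restarted.
 */
int syscall_boundary(struct sim_dataset *ds, int *attempts)
{
    int error;

    *attempts = 0;
    do {
        error = sim_write(ds);
        (*attempts)++;
    } while (error == SIM_ERESTART);
    return error;
}

/*
 * Server-style caller: there is no boundary to restart at, so the
 * restart request has to be handled explicitly. This sketch allows
 * max_retries restarts and then reports EDQUOT (an assumed policy).
 */
int server_dispatch(struct sim_dataset *ds, int max_retries, int *attempts)
{
    int error;

    *attempts = 0;
    for (;;) {
        error = sim_write(ds);
        (*attempts)++;
        if (error != SIM_ERESTART)
            return error;
        if (*attempts > max_retries)
            return EDQUOT;
    }
}
```

With pending_frees = 3, syscall_boundary() makes four attempts before succeeding. With pending_frees = 10 and max_retries = 2, server_dispatch() gives up after three attempts and reports EDQUOT instead of spinning until the frees complete.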