From owner-svn-src-stable-10@freebsd.org  Fri Oct 30 19:26:56 2015
Return-Path: <owner-svn-src-stable-10@freebsd.org>
Delivered-To: svn-src-stable-10@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4462A1FA47;
 Fri, 30 Oct 2015 19:26:56 +0000 (UTC)
 (envelope-from wollman@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A26D21A3F;
 Fri, 30 Oct 2015 19:26:56 +0000 (UTC)
 (envelope-from wollman@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id t9UJQtN3034628;
 Fri, 30 Oct 2015 19:26:55 GMT (envelope-from wollman@FreeBSD.org)
Received: (from wollman@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id t9UJQtwP034626;
 Fri, 30 Oct 2015 19:26:55 GMT (envelope-from wollman@FreeBSD.org)
Message-Id: <201510301926.t9UJQtwP034626@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: wollman set sender to
 wollman@FreeBSD.org using -f
From: Garrett Wollman <wollman@FreeBSD.org>
Date: Fri, 30 Oct 2015 19:26:55 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject: svn commit: r290203 - stable/10/sys/rpc
X-SVN-Group: stable-10
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-stable-10@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SVN commit messages for only the 10-stable src tree
 <svn-src-stable-10.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-stable-10>, 
 <mailto:svn-src-stable-10-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-stable-10/>
List-Post: <mailto:svn-src-stable-10@freebsd.org>
List-Help: <mailto:svn-src-stable-10-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-stable-10>, 
 <mailto:svn-src-stable-10-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Oct 2015 19:26:57 -0000

Author: wollman
Date: Fri Oct 30 19:26:55 2015
New Revision: 290203
URL: https://svnweb.freebsd.org/changeset/base/290203

Log:
  Long-overdue MFC of r280930:
  
    Fix overflow bugs in and remove obsolete limit from kernel RPC
    implementation.
  
    The kernel RPC code, which is responsible for the low-level scheduling
    of incoming NFS requests, contains a throttling mechanism that
    prevents too much kernel memory from being tied up by NFS requests
    that are being serviced.  When the throttle is engaged, the RPC layer
    stops servicing incoming NFS sockets, resulting ultimately in
    backpressure on the clients (if they're using TCP).  However, this is
    a very heavy-handed mechanism as it prevents all clients from making
    any requests, regardless of how heavy or light they are.  (Thus, when
    engaged, the throttle often prevents clients from even mounting the
    filesystem.)  The throttle mechanism applies specifically to requests
    that have been received by the RPC layer (from a TCP or UDP socket)
    and are queued waiting to be serviced by one of the nfsd threads; it
    does not limit the amount of backlog in the socket buffers.
  
    The original implementation limited the total bytes of queued requests
    to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB.
    The former limit seems reasonable, since requests queued in the socket
    buffers and replies being constructed to the requests in progress will
    all require some amount of network memory, but the 45 MiB limit is
    plainly ridiculous for modern memory sizes: when running 256 service
    threads on a busy server, 45 MiB would result in just a single
    maximum-sized NFS3PROC_WRITE queued per thread before throttling.
  
    Removing this limit exposed integer-overflow bugs in the original
    computation, and related bugs in the routines that actually account
    for the amount of traffic enqueued for service threads.  The old
    implementation also attempted to reduce accounting overhead by
    batching updates until each queue is fully drained, but this is prone
    to livelock, resulting in repeated accumulate-throttle-drain cycles on
    a busy server.  Various data types are changed to long or unsigned
    long; explicit 64-bit types are not used due to the unavailability of
    64-bit atomics on many 32-bit platforms, but those platforms also
    cannot support nmbclusters large enough to cause overflow.
  
    This code (in a 10.1 kernel) is presently running on production NFS
    servers at CSAIL.
  
    Summary of this revision:
    * Removes 45 MiB limit on requests queued for nfsd service threads
    * Fixes integer-overflow and signedness bugs
    * Avoids unnecessary throttling by not deferring accounting for
      completed requests
  
  Differential Revision:	https://reviews.freebsd.org/D2165
  Reviewed by:	rmacklem, mav
  Relnotes:	yes
  Sponsored by:	MIT Computer Science & Artificial Intelligence Laboratory

Modified:
  stable/10/sys/rpc/svc.c
  stable/10/sys/rpc/svc.h
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/sys/rpc/svc.c
==============================================================================
--- stable/10/sys/rpc/svc.c	Fri Oct 30 19:20:40 2015	(r290202)
+++ stable/10/sys/rpc/svc.c	Fri Oct 30 19:26:55 2015	(r290203)
@@ -73,7 +73,7 @@ static struct svc_callout *svc_find(SVCP
     char *);
 static void svc_new_thread(SVCGROUP *grp);
 static void xprt_unregister_locked(SVCXPRT *xprt);
-static void svc_change_space_used(SVCPOOL *pool, int delta);
+static void svc_change_space_used(SVCPOOL *pool, long delta);
 static bool_t svc_request_space_available(SVCPOOL *pool);
 
 /* ***************  SVCXPRT related stuff **************** */
@@ -113,13 +113,14 @@ svcpool_create(const char *name, struct 
 	}
 
 	/*
-	 * Don't use more than a quarter of mbuf clusters or more than
-	 * 45Mb buffering requests.
+	 * Don't use more than a quarter of mbuf clusters.  Nota bene:
+	 * nmbclusters is an int, but nmbclusters*MCLBYTES may overflow
+	 * on LP64 architectures, so cast to u_long to avoid undefined
+	 * behavior.  (ILP32 architectures cannot have nmbclusters
+	 * large enough to overflow for other reasons.)
 	 */
-	pool->sp_space_high = nmbclusters * MCLBYTES / 4;
-	if (pool->sp_space_high > 45 << 20)
-		pool->sp_space_high = 45 << 20;
-	pool->sp_space_low = 2 * pool->sp_space_high / 3;
+	pool->sp_space_high = (u_long)nmbclusters * MCLBYTES / 4;
+	pool->sp_space_low = (pool->sp_space_high / 3) * 2;
 
 	sysctl_ctx_init(&pool->sp_sysctl);
 	if (sysctl_base) {
@@ -139,24 +140,24 @@ svcpool_create(const char *name, struct 
 		    "groups", CTLFLAG_RD, &pool->sp_groupcount, 0,
 		    "Number of thread groups");
 
-		SYSCTL_ADD_UINT(&pool->sp_sysctl, sysctl_base, OID_AUTO,
+		SYSCTL_ADD_ULONG(&pool->sp_sysctl, sysctl_base, OID_AUTO,
 		    "request_space_used", CTLFLAG_RD,
-		    &pool->sp_space_used, 0,
+		    &pool->sp_space_used,
 		    "Space in parsed but not handled requests.");
 
-		SYSCTL_ADD_UINT(&pool->sp_sysctl, sysctl_base, OID_AUTO,
+		SYSCTL_ADD_ULONG(&pool->sp_sysctl, sysctl_base, OID_AUTO,
 		    "request_space_used_highest", CTLFLAG_RD,
-		    &pool->sp_space_used_highest, 0,
+		    &pool->sp_space_used_highest,
 		    "Highest space used since reboot.");
 
-		SYSCTL_ADD_UINT(&pool->sp_sysctl, sysctl_base, OID_AUTO,
+		SYSCTL_ADD_ULONG(&pool->sp_sysctl, sysctl_base, OID_AUTO,
 		    "request_space_high", CTLFLAG_RW,
-		    &pool->sp_space_high, 0,
+		    &pool->sp_space_high,
 		    "Maximum space in parsed but not handled requests.");
 
-		SYSCTL_ADD_UINT(&pool->sp_sysctl, sysctl_base, OID_AUTO,
+		SYSCTL_ADD_ULONG(&pool->sp_sysctl, sysctl_base, OID_AUTO,
 		    "request_space_low", CTLFLAG_RW,
-		    &pool->sp_space_low, 0,
+		    &pool->sp_space_low,
 		    "Low water mark for request space.");
 
 		SYSCTL_ADD_INT(&pool->sp_sysctl, sysctl_base, OID_AUTO,
@@ -1064,11 +1065,11 @@ svc_assign_waiting_sockets(SVCPOOL *pool
 }
 
 static void
-svc_change_space_used(SVCPOOL *pool, int delta)
+svc_change_space_used(SVCPOOL *pool, long delta)
 {
-	unsigned int value;
+	unsigned long value;
 
-	value = atomic_fetchadd_int(&pool->sp_space_used, delta) + delta;
+	value = atomic_fetchadd_long(&pool->sp_space_used, delta) + delta;
 	if (delta > 0) {
 		if (value >= pool->sp_space_high && !pool->sp_space_throttled) {
 			pool->sp_space_throttled = TRUE;
@@ -1102,7 +1103,7 @@ svc_run_internal(SVCGROUP *grp, bool_t i
 	enum xprt_stat stat;
 	struct svc_req *rqstp;
 	struct proc *p;
-	size_t sz;
+	long sz;
 	int error;
 
 	st = mem_alloc(sizeof(*st));
@@ -1259,17 +1260,16 @@ svc_run_internal(SVCGROUP *grp, bool_t i
 		/*
 		 * Execute what we have queued.
 		 */
-		sz = 0;
 		mtx_lock(&st->st_lock);
 		while ((rqstp = STAILQ_FIRST(&st->st_reqs)) != NULL) {
 			STAILQ_REMOVE_HEAD(&st->st_reqs, rq_link);
 			mtx_unlock(&st->st_lock);
-			sz += rqstp->rq_size;
+			sz = (long)rqstp->rq_size;
 			svc_executereq(rqstp);
+			svc_change_space_used(pool, -sz);
 			mtx_lock(&st->st_lock);
 		}
 		mtx_unlock(&st->st_lock);
-		svc_change_space_used(pool, -sz);
 		mtx_lock(&grp->sg_lock);
 	}
 

Modified: stable/10/sys/rpc/svc.h
==============================================================================
--- stable/10/sys/rpc/svc.h	Fri Oct 30 19:20:40 2015	(r290202)
+++ stable/10/sys/rpc/svc.h	Fri Oct 30 19:26:55 2015	(r290203)
@@ -371,10 +371,10 @@ typedef struct __rpc_svcpool {
 	 * amount of memory used by RPC requests which are queued
 	 * waiting for execution.
 	 */
-	unsigned int	sp_space_low;
-	unsigned int	sp_space_high;
-	unsigned int	sp_space_used;
-	unsigned int	sp_space_used_highest;
+	unsigned long	sp_space_low;
+	unsigned long	sp_space_high;
+	unsigned long	sp_space_used;
+	unsigned long	sp_space_used_highest;
 	bool_t		sp_space_throttled;
 	int		sp_space_throttle_count;