Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 2 Dec 1999 17:54:52 +0000 (GMT)
From:      iedowse@maths.tcd.ie
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   kern/15222: mbuf leak in nfs_srvcache.c
Message-ID:  <199912021754.aa92503@walton.maths.tcd.ie>

next in thread | raw e-mail | index | archive | help

>Number:         15222
>Category:       kern
>Synopsis:       mbuf leak in nfs_srvcache.c
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Dec  2 10:00:01 PST 1999
>Closed-Date:
>Last-Modified:
>Originator:     Ian Dowse
>Release:        FreeBSD 3.3-STABLE i386
>Organization:
		School of Mathematics
		Trinity College Dublin
>Environment:

	All versions of FreeBSD, when used as an NFS server. This problem
	will only have a significant impact on very busy NFS servers.

>Description:

	FreeBSD maintains a cache of some recent NFS server requests and
	their replies. This allows duplicate requests to be discarded while
	the requested operation is still in progress, and avoids the need
	to redo an operation if the reply gets lost. The default maximum
	cache size is 64 entries.

	When a request is received, nfsrv_getcache() either finds an existing
	cache entry, or creates a new one. After the operation completes,
	nfsrv_updatecache() updates the cache entry, sometimes including
	a copy of the reply mbuf chain.

	The problem is that nfsrv_updatecache() assumes it will called only
	once for a given cache entry, so it does not check if the cache entry
	already has an associated mbuf chain. Normally this assumption is
	correct, but under certain high-load situations, the following can
	occur:
		1)	A request R is received by the server, say a write
			with the 'filesync' option.
		2)	64 other requests are received before R completes.
			The cache entry for R gets flushed out.
		3)	A retransmit of R is received (R1); since the cache
			entry is gone, a new cache entry is created, and
			execution of R1 begins.
		4)	R completes, and a copy of the reply mbuf chain gets
			saved by nfsrv_updatecache().
		5)	R1 completes. nfsrv_updatecache() replaces the cache
			entry with a copy of R1's reply, but doesn't free
			the original mbuf chain.

	This problem was leaking 200-500 mbufs each day on our busiest
	NFS server.

	There is another leak in nfsrv_cleancache(), but I think this will
	only be triggered if all the nfsd's get killed. The patch below
	fixes both of these problems.

	As commented in the patch, this problem only occurs when the cache
	is too small, so it might be desirable to increase the cache size
	(double it maybe) when nfsrv_updatecache() finds a RC_DONE entry.
	Recording the event in the nfs stats structure and having a sysctl
	to increase the cache size would be another option. I can submit
	patches for either.

>How-To-Repeat:
	
	1)	Keep the disk on the NFS server busy - for example run
			find /whatever -type f |xargs cat > /dev/null
	2)	Send an NFS filesync write request for a file 'file1'
	3)	Send 64 access requests for another file 'file2' each
		with a different xid, to flush the cache.
	4)	Retransmit the NFS filesync write (with the same xid).
	5)	Wait for the two replies to the writes.

	This should leak just a few mbufs, so it needs to be repeated a
	number of times before the leak becomes apparent.

>Fix:
	

--- nfs_srvcache.c.orig	Thu Dec  2 16:22:37 1999
+++ nfs_srvcache.c	Thu Dec  2 16:49:44 1999
@@ -293,6 +293,19 @@
 				goto loop;
 			}
 			rp->rc_flag |= RC_LOCKED;
+			if (rp->rc_state == RC_DONE) {
+				/*
+				 * This can happen if the cache is too small;
+				 * retransmits of the same request aren't
+				 * dropped so we see the operation completing
+				 * more than once.
+				 * XXX maybe increase desirednfsrvcache here.
+				 */
+				if (rp->rc_flag & RC_REPMBUF) {
+					m_freem(rp->rc_reply);
+					rp->rc_flag &= ~RC_REPMBUF;
+				}
+			}
 			rp->rc_state = RC_DONE;
 			/*
 			 * If we have a valid reply update status and save
@@ -332,6 +345,10 @@
 		nextrp = rp->rc_lru.tqe_next;
 		LIST_REMOVE(rp, rc_hash);
 		TAILQ_REMOVE(&nfsrvlruhead, rp, rc_lru);
+		if (rp->rc_flag & RC_REPMBUF)
+			m_freem(rp->rc_reply);
+		if (rp->rc_flag & RC_NAM)
+			free(rp->rc_nam, M_SONAME);
 		free(rp, M_NFSD);
 	}
 	numnfsrvcache = 0;

>Release-Note:
>Audit-Trail:
>Unformatted:


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199912021754.aa92503>