From owner-freebsd-fs@FreeBSD.ORG Wed Jul  6 01:15:44 2011
Date: Wed, 6 Jul 2011 01:15:44 +0000
From: John
To: Rick Macklem, freebsd-fs@freebsd.org
Subject: Re: New NFS server stress test hang
Message-ID: <20110706011544.GA69706@FreeBSD.org>
References: <20110609133805.GA78874@FreeBSD.org>
 <1069270455.338453.1307636209760.JavaMail.root@erie.cs.uoguelph.ca>
 <20110610125939.GA69616@FreeBSD.org>
In-Reply-To: <20110610125939.GA69616@FreeBSD.org>

----- John's Original Message -----
> ----- Rick Macklem's Original Message -----
> > John De wrote:
> > > ----- Rick Macklem's Original Message -----
> > > > John De wrote:
> > > > > Hi,
> > > > >
> > > > > We've been running some stress tests of the new nfs server.
> > > > > The system is at r222531 (head), 9 clients, two mounts each
> > > > > to the server:
> > > > >
> > > > > mount_nfs -o udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=2 ${servera}:/vol/datsrc /c/$servera/vol/datsrc
> > > > > mount_nfs -o udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=0 ${servera}:/vol/datgen /c/$servera/vol/datgen
> > > > >
> > > > > The system is still up & responsive, simply no nfs services
> > > > > are working. All (200) threads appear to be active, but not
> > > > > doing anything. The debugger is not compiled into this kernel.
> > > > > We can run any other tracing commands desired. We can also
> > > > > rebuild the kernel with the debugger enabled for any kernel
> > > > > debugging needed.
> > > > >
> > > > > --- long logs deleted ---
> > > >
> > > > How about a:
> > > > ps axHlww    <-- with the "H" we'll see what the nfsd server
> > > >                  threads are up to
> > > > procstat -kka
> > > >
> > > > Oh, and a couple of nfsstats a few seconds apart. It's what the
> > > > counts are changing by that might tell us what is going on. (You
> > > > can use "-z" to zero them out, if you have an nfsstat built from
> > > > recent sources.)
> > > >
> > > > Also, does a new NFS mount attempt against the server do anything?
> > > >
> > > > Thanks in advance for help with this, rick
> > >
> > > Hi Rick,
> > >
> > > Here's the output. In general, the nfsd processes appear to be in
> > > either nfsrvd_getcache (35 instances) or nfsrvd_updatecache (164),
> > > sleeping on "nfssrc". The server numbers don't appear to be moving.
> > > A showmount from a client system works, but a mount does not (see
> > > below).
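
For anyone reproducing this: the output above was collected roughly
as below. The output file names and the 10-second spacing are our own
choices; Rick only asked for the commands themselves.

    #!/bin/sh
    # Capture the state of a hung nfsd; run on the server itself.
    out=/var/tmp/nfsd-hang.$$
    mkdir -p $out
    ps axHlww     > $out/ps.out        # per-thread ("H") state of nfsd
    procstat -kka > $out/procstat.out  # kernel stacks of every thread
    nfsstat -z    > $out/nfsstat.0     # first snapshot, zeroing counters
    sleep 10
    nfsstat       > $out/nfsstat.1     # second snapshot; if the deltas
                                       # are all zero, nothing is moving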
> >
> > Please try the attached patch and let me know if it helps. When I
> > looked I found several places where the rc_flag variable was being
> > fiddled without the mutex held. I suspect one of these resulted in
> > the RC_LOCKED flag not getting cleared, so all the threads got
> > stuck waiting on it.
> >
> > The patch is at:
> >     http://people.freebsd.org/~rmacklem/cache.patch
> > in case it gets eaten by the list handler.
> >
> > Thanks for digging into this, rick
>
> Hi Rick,
>
> Patch applied. The system has been up and running for about
> 16 hours now and so far it's still handling the load quite nicely.
>
> last pid: 15853;  load averages: 5.36, 4.64, 4.48  up 0+16:08:16  08:48:07
> 72 processes:  7 running, 65 sleeping
> CPU:  % user,  % nice,  % system,  % interrupt,  % idle
> Mem: 22M Active, 3345M Inact, 79G Wired, 9837M Buf, 11G Free
> Swap:
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>  2049 root       26  52    0 10052K  1712K CPU3    3  97:21 942.24% nfsd
>
> I'll follow up again in 24 hours with another status.
>
> Any performance related numbers/knobs we can provide that might
> be of interest?
>
> Thanks Rick.
>
> -John

Hi Rick,

We've run the nfs share patches and produced some numbers, with no
errors seen. A few questions about the times follow.

We did four sets of repeated runs, each set running for about 2 days,
with the following nfs options:

tcp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=0
tcp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=2
udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=0
udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=2

negnametimeo=2 was used for /vol/datsrc (src) and 0 for /vol/datgen
(obj). We switched between tcp and udp and otherwise left the options
the same.

    stock  = stock system
    nolock = share patches

Run           Median build time (min)   Max build time (min)
Tcp.nolock:            11.2                    11.6
Tcp.stock:             13.6                    20.8  (*)
Udp.nolock:            14.9                    15.3
Udp.stock:             20.6                    20.7

(*) Some of these runs hit cooling issues, i.e. additional heat.

Average nfsd cpu usage (%):

Tcp.nolock:   197.46
Tcp.stock:    164.872
Udp.nolock:   374.656
Udp.stock:    339.156

These cpu numbers seem a bit confusing: udp appears to carry more
overhead, and the share patches are faster wall-clock-wise but use
more cpu. Thoughts?

Thanks,
John
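
P.S. For completeness, here is roughly how each configuration was
mounted. The loop structure, placeholder server name, and umount
bookkeeping are illustrative; the option strings are the exact ones
listed above, and stock vs. nolock is a separate server-side change
not shown here.

    #!/bin/sh
    # One pass over the tcp vs. udp half of the test matrix.
    servera=servera        # placeholder for the real server name
    opts=nfsv3,rsize=32768,wsize=32768,noatime,nolockd
    opts=$opts,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2
    for proto in tcp udp; do
        # negnametimeo=2 on the src volume, 0 on the obj volume
        mount_nfs -o $proto,$opts,negnametimeo=2 \
            ${servera}:/vol/datsrc /c/${servera}/vol/datsrc
        mount_nfs -o $proto,$opts,negnametimeo=0 \
            ${servera}:/vol/datgen /c/${servera}/vol/datgen

        # ... run the ~2 day build load here and record times ...

        umount /c/${servera}/vol/datsrc
        umount /c/${servera}/vol/datgen
    done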