Date: Mon, 22 May 2006 17:43:32 -0400
From: "Rong-en Fan" <grafan@gmail.com>
To: "Howard Leadmon" <howard@leadmon.net>, "Kris Kennaway" <kris@obsecurity.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
Message-ID: <6eb82e0605221443m5cc3c93bwaf9126ff2fb59667@mail.gmail.com>
In-Reply-To: <20060515024958.GA99002@xor.obsecurity.org>
References: <017301c67784$45377a90$071872cf@Leadmon.local> <20060515024958.GA99002@xor.obsecurity.org>
On 5/14/06, Kris Kennaway <kris@obsecurity.org> wrote:
> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >
> >    Hello All,
> >
> >  I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with.  I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> >
> >  This has all been running fine up till a few days ago, when all of a sudden
> > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> >
> >  Now after doing this, something is wrong with NFS.  It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding.  A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >   497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >
> >
> >  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
>
> Use tcpdump and related tools to find out what traffic is being sent.
>
> Also verify that you did not change your system configuration in any
> way: there have been no changes to NFS since the release, so it is
> unclear why an update would cause the problem to suddenly occur.
>
> Kris

Hi Kris and Howard,

As I posted a few days ago, I have problems similar to Howard's
(some details in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After binary searching
the source tree, I found that

RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad

The only commit in that window is to kern/vfs_lookup.c, an MFC of
rev 1.90 and 1.91. With the 04.30 03.57 source plus a manually
applied vfs_lookup.c rev 1.90, the same problem occurs.

Let me recap the problems I'm seeing:

1. a client (either Linux 2.6.16 or FreeBSD 6.1) runs du on
   an NFS directory
2. on the server side, nfsd starts to eat lots of CPU
3. the du finishes
4. on the server side, nfsd still eats lots of CPU, but there is
   no NFS traffic. After waiting 5 minutes, you can still see that
   nfsd is "running" and eating lots of CPU.

The FreeBSD 6.1R client uses a UDP mount and its fstab options are
like "rw,-L,nosuid,bg,nodev". The Linux client uses a UDP mount and
its fstab options are like
"defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".

The server's kernel conf is at

http://www.rafan.org/FreeBSD/nfs/KERNEL

Some related configuration files:

/etc/exports
/export/dir1 host1 host2...
/export/dir2 host1 host2...

/etc/rc.conf
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
mountd_enable="YES"
mountd_flags="-r -l -n"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

/etc/fstab:
/dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

The NFS server is also using amd to mount some backup directories
from another NFS server.
The amd.conf is:

[global]
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes

[/backup]
map_options = type:=direct
map_name = /etc/amd.direct

/etc/amd.direct:
/defaults opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}

If there is anything I can provide to help track this down,
please let me know. By the way, I tried truss/kdump to see
what happens when nfsd eats lots of CPU, but in vain: they do
not return anything.

Regards,
Rong-En Fan
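
For readers who want to repeat the date-based binary search described in this message, here is a minimal sketch in Python. It is not the procedure actually used here. The function name `bisect_checkout_dates` and the `is_bad` callback are hypothetical: `is_bad(ts)` stands for whatever manual step checks out the tree as of timestamp `ts`, rebuilds, and reruns the du test. The sketch also assumes the breakage persists once it has been introduced.

```python
from datetime import datetime, timedelta


def bisect_checkout_dates(good, bad, is_bad, resolution=timedelta(minutes=1)):
    """Narrow the window between a known-good and a known-bad checkout date.

    `good` must be a timestamp whose tree works and `bad` one whose tree
    fails.  The loop halves the window until it is no wider than
    `resolution`, then returns the final (good, bad) pair.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_bad(mid):
            bad = mid    # breakage already present at mid: look earlier
        else:
            good = mid   # mid still works: look later
    return good, bad


# Stand-in for the real build-and-test step, using the boundary reported
# in this message (trees dated 2006.04.30.04.00 or later behave badly).
BREAKING = datetime(2006, 4, 30, 4, 0)


def simulated_is_bad(ts):
    return ts >= BREAKING


last_good, first_bad = bisect_checkout_dates(
    datetime(2006, 4, 1), datetime(2006, 5, 22), simulated_is_bad
)
print(last_good, first_bad)
```

Each call to `is_bad` costs a full rebuild and test, so halving the window keeps the number of builds logarithmic in the length of the date range.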