Date: Tue, 23 May 2006 08:56:54 -0400
From: "Rong-en Fan" <grafan@gmail.com>
To: "Konstantin Belousov" <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org, Howard Leadmon <howard@leadmon.net>, Kris Kennaway <kris@obsecurity.org>
Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
Message-ID: <6eb82e0605230556n31b86e55y1b07a2ef6ad9ca14@mail.gmail.com>
In-Reply-To: <20060523081041.GL54541@deviant.kiev.zoral.com.ua>
References: <017301c67784$45377a90$071872cf@Leadmon.local> <20060515024958.GA99002@xor.obsecurity.org> <6eb82e0605221443m5cc3c93bwaf9126ff2fb59667@mail.gmail.com> <20060523081041.GL54541@deviant.kiev.zoral.com.ua>

On 5/23/06, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
> > On 5/14/06, Kris Kennaway <kris@obsecurity.org> wrote:
> > >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> > >>
> > >> Hello All,
> > >>
> > >> I have been running FBSD a long while, and actually running since the 5.x
> > >> releases on the server I am having troubles with. I basically have a small
> > >> network and just use NIS/NFS to link my various FBSD and Solaris machines
> > >> together.
> > >>
> > >> This has all been running fine up till a few days ago, when all of a sudden
> > >> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > >> When I had 6.1-RC running all seemed well, then came the announcement for the
> > >> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > >> mergemaster to get everything up to the 6.1 stable version.
> > >>
> > >> Now after doing this, something is wrong with NFS. It works, it will return
> > >> information and open files, just it's very very slow, and while performing a
> > >> request the CPU spike is astounding. A simple du of my home directory can
> > >> take minutes, and machine all but locks up if the request is done over NFS.
> > >> Here is top snip:
> > >>
> > >> PID USERNAME THR PRI NICE  SIZE   RES STATE C  TIME    WCPU COMMAND
> > >> 497 root       1   4    0 1252K  780K -     2 50:42 188.48% nfsd
> > >>
> > >>
> > >> This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > >> disk array, and locally it screams, heck NFS used to scream till I updated. I
> > >> am not really sure what info would be useful in debugging, so won't post tons
> > >> of misc junk in this eMail, but if anyone has any ideas as to how best to
> > >> figure out and resolve this issue it would sure be appreciated...
> > >
> > >Use tcpdump and related tools to find out what traffic is being sent.
> > >
> > >Also verify that you did not change your system configuration in any
> > >way: there have been no changes to NFS since the release, so it is
> > >unclear why an update would cause the problem to suddenly occur.
> > >
> > >Kris
> >
> > Hi Kris and Howard,
> >
> > As I posted a few days ago, I have problems similar to Howard's
> > (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> > and nfsd eats lots of cpu" on stable@). After binary searching
> > the source tree, I found that
> >
> > RELENG_6_1, 2006.04.30.03.57 ok
> > RELENG_6_1, 2006.04.30.04.00 bad
> >
> > The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> > With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> > the same problem occurs.
> >
> > Let me refresh what problems I'm seeing
> >
> > 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on
> >    an nfs directory
> > 2. on server-side, nfsd starts to eat lots of CPU
> > 3. the du finishes
> > 4. on server-side, nfsd still eats lots of CPU, but there is no
> >    nfs traffic. Wait for 5 minutes, you can still see that nfsd is
> >    "running" and eats lots of CPU.
> >
> > On FreeBSD 6.1R client, it uses UDP mount and fstab is like
> > "rw,-L,nosuid,bg,nodev". On Linux client, it uses UDP mount and
> > fstab is like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
> > The server's kernel conf is at
> >
> > http://www.rafan.org/FreeBSD/nfs/KERNEL
> >
> > Some related configuration files:
> >
> > /etc/export
> > /export/dir1 host1 host2...
> > /export/dir2 host1 host2...
> >
> > /etc/rc.conf
> > nfs_server_enable="YES"
> > nfs_server_flags="-u -t -n 16"
> > mountd_enable="YES"
> > mountd_flags="-r -l -n"
> > rpc_lockd_enable="YES"
> > rpc_statd_enable="YES"
> > rpcbind_enable="YES"
> >
> > /etc/fstab:
> > /dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
> > /dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
> >
> > The NFS server is also using amd to mount some backup directories
> > from another NFS server. The amd.conf is
> >
> > [global]
> > browsable_dirs = yes
> > map_type = file
> > mount_type = nfs
> > auto_dir = /nfs
> > fully_qualified_hosts = no
> > log_file = syslog
> > nfs_proto = udp
> > nfs_allow_insecure_port = no
> > nfs_vers = 3
> > # plock = yes
> > selectors_on_default = yes
> > restart_mounts = yes
> >
> > [/backup]
> > map_options = type:=direct
> > map_name = /etc/amd.direct
> >
> > /etc/amd.direct:
> > /defaults
> > opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
> > backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
> >
> >
> > If there is anything I can provide to help track this down, please
> > let me know. By the way, I tried with truss/kdump to see what happens
> > when nfsd eats lots of CPU, but in vain. They do not return anything.
> >
> I tried your recipe on 7-CURRENT with locally exported fs, remounted
> over nfs. I did not get the behaviour you described.

As noted in my previous thread, I have another 6.1-RELEASE nfs server,
which does not have this problem.

> Could you, please, provide the backtrace for the nfsd that
> eats the CPU (from the ddb). I think it would be helpful to get several
> backtraces (i.e., bt <nfsd pid>, cont, bt <nfsd pid> ...) to
> see where it is running.

I'm afraid that I cannot do that. Last time I tried breaking into ddb (on 5.x),
it hung my serial console and the server is miles away :-( . Perhaps we can
ask Howard to do that?

> Also, just in case, does filesystem that is exported and shows problem,
> have quotas enabled ? One line of your fstab has userquotas, other does not.

No.

Regards,
Rong-En Fan