Date: Tue, 16 Sep 2008 11:30:22 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Cc: Tim Chen <gphoto6@gmail.com>
Subject: Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000
Message-ID: <200809161130.22736.jhb@freebsd.org>
In-Reply-To: <1f51039c0809152302s2e6c1471n89588b058069f73d@mail.gmail.com>
References: <1f51039c0809150857l50b6be8eu848e21189a4175d6@mail.gmail.com>
 <200809151606.23933.jhb@freebsd.org>
 <1f51039c0809152302s2e6c1471n89588b058069f73d@mail.gmail.com>

On Tuesday 16 September 2008 02:02:14 am Tim Chen wrote:
> On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin <jhb@freebsd.org> wrote:
>
> > On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > > Currently I was running a mail server using a netapp filer as backend
> > > storage.
> > > From time to time, the whole system get stuck and lasted for 3-5 minutes.
> > > But after that, everything recovers normally. During the "stuck" moment,
> > > using ps auxw shows 200-300 of mail delivery agent(MDA) processes staying
> > > in "D" status.
> > > The command df certainly does not respond either.
> >
> > Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck
> > threads when they hang? If it is "lockf", then make sure you have an
> > up-to-date RELENG_6 kernel as there was a recent fix for a "lockf" hang.
> >
>
> Thanks for your suggestion. After trying to 'ps axl', it seems all the "D
> status" process were in nfs,nfsreq,nfsreq. Can you give some hint how to
> keep delving the problem?
>
> My system is RELENG_7 within one week, I always make world to keep my system
> up to date.
>
> >
> > Alternatively, if things are stuck in "nfsreq", it may be useful to use
> > tcpdump to look at the NFS requests your client is making. nfsstat can
> > also be useful as you can see which counters are increasing during a hang.
> >
>
> When system was stuck, counters of nfsstat grows slowly. It seems only
> read, write, create, remove in RPC counts were increased.
>
> As to tcpdump, since I am not familiar with that, I will try to read some
> doc and make some tests.
>
> Thanks very much for your kindly help. Hope the problem can be solved soon.

Also, do the nfsstat thing I suggested. During a hang, you can do something
like 'nfsstat > one ; sleep 1 ; nfsstat > two' and compare the 'one' and 'two'
files to see which counters (if any) are being bumped during the hang.

-- 
John Baldwin
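
The comparison of the 'one' and 'two' files can also be scripted. What
follows is only a rough sketch and not part of the original message. It
assumes nfsstat prints a row of column labels followed by a row of integer
values; rows whose labels contain spaces will not line up and are skipped.
The function names are hypothetical.

```python
import subprocess
import time


def parse_counters(text):
    """Pair each row of labels with the row of integers that follows it.

    Assumption: a line of column labels is followed by a line holding the
    same number of integer values. Lines ending in ':' are treated as
    section titles. Anything that does not fit is skipped.
    """
    counters = {}
    section = ""
    labels = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            labels = None
        elif line.rstrip().endswith(":"):
            section = line.strip().rstrip(":")
            labels = None
        elif all(f.isdigit() for f in fields):
            # A value row: keep it only if it matches the label row above.
            if labels is not None and len(labels) == len(fields):
                for name, value in zip(labels, fields):
                    counters[(section, name)] = int(value)
            labels = None
        else:
            labels = fields
    return counters


def changed_counters(before, after):
    """Return {(section, name): delta} for counters that differ."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {key: new[key] - old[key]
            for key in new
            if key in old and new[key] != old[key]}


def snapshot(cmd=("nfsstat",)):
    """Run nfsstat (or another command) and return its output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def report_hang_activity(interval=1.0):
    """Take two snapshots 'interval' seconds apart and print the deltas."""
    one = snapshot()
    time.sleep(interval)
    two = snapshot()
    for (section, name), delta in sorted(changed_counters(one, two).items()):
        print(f"{section}/{name}: {delta:+d}")
```

Calling report_hang_activity() while the MDA processes are stuck in
"nfsreq" prints only the counters that moved during the interval. If the
parsing assumption does not hold for a given nfsstat version, a plain
'diff one two' of the two files gives the same information.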