Date: Tue, 16 Sep 2008 11:30:22 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Cc: Tim Chen <gphoto6@gmail.com>
Subject: Re: Suddenly frozen fcntl/stat call on NFS over TCP with MTU 9000
Message-ID: <200809161130.22736.jhb@freebsd.org>
In-Reply-To: <1f51039c0809152302s2e6c1471n89588b058069f73d@mail.gmail.com>
References: <1f51039c0809150857l50b6be8eu848e21189a4175d6@mail.gmail.com> <200809151606.23933.jhb@freebsd.org> <1f51039c0809152302s2e6c1471n89588b058069f73d@mail.gmail.com>

On Tuesday 16 September 2008 02:02:14 am Tim Chen wrote:
> On Tue, Sep 16, 2008 at 4:06 AM, John Baldwin <jhb@freebsd.org> wrote:
>
> > On Monday 15 September 2008 11:57:02 am Tim Chen wrote:
> > > Currently I am running a mail server using a NetApp filer as backend
> > > storage. From time to time, the whole system gets stuck for 3-5 minutes,
> > > but after that everything recovers normally. While it is stuck, 'ps auxw'
> > > shows 200-300 mail delivery agent (MDA) processes in "D" status. The df
> > > command does not respond either.
> >
> > Can you use 'ps axl' to determine the wait mesg ("wchan") of the stuck
> > threads when they hang? If it is "lockf", then make sure you have an
> > up-to-date RELENG_6 kernel as there was a recent fix for a "lockf" hang.
> >
>
> Thanks for your suggestion. After trying 'ps axl', it seems all the "D"
> status processes were in nfs,nfsreq,nfsreq. Can you give me some hints on
> how to keep investigating the problem?
>
> My system is RELENG_7 from within the last week; I always make world to keep
> my system up to date.
>
>
> >
> > Alternatively, if things are stuck in "nfsreq", it may be useful to use
> > tcpdump to look at the NFS requests your client is making. nfsstat can
> > also be useful as you can see which counters are increasing during a hang.
> >
> When the system was stuck, the nfsstat counters grew slowly. It seems only
> the read, write, create, and remove RPC counts increased.
>
> As to tcpdump, since I am not familiar with that, I will try to read some
> doc and make some tests.
>
> Thanks very much for your kind help. I hope the problem can be solved soon.

Also, do the nfsstat thing I suggested. During a hang, you can do something
like 'nfsstat > one ; sleep 1 ; nfsstat > two' and compare the 'one'
and 'two' files to see which counters (if any) are being bumped during the
hang.
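
If comparing the two files by eye gets tedious, something like the following
rough Python sketch could do it for you. It is only an illustration and makes
assumptions about the nfsstat output: that section headings end in ":", and
that each row of counter names is followed by a row of numbers. Where the
number of names and numbers does not match (some names contain spaces), it
falls back to the generic name "column". The function names parse_counters
and changed_counters are made up for this example.

```python
import sys


def parse_counters(text):
    """Map (section, row, column) -> (label, value) for every numeric row."""
    counters = {}
    section = ""
    labels = []
    row = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if line.rstrip().endswith(":"):
            # Assumed section heading, e.g. "Rpc Counts:".
            section = line.strip().rstrip(":")
            continue
        if all(tok.isdigit() for tok in tokens):
            row += 1
            for col, tok in enumerate(tokens):
                # Use the preceding name row only if it lines up one-to-one.
                label = labels[col] if len(labels) == len(tokens) else "column"
                counters[(section, row, col)] = (label, int(tok))
        else:
            labels = tokens
    return counters


def changed_counters(before_text, after_text):
    """Return (section, label, old, new) for each counter that changed."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    changes = []
    for key, (label, new) in sorted(after.items()):
        old = before.get(key, (label, 0))[1]
        if new != old:
            changes.append((key[0], label, old, new))
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for section, label, old, new in changed_counters(f1.read(), f2.read()):
            print(f"{section}: {label} {old} -> {new} (+{new - old})")
```

Saved under a name of your choosing (say, nfsdiff.py), it would be run as
'python nfsdiff.py one two' after capturing the two snapshots as above.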
--
John Baldwin
