Date:      Wed, 13 Feb 2013 22:30:42 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Marc Fournier <scrappy@hub.org>, Kostik Belousov <kib@freebsd.org>, freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: 9-STABLE -> NFS -> NetAPP:
Message-ID:  <20130213203042.GW2522@kib.kiev.ua>
In-Reply-To: <339364797.2960794.1360720239431.JavaMail.root@erie.cs.uoguelph.ca>
References:  <61DAA500-EB20-4861-AA7F-402FF1047B81@hub.org> <339364797.2960794.1360720239431.JavaMail.root@erie.cs.uoguelph.ca>

On Tue, Feb 12, 2013 at 08:50:39PM -0500, Rick Macklem wrote:
> Marc Fournier wrote:
> > Just reset server, so any further details will have to be 'next time'
> > ... but, just did a csup and am rebuilding ... the following three files
> > were modified since last build:
> >
> > grep nfs /tmp/output
> > Edit src/sys/fs/nfs/nfs_commonsubs.c
> > Edit src/sys/fs/nfsclient/nfs_clrpcops.c
> > Edit src/sys/fs/nfsserver/nfs_nfsdserv.c
> >
> >
> > On 2013-02-10, at 4:56 PM, Marc Fournier <scrappy@hub.org> wrote:
> >
> > >
> > > On 2013-02-10, at 4:31 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > > wrote:
> > >
> > >> Marc Fournier wrote:
> > >>> Hi John,
> > >>>
> > >>> Does this help?
> > >>>
> > >>> root@io:~ # ps auxl | grep du
> > >>> root 1054 0.0 0.1 16176 6600 ?? D 3:15AM 0:05.38 du -skx /vm/2799 0 81426 0 20 0 newnfs
> > >>> root 12353 0.0 0.1 16176 5104 ?? D Sat03AM 0:05.41 du -skx /vm/2799 0 91597 0 20 0 newnfs
> > >>> root 64529 0.0 0.1 16176 5164 ?? D Fri03AM 0:05.40 du -skx /vm/2799 0 43227 0 20 0 newnfs
> > >>> root 12855 0.0 0.0 16308 1988 0 S+ 5:26AM 0:00.00 grep du 0 12847 0 20 0 piperd
> > >> It is probably too late, but all the lines (without the | grep du)
> > >> would be more useful. I would also include the "H" flag, so it lists
> > >> threads as well as processes. The above just says the "du" command is
> > >> waiting for a vnode lock. The interesting process/thread is the one
> > >> that is holding a vnode lock while waiting for something else.
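Rick's suggestion (scan every thread, not just the `du` ones, for a lock holder that is itself waiting on something else) can be made a little more mechanical. The following is a minimal sketch, assuming the `ps auxlH` column layout seen in the `du` lines quoted above: STAT is the 8th whitespace-separated field and the wait channel (MWCHAN) is the last one. The names `summarize_blocked` and `LOCK_WAITERS` are hypothetical; the two channel names come from this thread.

```python
from collections import Counter

# Wait channels that, per this thread, mean "waiting for an NFS vnode lock"
# ("newnfs") or "waiting for the f_offset lock" ("vofflock").
LOCK_WAITERS = {"newnfs", "vofflock"}


def summarize_blocked(ps_text):
    """Return (threads per wait channel, lines blocked on other channels).

    Only threads whose STAT starts with "D" are counted. Assumes the
    `ps auxlH` layout shown in this thread: STAT is field 8 and the wait
    channel is the last field on the line.
    """
    counts = Counter()
    elsewhere = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to be a full record.
        if len(fields) < 9 or fields[0] == "USER":
            continue
        stat, wchan = fields[7], fields[-1]
        if not stat.startswith("D"):
            continue
        counts[wchan] += 1
        if wchan not in LOCK_WAITERS:
            # Candidate lock holder: blocked, but not on one of the two locks.
            elsewhere.append(line)
    return counts, elsewhere
```

It could be fed live output with `subprocess.run(["ps", "auxlH"], capture_output=True, text=True).stdout`. An empty `elsewhere` list does not prove there is no culprit: a thread holding a vnode lock may be sleeping in a state other than "D", which this sketch deliberately ignores.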
> > >
> > > As requested, 'ps auxlH' attached ...
> > >
> > >
> > > <ps.out.bz2>
> > >
> Well, I took a look at the ps output and I didn't see anything that would
> identify what the hang is. There are a lot of processes sleeping on "newnfs"
> (waiting for a vnode lock) and many sleeping on "vofflock" (waiting for the
> f_offset lock).
I never got any attachments on the thread.

See
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
for the description of what is needed to start debugging.
>
> Unfortunately, I can't spot any process/thread that is blocked on something
> else, where it would seem likely to be holding either an nfs vnode lock or
> f_offset lock that isn't one of these.
>
> There were changes about 5 months ago which appear to have fixed a deadlock
> race between vnode locks and offset locks for paging (r236321 and friends).
No, I do not think that the description of the changes is right.

>
> I am wondering if there could be other similar races, possibly specific to
> paging in over NFS? (I can't see any case where there is a LOR, so I can't
> think of what it might be?)
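As a generic illustration of what "a LOR" (lock order reversal) means here, and not a description of the FreeBSD change Rick cites (kib disputes that description): a deadlock of this kind needs one thread holding lock A while waiting for lock B, and another thread holding B while waiting for A. The usual remedy is that every code path takes the two locks in one global order. In the minimal sketch below, `vnode_lock` and `offset_lock` are hypothetical Python stand-ins, and the order chosen is an assumption for illustration, not the kernel's actual lock order.

```python
import threading

# Hypothetical stand-ins for a vnode lock and an f_offset lock.
# The order below is illustrative only, not FreeBSD's real rule.
vnode_lock = threading.Lock()
offset_lock = threading.Lock()
LOCK_ORDER = (vnode_lock, offset_lock)


def locked(work):
    """Run work() holding both locks, always acquired in LOCK_ORDER."""
    for lock in LOCK_ORDER:
        lock.acquire()
    try:
        return work()
    finally:
        # Release in the reverse order of acquisition.
        for lock in reversed(LOCK_ORDER):
            lock.release()


def run_workers(n_threads=4, iterations=10_000):
    """Hammer both locks from several threads; report total work and hangs."""
    counter = [0]

    def bump():
        counter[0] += 1

    def worker():
        for _ in range(iterations):
            locked(bump)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    # A thread still alive after the timeout would indicate a hang.
    hung = any(t.is_alive() for t in threads)
    return counter[0], hung
```

Because no path ever holds `offset_lock` while waiting for `vnode_lock`, the circular wait cannot form. In the kernel, the WITNESS option checks lock order at run time and reports reversals it observes.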
>
> If you just want the hangs to go away, I'd suggest moving the executable
> in /usr/local/sbin (httpd maybe) to a local file system on the server,
> since it does seem to be related to paging this executable in over NFS.
>=20
> rick
> ps: I've added kib@ to the cc, in case he is aware of other related races?
>
> > >>
> > >> Are you still getting the:
> > >> nfs_getpages: error 13
> > >> vm_fault: pager read error, pid 11355 (https)
> > >
> > > Fairly quiet:
> > >
> > > <Screen Shot 2013-02-10 at 4.43.55 PM.png>
> > >
> > > And that is it since last reboot ~20 days ago ...
> > >
> > >>
> > >> messages logged?
> > >>
> > >> With John's recent patch, the error# would no longer be 13 if it was
> > >> caused by the "intr" flag resulting in a Read RPC terminating with
> > >> EINTR. If you are still getting the above with "error 13", it suggests
> > >> that the server is replying EACCES for the Read RPC.
> > >> I suggested before that you check to make sure that the executable had
> > >> read access for everyone on the file server. Since I didn't hear back,
> > >> I'll assume this is the case.
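Rick's reasoning turns on the number in the log line: on FreeBSD, errno 13 is EACCES and errno 4 is EINTR. The minimal sketch below decodes an `nfs_getpages: error N` message and attaches a hint paraphrasing this thread. The function name `explain_getpages` and the hint strings are hypothetical; Python's `errno` module uses the host's values, which match FreeBSD's for these two codes.

```python
import errno
import os
import re

# Matches kernel messages such as "nfs_getpages: error 13".
_GETPAGES_ERR = re.compile(r"nfs_getpages: error (\d+)")

# Likely causes, paraphrasing the discussion in this thread.
_HINTS = {
    errno.EACCES: "server replied EACCES to the Read RPC; check that the "
                  "executable is readable by everyone on the file server",
    errno.EINTR: "Read RPC was interrupted, e.g. by a signal on an 'intr' mount",
}


def explain_getpages(line):
    """Return (errno name, hint) for an nfs_getpages error line, else None."""
    match = _GETPAGES_ERR.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, f"errno {code}")
    # Fall back to the C library's message for codes not discussed here.
    hint = _HINTS.get(code, os.strerror(code))
    return name, hint
```

For example, `explain_getpages("nfs_getpages: error 13")` returns a tuple whose first element is `"EACCES"`, while the accompanying `vm_fault: pager read error` line yields `None`, since it carries no errno.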
> > >
> > > Don't understand this question ... I have 34 VPSs running off of this
> > > server right now ... that 'du process' runs against each of those VPSs
> > > every night, and this problem started happening on Friday night's
> > > run ... ~18 days into uptime ... so the same process has run repeatedly,
> > > with no issues, 18 times before it hung on Friday ... also, the hang,
> > > once 'triggered', only seems to recur against the same directory ...
> > > the same directory doesn't necessarily trigger it, but once it
> > > starts, it appears to do it for the same directory ... I'm not sure if
> > > I've ever seen it happening to two different directories at the same
> > > time ...
> > >
> > > Also, please note that the du command is run from the physical
> > > server, as root ...
> > >
> > >> rick
> > >> ps: If it is still up and hasn't been rebooted, you could:
> > >>   sysctl debug.kdb.break_to_debugger=1
> > >>   - then type <ctrl><alt><esc> at the console and do the following
> > >>     from the debugger
> > >>   http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > >>   How well this works depends on what options your kernel was built
> > >>   with.
> > >
> > > My remote console on that one doesn't work very well ... I can view,
> > > but I can't type ...
> > >
> > >
