From owner-freebsd-stable@FreeBSD.ORG  Mon Feb 11 01:43:22 2013
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: 9-STABLE -> NFS -> NetAPP:
From: Marc Fournier <scrappy@hub.org>
In-Reply-To: <0EB27C56-93A1-4FAE-9FB5-CAD960098609@hub.org>
Date: Sun, 10 Feb 2013 17:43:16 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <61DAA500-EB20-4861-AA7F-402FF1047B81@hub.org>
References: <1946688889.2870936.1360542666536.JavaMail.root@erie.cs.uoguelph.ca>
 <0EB27C56-93A1-4FAE-9FB5-CAD960098609@hub.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
X-Mailer: Apple Mail (2.1499)
Cc: freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>


Just reset server, so any further details will have to be 'next time' ...
but, just did a csup and am rebuilding ... the following three files
were modified since last build:

grep nfs /tmp/output
 Edit src/sys/fs/nfs/nfs_commonsubs.c
 Edit src/sys/fs/nfsclient/nfs_clrpcops.c
 Edit src/sys/fs/nfsserver/nfs_nfsdserv.c


On 2013-02-10, at 4:56 PM, Marc Fournier <scrappy@hub.org> wrote:

>
> On 2013-02-10, at 4:31 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
>> Marc Fournier wrote:
>>> Hi John ...
>>>
>>> Does this help?
>>>
>>> root@io:~ # ps auxl | grep du
>>> root 1054 0.0 0.1 16176 6600 ?? D 3:15AM 0:05.38 du -skx /vm/2799 0 81426 0 20 0 newnfs
>>> root 12353 0.0 0.1 16176 5104 ?? D Sat03AM 0:05.41 du -skx /vm/2799 0 91597 0 20 0 newnfs
>>> root 64529 0.0 0.1 16176 5164 ?? D Fri03AM 0:05.40 du -skx /vm/2799 0 43227 0 20 0 newnfs
>>> root 12855 0.0 0.0 16308 1988 0 S+ 5:26AM 0:00.00 grep du 0 12847 0 20 0 piperd
>> It is probably too late, but all the lines (without the | grep du) would be
>> more useful. I also include the "H" flag, so it lists threads as well as
>> processes. The above just says the "du" command is waiting for a vnode lock.
>> The interesting process/thread is the one that is holding a vnode lock
>> while waiting for something else.
>
> As requested, 'ps auxlH' attached ...
>
>
> <ps.out.bz2>
>
>>
>> Are you still getting the:
>> nfs_getpages: error 13
>> vm_fault: pager read error, pid 11355 (https)
>
> Fairly quiet:
>
> <Screen Shot 2013-02-10 at 4.43.55 PM.png>
>
> And that is it since last reboot ~20 days ago ...
>
>>
>> messages logged?
>>
>> With John's recent patch, the error# would no longer be 13 if it was
>> caused by the "intr" flag resulting in a Read RPC terminating with EINTR.
>> If you are still getting the above with "error 13", it suggests that
>> the server is replying EACCES for the Read RPC.
>> I suggested before that you check to make sure that the executable had
>> read access for everyone on the file server. Since I didn't hear back,
>> I'll assume this is the case.
>
> Don't understand this question ... I have 34 VPSs running off of this
> server right now ... that 'du process' runs against each of those VPSs
> every night, and this problem started happening on Friday night's run ...
> ~18 days into uptime ... so the same process has run repeatedly, with no
> issues, 18 times before it hung on Friday ... also, the hang, once
> 'triggered', only seems to recur against the same directory ... the same
> directory doesn't necessarily trigger it, but once it starts, it appears
> to do it for the same directory ... I'm not sure if I've ever seen it
> happening to two different directories at the same time ...
>
> Also, please note that the du command is run from the physical server,
> as root ...
>
>> rick
>> ps: If it is still up and hasn't been rebooted, you could:
>>   sysctl debug.kdb.break_to_debugger=1
>>   - then type <ctrl><alt><esc> at the console and do the following
>>     from the debugger
>>   http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>>   How well this works depends on what options your kernel was built with.
>
> My remote console on that one doesn't work very well ... I can view,
> but I can't type ...
>
>
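
For anyone going through a full 'ps auxlH' dump by hand, here is a rough
sketch of the triage Rick describes in the quoted text: collect the
threads in uninterruptible sleep (STAT starting with "D") and group them
by wait channel, so that anything blocked on something other than
"newnfs" stands out as a candidate lock holder. The column layout
(10 leading fields, the command, then 6 trailing fields ending in the
wait channel) is an assumption inferred from the sample lines in this
thread, and the function names are made up for illustration. It will
misparse lines whose wait-channel column is empty.

```python
from collections import defaultdict

# Sample lines copied from the 'ps auxl | grep du' output quoted in this thread.
SAMPLE = [
    "root 1054 0.0 0.1 16176 6600 ?? D 3:15AM 0:05.38 du -skx /vm/2799 0 81426 0 20 0 newnfs",
    "root 12353 0.0 0.1 16176 5104 ?? D Sat03AM 0:05.41 du -skx /vm/2799 0 91597 0 20 0 newnfs",
    "root 64529 0.0 0.1 16176 5164 ?? D Fri03AM 0:05.40 du -skx /vm/2799 0 43227 0 20 0 newnfs",
    "root 12855 0.0 0.0 16308 1988 0 S+ 5:26AM 0:00.00 grep du 0 12847 0 20 0 piperd",
]


def parse_ps_line(line):
    """Parse one line; return None for headers or lines that do not fit.

    Assumed layout (hypothetical, inferred from the sample):
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND... UID PPID CPU PRI NI MWCHAN
    """
    fields = line.split()
    if len(fields) < 17:  # 10 leading + at least 1 command word + 6 trailing
        return None
    head, command, tail = fields[:10], fields[10:-6], fields[-6:]
    try:
        pid = int(head[1])
    except ValueError:  # e.g. the header line, where this field is "PID"
        return None
    return {
        "pid": pid,
        "stat": head[7],
        "command": " ".join(command),
        "wchan": tail[5],
    }


def group_blocked_by_wchan(lines):
    """Map wait channel -> list of records in uninterruptible sleep."""
    groups = defaultdict(list)
    for line in lines:
        rec = parse_ps_line(line)
        if rec is not None and rec["stat"].startswith("D"):
            groups[rec["wchan"]].append(rec)
    return dict(groups)


if __name__ == "__main__":
    for wchan, recs in sorted(group_blocked_by_wchan(SAMPLE).items()):
        pids = ", ".join(str(r["pid"]) for r in recs)
        print(f"{wchan}: {len(recs)} blocked ({pids})")
```

Against the full dump, the wait channels with only one or two entries are
probably the ones worth looking at first, since the "du" processes on
"newnfs" are only the ones waiting for the lock.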