From: Marc Fournier
To: Rick Macklem
Cc: freebsd-stable@freebsd.org, John Baldwin
Subject: Re: 9-STABLE -> NFS -> NetAPP:
Date: Sun, 10 Feb 2013 16:56:00 -0800
Message-Id: <0EB27C56-93A1-4FAE-9FB5-CAD960098609@hub.org>
In-Reply-To: <1946688889.2870936.1360542666536.JavaMail.root@erie.cs.uoguelph.ca>
References: <1946688889.2870936.1360542666536.JavaMail.root@erie.cs.uoguelph.ca>

On 2013-02-10, at 4:31 PM, Rick Macklem wrote:

> Marc Fournier
wrote:
>> Hi John …
>>
>> Does this help?
>>
>> root@io:~ # ps auxl | grep du
>> root 1054 0.0 0.1 16176 6600 ?? D 3:15AM 0:05.38 du -skx /vm/2799 0 81426 0 20 0 newnfs
>> root 12353 0.0 0.1 16176 5104 ?? D Sat03AM 0:05.41 du -skx /vm/2799 0 91597 0 20 0 newnfs
>> root 64529 0.0 0.1 16176 5164 ?? D Fri03AM 0:05.40 du -skx /vm/2799 0 43227 0 20 0 newnfs
>> root 12855 0.0 0.0 16308 1988 0 S+ 5:26AM 0:00.00 grep du 0 12847 0 20 0 piperd
> It is probably too late, but all the lines (without the | grep du) would be
> more useful. I also include the "H" flag, so it lists threads as well as
> processes. The above just says the "du" command is waiting for a vnode lock.
> The interesting process/thread is the one that is holding a vnode lock
> while waiting for something else.

As requested, 'ps auxlH' attached …

>
> Are you still getting the:
> nfs_getpages: error 13
> vm_fault: pager read error, pid 11355 (https)

Fairly quiet:

And that is it since last reboot ~20 days ago …

>
> messages logged?
>
> With John's recent patch, the error# would no longer be 13 if it was
> caused by the "intr" flag resulting in a Read RPC terminating with EINTR.
> If you are still getting the above with "error 13", it suggests that
> the server is replying EACCES for the Read RPC.
> I suggested before that you check to make sure that the executable had
> read access for everyone on the file server. Since I didn't hear back,
> I'll assume this is the case.

Don't understand this question … I have 34 VPSs running off of this
server right now … that 'du process' runs against each of those VPSs
every night, and this problem started happening on Friday night's run …
~18 days into uptime … so the same process has run repeatedly, with no
issues, 18 times before it hung on Friday … also, the hang, once
'triggered', only seems to recur against the same directory … the same
directory doesn't necessarily trigger it, but once it starts, it appears
to do it for the same directory … I'm not sure if I've ever seen it
happening to two different directories at the same time …

Also, please note that the du command is run from the physical server,
as root …

> rick
> ps: If it is still up and hasn't been rebooted, you could:
> sysctl debug.kdb.break_to_debugger=1
> - then type at the console and do the following
> from the debugger
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> How well this works depends on what options your kernel was built with.

My remote console on that one doesn't work very well … I can view, but
I can't type …
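
Rick's request for the full 'ps auxlH' listing comes down to finding every thread stuck in disk wait (STAT beginning with "D") and seeing which wait channel each one is sleeping on. A minimal sketch of that filter in Python follows. The column layout is an assumption inferred from the sample lines quoted in this thread (ten fixed fields, then the command, then UID PPID CPU PRI NI MWCHAN). The names parse_ps_line and blocked_by_wchan are hypothetical helpers, not part of any FreeBSD tool. It does not identify the lock holder by itself; it only narrows the listing, so that threads blocked on a channel other than the one the du processes share stand out.

```python
from collections import defaultdict

# Assumed layout, inferred from the `ps auxl` lines quoted in this thread:
# USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND... UID PPID CPU PRI NI MWCHAN
HEAD_FIELDS = 10  # fixed columns before COMMAND
TAIL_FIELDS = 6   # fixed columns after COMMAND

# The four lines posted earlier in the thread, used here as test input.
SAMPLE = """\
root 1054 0.0 0.1 16176 6600 ?? D 3:15AM 0:05.38 du -skx /vm/2799 0 81426 0 20 0 newnfs
root 12353 0.0 0.1 16176 5104 ?? D Sat03AM 0:05.41 du -skx /vm/2799 0 91597 0 20 0 newnfs
root 64529 0.0 0.1 16176 5164 ?? D Fri03AM 0:05.40 du -skx /vm/2799 0 43227 0 20 0 newnfs
root 12855 0.0 0.0 16308 1988 0 S+ 5:26AM 0:00.00 grep du 0 12847 0 20 0 piperd
"""


def parse_ps_line(line):
    """Split one line into a dict; return None for headers or short lines."""
    parts = line.split()
    if len(parts) < HEAD_FIELDS + TAIL_FIELDS + 1:
        return None
    head, tail = parts[:HEAD_FIELDS], parts[-TAIL_FIELDS:]
    # A non-numeric PID or PPID means this is a header or an unexpected layout.
    if not (head[1].isdigit() and tail[1].isdigit()):
        return None
    return {
        "user": head[0],
        "pid": int(head[1]),
        "stat": head[7],
        # COMMAND may contain spaces, so take everything between head and tail.
        "command": " ".join(parts[HEAD_FIELDS:-TAIL_FIELDS]),
        "ppid": int(tail[1]),
        "mwchan": tail[5],
    }


def blocked_by_wchan(ps_output):
    """Group threads in disk wait (STAT starts with 'D') by wait channel."""
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        row = parse_ps_line(line)
        if row and row["stat"].startswith("D"):
            groups[row["mwchan"]].append(row)
    return dict(groups)


if __name__ == "__main__":
    for wchan, rows in blocked_by_wchan(SAMPLE).items():
        print(wchan, [(r["pid"], r["command"]) for r in rows])
```

On the live machine the input would be the output of 'ps auxlH' rather than SAMPLE, for example read through subprocess.run(["ps", "auxlH"], capture_output=True, text=True).stdout.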