Date: Sun, 10 Feb 2013 17:43:16 -0800
From: Marc Fournier <scrappy@hub.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: 9-STABLE -> NFS -> NetAPP:
Message-ID: <61DAA500-EB20-4861-AA7F-402FF1047B81@hub.org>
In-Reply-To: <0EB27C56-93A1-4FAE-9FB5-CAD960098609@hub.org>
References: <1946688889.2870936.1360542666536.JavaMail.root@erie.cs.uoguelph.ca> <0EB27C56-93A1-4FAE-9FB5-CAD960098609@hub.org>
Just reset server, so any further details will have to be 'next time' … but, just did a csup and am rebuilding … the following three files were modified since last build:

grep nfs /tmp/output
Edit src/sys/fs/nfs/nfs_commonsubs.c
Edit src/sys/fs/nfsclient/nfs_clrpcops.c
Edit src/sys/fs/nfsserver/nfs_nfsdserv.c

On 2013-02-10, at 4:56 PM, Marc Fournier <scrappy@hub.org> wrote:

>
> On 2013-02-10, at 4:31 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
>> Marc Fournier wrote:
>>> Hi John …
>>>
>>> Does this help?
>>>
>>> root@io:~ # ps auxl | grep du
>>> root 1054 0.0 0.1 16176 6600 ?? D 3:15AM 0:05.38 du -skx /vm/2799 0 81426 0 20 0 newnfs
>>> root 12353 0.0 0.1 16176 5104 ?? D Sat03AM 0:05.41 du -skx /vm/2799 0 91597 0 20 0 newnfs
>>> root 64529 0.0 0.1 16176 5164 ?? D Fri03AM 0:05.40 du -skx /vm/2799 0 43227 0 20 0 newnfs
>>> root 12855 0.0 0.0 16308 1988 0 S+ 5:26AM 0:00.00 grep du 0 12847 0 20 0 piperd
>> It is probably too late, but all the lines (without the | grep du) would be
>> more useful. I also include the "H" flag, so it lists threads as well as
>> processes. The above just says the "du" command is waiting for a vnode lock.
>> The interesting process/thread is the one that is holding a vnode lock
>> while waiting for something else.
>
> As requested, 'ps auxlH' attached …
>
>
> <ps.out.bz2>
>
>>
>> Are you still getting the:
>> nfs_getpages: error 13
>> vm_fault: pager read error, pid 11355 (https)
>
> Fairly quiet:
>
> <Screen Shot 2013-02-10 at 4.43.55 PM.png>
>
> And that is it since last reboot ~20 days ago …
>
>>
>> messages logged?
>>
>> With John's recent patch, the error# would no longer be 13 if it was
>> caused by the "intr" flag resulting in a Read RPC terminating with EINTR.
>> If you are still getting the above with "error 13", it suggests that
>> the server is replying EACCES for the Read RPC.
>> I suggested before that you check to make sure that the executable had
>> read access for everyone on the file server. Since I didn't hear back,
>> I'll assume this is the case.
>
> Don't understand this question … I have 34 VPSs running off of this server right now … that 'du process' runs against each of those VPSs every night, and this problem started happening on Friday night's run … ~18 days into uptime … so the same process has run repeatedly, with no issues, 18 times before it hung on Friday … also, the hang, once 'triggered', only seems to recur against the same directory … the same directory doesn't necessarily trigger it, but once it starts, it appears to do it for the same directory … I'm not sure if I've ever seen it happening to two different directories at the same time …
>
> Also, please note that the du command is run from the physical server, as root …
>
>> rick
>> ps: If it is still up and hasn't been rebooted, you could:
>> sysctl debug.kdb.break_to_debugger=1
>> - then type <ctrl><alt><esc> at the console and do the following
>> from the debugger
>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>> How well this works depends on what options your kernel was built with.
>
> My remote console on that one doesn't work very well … I can view, but I can't type …
>
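Rick's point is that the three `du` entries only show who is waiting on the vnode lock; the thread worth finding is the one holding that lock while blocked on something else. Below is a minimal shell sketch for narrowing a saved `ps auxlH` listing (for example the attached ps.out.bz2, once decompressed) down to the entries in `D` (uninterruptible wait) state. It assumes the column layout visible in the `ps auxl` output quoted above, where STAT is the 8th whitespace-separated field and the wait channel (MWCHAN) is the last one. The helper name `list_dwait` is hypothetical. The PIDs it prints could then be handed to FreeBSD's `procstat -kk`, which prints kernel thread stacks from an ordinary shell and so does not depend on a working console keyboard (whether it works depends on how the kernel was built).

```shell
#!/bin/sh
# list_dwait FILE
# Hypothetical helper: print the PID and wait channel of every process or
# thread whose STAT field starts with "D" in saved `ps auxl` / `ps auxlH`
# output. Assumes the column layout quoted in this thread:
#   USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND... UID PPID CPU PRI NI MWCHAN
# STAT is field 8; COMMAND may contain spaces, so MWCHAN is taken from the
# last field ($NF) rather than from a fixed column number.
list_dwait() {
    # NR > 1 skips the header line that ps prints first
    awk 'NR > 1 && $8 ~ /^D/ { print $2, $NF }' "$1"
}

# Possible usage on the FreeBSD host:
#   bunzip2 ps.out.bz2
#   list_dwait ps.out      # e.g. "1054 newnfs" for the hung du processes
#   procstat -kk 1054      # kernel stacks of the threads in PID 1054
```

Taking the wait channel from the last field keeps the filter independent of how many arguments each command line has, at the cost of relying on MWCHAN never being empty (ps prints `-` when there is no wait channel).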