From: Matthew Dillon
Date: Tue, 24 Aug 1999 09:58:08 -0700 (PDT)
To: David Malone
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: vm_fault: pager read error on NFS filesystems.
Message-Id: <199908241658.JAA17289@apollo.backplane.com>
References: <199908241722.aa13014@salmon.maths.tcd.ie>

:I tried mailing this to freebsd-stable but got no response.
:
:There is a problem when you remove a running executable on an NFS
:filesystem. Basically you end up with lots of "vm_fault: pager read
:error" messages - and I mean lots - I've seen 184888 messages in
:a little under an hour, and the performance of the machine suffers
:severely.
:
:I've figured out that this happens when the running process
:catches SIGBUS. The process tries to fault a page in, finds the
:executable is gone, gets a SIGBUS, tries to fault the SIGBUS handler
:in, gets a SIGBUS, tries to fault the SIGBUS handler in....
:
:This wouldn't really be such a serious problem if vm_fault didn't
:log a message for every failed fault. I can see two possible fixes
:for this problem:
:
:	1) Stop vm_fault logging so much stuff.
:	2) Change sendsig to check if catching SIGBUS or SIGSEGV
:	   will cause a SIGBUS or SIGSEGV. If it will, send the process
:	   a SIGKILL.
:
:If someone can say which fix seems better then I can probably
:produce the diffs.
:
:This seems to be a particular problem for us 'cos we have people
:...

Well, we can't do #2 - that would make us incompatible with the API.
We can do #1, but not for a week or so as there are some other patches
to vm_fault still in the queue. It would be fairly easy to add a sysctl
to control VM related logging. The sysctl would default to 1.

:There also seem to be other problems associated with changing the
:executable over NFS, including panics and init stopping reaping
:zombies, which I haven't had a chance to look at yet.
:
:	David.

Panics on the client or the server? If you run a kernel with DDB enabled
and get a panic, you can use the 'trace' command from the console to
print out the stack backtrace. This only works if you aren't running an
X display on that machine, though.

init should definitely not stop reaping zombies. If the init binary
itself runs over NFS and you have updated it, you have no choice but to
reboot. If the init binary was not updated and the init process is
messing up, a 'ps axl' should tell you where the init process is stuck.
If the machine is still copacetic, you can gdb -k the running kernel and
get a stack backtrace of the init process to see where it is stuck:

    gdb -k /kernel /dev/mem
    proc 1
    back

-Matt
Matthew Dillon
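The sysctl-gated logging suggested for fix #1 could be modelled roughly as below. This is a hedged user-space sketch, not FreeBSD kernel code: the names vm_fault_logging, vm_fault_msgs and vm_fault_pager_error() are hypothetical, and in a real kernel the knob would be exported with the SYSCTL_INT() macro so that sysctl(8) could change it at run time. The failed fault itself is unaffected; only the console message is suppressed.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the proposed sysctl knob. In a kernel this
 * variable would be exported with SYSCTL_INT(); here it is a plain global.
 * It defaults to 1, i.e. logging enabled, as proposed.
 */
static int vm_fault_logging = 1;

/* Number of messages actually emitted, so the knob's effect is visible. */
static unsigned long vm_fault_msgs;

/*
 * Hypothetical model of the pager-read-error path in vm_fault(): the
 * fault still fails (the caller would still post SIGBUS), but the
 * message is printed only when the knob is set.
 */
static void
vm_fault_pager_error(void)
{
	if (vm_fault_logging) {
		printf("vm_fault: pager read error\n");
		vm_fault_msgs++;
	}
}
```

With the knob at its default of 1, every failed fault is still reported as it is today; setting it to 0 silences the flood described above without changing how signals are delivered, which is why this route avoids the API incompatibility of fix #2.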