From owner-freebsd-hackers  Tue Aug 24 10: 5:30 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (Postfix) with ESMTP id 5C6D115BAB
	for <freebsd-hackers@FreeBSD.ORG>; Tue, 24 Aug 1999 09:58:17 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id JAA17289;
	Tue, 24 Aug 1999 09:58:08 -0700 (PDT)
	(envelope-from dillon)
Date: Tue, 24 Aug 1999 09:58:08 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199908241658.JAA17289@apollo.backplane.com>
To: David Malone <dwmalone@maths.tcd.ie>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: vm_fault: pager read error on NFS filesystems.
References:  <199908241722.aa13014@salmon.maths.tcd.ie>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:I tried mailing this to freebsd-stable but got no response.
:
:There is a problem when you remove a running executable on an NFS
:filesystem. Basically you end up with lots of "vm_fault: pager read
:error" messages - and I mean lots - I've seen 184888 messages in
:a little under an hour, and the performance of the machine suffers
:severely.
:
:I've figured out that this happens when the running process
:catches SIGBUS. The process tries to fault a page in, finds the
:executable is gone, gets a SIGBUS, tries to fault the SIGBUS handler
:in, gets a SIGBUS, tries to fault the SIGBUS handler in....
:
:This wouldn't really be such a serious problem if vm_fault didn't
:log a message for every failed fault. I can see two possible fixes
:for this problem:
:
:	1) Stop vm_fault logging so much stuff.
:	2) Change sendsig to check if catching SIGBUS or SIGSEGV
:	will cause a SIGBUS or SIGSEGV. If it will, send the process
:	a SIGKILL.
:
:If someone can say which fix seems better then I can probably
:produce the diffs.
:
:This seems to be a particular problem for us 'cos we have people
:...

    Well, we can't do #2 - that would make us incompatible with
    the API.

    We can do #1, but not for a week or so as there are some
    other patches to vm_fault still in the queue.  

    It would be fairly easy to add a sysctl to control VM-related
    logging.  The sysctl would default to 1.
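
    A rough userland sketch of the idea, not the actual patch: an
    integer knob that defaults to 1 and gates the message.  The
    names vm_fault_logging, vm_fault_logged and vm_fault_log are
    hypothetical; in the kernel the variable would presumably be
    exported through the sysctl tree rather than set directly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed sysctl; 1 (the default) = log. */
static int vm_fault_logging = 1;

/* Number of messages actually emitted, so the knob's effect is visible. */
static unsigned long vm_fault_logged;

/* Print a VM diagnostic to stderr only when the knob is non-zero. */
static void
vm_fault_log(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	if (vm_fault_logging == 0)
		return;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	vm_fault_logged++;
}
```

    With the knob left at 1, every call prints as before; setting
    it to 0 silences the flood without changing signal delivery.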

:There also seem to be other problems associated with changing the
:executable over NFS, including panics and init stopping reaping
:zombies, which I haven't had a chance to look at yet.
:
:	David.

    Panics on the client or the server?  If you run a kernel with DDB
    enabled and get a panic, you can use the 'trace' command from
    the console to print out the stack backtrace.  This only works
    if you aren't running an X display on that machine, though.

    init should definitely not stop reaping zombies.  If the
    init binary itself runs over NFS and you have updated it,
    you have no choice but to reboot.  If the init binary was
    not updated and the init process is messing up, a 'ps axl'
    should tell you where the init process is stuck.

    If the machine is still copacetic, you can gdb -k the
    running kernel and get a stack backtrace of the init process
    to see where it is stuck.  

	gdb -k /kernel /dev/mem
	proc 1
	back

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

