From: David Malone
To: freebsd-hackers@freebsd.org
Subject: vm_fault: pager read error on NFS filesystems.
Date: Tue, 24 Aug 1999 17:22:17 +0100
Message-ID: <199908241722.aa13014@salmon.maths.tcd.ie>

I tried mailing this to freebsd-stable but got no response.

There is a problem when you remove a running executable on an NFS filesystem. Basically, you end up with lots of "vm_fault: pager read error" messages - and I mean lots: I've seen 184888 messages in a little under an hour, and the performance of the machine suffers severely.

I've figured out that this happens when the running process catches SIGBUS. The process tries to fault a page in, finds the executable is gone, gets a SIGBUS, tries to fault the SIGBUS handler in, gets a SIGBUS, tries to fault the SIGBUS handler in.... This wouldn't really be such a serious problem if vm_fault didn't log a message for every failed fault.

I can see two possible fixes for this problem:

1) Stop vm_fault from logging so much stuff.
2) Change sendsig to check whether catching SIGBUS or SIGSEGV will itself cause a SIGBUS or SIGSEGV. If it will, send the process a SIGKILL instead.

If someone can say which fix seems better, then I can probably produce the diffs.

This seems to be a particular problem for us 'cos we have people developing MPI programs. Since they are parallel programs they end up being run over NFS, and they catch most signals so that if there is a problem they can shut down the entire cluster of programs.
I don't think there is much you can do about programs looping in a signal-receiving loop in general (you could have the SIGBUS handler execute an illegal instruction and have the SIGILL handler point to an address which causes a SIGBUS), but because of the verboseness of vm_fault, something needs to be done about the SIGBUS-causes-SIGBUS case.

There also seem to be other problems associated with changing the executable over NFS, including panics and init ceasing to reap zombies, which I haven't had a chance to look at yet.

David.