Date:      Tue, 24 Aug 1999 17:22:17 +0100
From:      David Malone <dwmalone@maths.tcd.ie>
To:        freebsd-hackers@freebsd.org
Subject:   vm_fault: pager read error on NFS filesystems.
Message-ID:   <199908241722.aa13014@salmon.maths.tcd.ie>

I tried mailing this to freebsd-stable but got no response.

There is a problem when you remove a running executable on an NFS
filesystem. Basically you end up with lots of "vm_fault: pager read
error" messages - and I mean lots - I've seen 184888 messages in
a little under an hour, and the performance of the machine suffers
severely.

I've figured out that this happens when the running process
catches SIGBUS. The process tries to fault a page in, finds the
executable is gone, gets a SIGBUS, tries to fault the SIGBUS handler
in, gets a SIGBUS, tries to fault the SIGBUS handler in....

This wouldn't really be such a serious problem if vm_fault didn't
log a message for every failed fault. I can see two possible fixes
for this problem:

	1) Stop vm_fault logging so much stuff.
	2) Change sendsig to check whether catching SIGBUS or SIGSEGV
	will itself cause a SIGBUS or SIGSEGV. If it will, send the
	process a SIGKILL instead.
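
Fix 1 could be as simple as limiting how many messages get logged per
second. The following is a userspace sketch of that idea, not kernel
code and not a diff against vm_fault; struct log_limiter, log_allowed
and the per-second budget are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-message rate limiter with a one-second window. */
struct log_limiter {
    time_t   window_start;  /* second in which the current window began */
    unsigned logged;        /* messages allowed so far in this window */
    unsigned suppressed;    /* messages dropped so far in this window */
    unsigned max_per_sec;   /* budget of messages per window */
};

/*
 * Return true if a message may be logged at time `now`, false if it
 * should be dropped. Both counters reset whenever the second changes.
 */
bool log_allowed(struct log_limiter *l, time_t now)
{
    if (now != l->window_start) {
        l->window_start = now;
        l->logged = 0;
        l->suppressed = 0;
    }
    if (l->logged < l->max_per_sec) {
        l->logged++;
        return true;
    }
    l->suppressed++;
    return false;
}
```

The caller would only print the "pager read error" line when
log_allowed() returns true, and could report the suppressed count when
a window ends so that the information isn't lost entirely.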

If someone can say which fix seems better then I can probably
produce the diffs.

This seems to be a particular problem for us 'cos we have people
developing MPI programs. Since they are parallel programs they
end up being run over NFS, and they catch most signals so if there
is a problem they can shut down the entire cluster of programs.

I don't think there is much you can do about programs looping in
a signal-receiving loop in general (you could have the SIGBUS handler
execute an illegal instruction and have the SIGILL handler point
to an address which causes a SIGBUS), but because of the verbosity
of vm_fault something needs to be done about the case where SIGBUS
causes SIGBUS.
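
The check for that one case might look roughly like the decision
below. It is a model of the logic only, assuming the kernel can tell
whether the handler's page is readable; signal_to_deliver and its
parameters are hypothetical names, and nothing here is real sendsig
code.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Decide which signal to actually deliver (hypothetical helper).
 *   sig              - the signal about to be sent
 *   caught           - true if the process has a handler for sig
 *   handler_readable - true if the handler's page can be faulted in
 * A caught SIGBUS/SIGSEGV whose handler cannot be read would only
 * fault again, so it is turned into SIGKILL. Everything else is
 * delivered unchanged.
 */
int signal_to_deliver(int sig, bool caught, bool handler_readable)
{
    bool fault_signal = (sig == SIGBUS || sig == SIGSEGV);

    if (fault_signal && caught && !handler_readable)
        return SIGKILL;
    return sig;
}
```

This deliberately does nothing about the SIGBUS/SIGILL ping-pong
mentioned above; it only covers a fault signal whose own handler
cannot be read.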

There also seem to be other problems associated with changing the
executable over NFS, including panics and init no longer reaping
zombies, which I haven't had a chance to look at yet.

	David.

