Date:      Thu, 11 Aug 2011 16:10:54 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Andrew Duane <aduane@juniper.net>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Dumping core over NFS
Message-ID:  <1748900458.38612.1313093454775.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <AC6674AB7BC78549BB231821ABF7A9AEB72924DB25@EMBX01-WF.jnpr.net>

Andrew Duane wrote:
> We have a strange problem in 6.2 that we're wondering if anyone else
> has seen. If a process is dumping core to an NFS-mounted directory,
> sending SIGINT, SIGTERM, or SIGKILL to that process causes NFS to
> wedge. The nfs_asyncio starts complaining that 20 iods are already
> processing the mount, but nothing makes any forward progress.
> 
> Sending SIGUSR1, SIGUSR2, or SIGABRT seem to work fine, as does any
> signal if the core dump is going to a local filesystem.
> 
> Before I dig into this apparent deadlock, just wondering if it's been
> seen before.
> 
The only thing I can tell you is that SIGINT and SIGTERM are handled
differently by mounts with the "intr" option set. In that case, the client
tries to make the syscall in progress fail with EINTR when one of these
signals is posted. I have no idea what effect this might have on a core
dump in progress, or whether you are using "intr" mounts.

There was an issue in FreeBSD8.[01] (for the "intr" case) where the
termination signal could get the krpc code into a loop when trying to
re-establish a TCP connection: an msleep() would always return EINTR right
away, without waiting for the connection attempt to complete, and the code
outside it would just try again and again and... This bug was fixed for
FreeBSD8.2. Obviously it's not the same bug, since FreeBSD6 didn't have a
krpc subsystem, but you might look for something similar (i.e. a
sleep(...PCATCH...) and then a caller that just tries again when it
returns EINTR).
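
To make the pattern concrete, here is a rough userland sketch of the kind
of loop to look for. It is not taken from the FreeBSD6 or FreeBSD8 sources;
interruptible_sleep(), signal_pending and both wait_*() functions are
made-up names that only model a PCATCH-style sleep returning EINTR
immediately while a signal stays posted.

```c
#include <errno.h>

/* Assumption: models a SIGINT/SIGTERM that has been posted and not cleared. */
static int signal_pending = 0;

/*
 * Hypothetical stand-in for an interruptible sleep (PCATCH style).
 * Returns EINTR at once if a signal is pending, otherwise 0 ("woken").
 */
static int
interruptible_sleep(void)
{
	return (signal_pending ? EINTR : 0);
}

/*
 * Suspect pattern: the caller retries whenever the sleep returns EINTR.
 * Because the signal stays pending, every retry returns immediately and
 * the loop never makes progress. max_attempts exists only so this demo
 * terminates; the real loop would have no such bound.
 * Returns the number of sleep attempts made.
 */
static int
wait_retry_on_eintr(int max_attempts)
{
	int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;
		if (interruptible_sleep() != EINTR)
			break;
	}
	return (attempts);
}

/*
 * Safer pattern: hand EINTR back to the caller so the operation in
 * progress fails instead of spinning.
 */
static int
wait_fail_on_eintr(void)
{
	return (interruptible_sleep());
}
```

With signal_pending set to 1, wait_retry_on_eintr() burns through every
attempt it is allowed, while wait_fail_on_eintr() returns EINTR on the
first call.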

If you use "intr", you might also try without "intr" and see if that has
any effect.

Good luck with it, rick
> ...................................
> 
> Andrew Duane
> Juniper Networks
> o +1 978 589 0551
> m +1 603-770-7088
> aduane@juniper.net


