From: Jilles Tjoelker
To: freebsd-arch@freebsd.org
Date: Fri, 19 Jun 2009 18:23:28 +0200
Subject: deadlocks with intr NFS mounts and ^Z (or: PCATCH and sleepable locks)
Message-ID: <20090619162328.GA79975@stack.nl>
List-Id: Discussion related to FreeBSD architecture

I have been having trouble with deadlocks with NFS mounts for a while, and I have found at least one way it can deadlock. It seems to be an issue with the sleep/lock system.

NFS sleeps while holding a lockmgr lock, waiting for a reply from the server. When the mount is set intr, this is an interruptible sleep, so that interrupting signals can abort the sleep. However, this also means that SIGSTOP etc. will suspend the thread without waking it up first, so it will be suspended with a lock held.

If it holds the wrong locks, it is possible that the shell will not be able to run, and the process cannot be continued in the normal manner. Due to some other things I do not understand, it is then possible that the process cannot be continued at all (SIGCONT seems to be ignored), but in simple cases SIGCONT works, and things go back to normal. In any case, this situation is undesirable, as even 'umount -f' doesn't work while the thread is suspended.

Of course, this reasoning applies to any code that goes to sleep interruptibly while holding a lock (sx or lockmgr). Is this supposed to be possible (likely useful)? If so, a third type of sleep would be needed, one that is interrupted by signals but not suspended. If not, something should check that it doesn't happen, and NFS intr mounts may need to check for signals periodically or find a way to avoid sleeping with locks held.

The td_locks field is only accessible for the current thread, so it cannot be used to check if suspending is safe. Also, making SIGSTOP and the like interrupt/restart syscalls is not acceptable unless you find some way to do it such that userland won't notice. For example, a read of 10 megabytes from a regular file with that much available must not return less than 10 megabytes.

-- 
Jilles Tjoelker
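
For illustration only, here is a userland analogy of the situation described above. It is not the kernel code path: an advisory flock(2) lock stands in for the lockmgr/sx lock, a blocking pipe read stands in for the interruptible sleep, and the function name stopped_holder_demo, the struct stop_result, and the temporary file path are all hypothetical. The sketch assumes a POSIX system with flock(2). It shows that a process stopped during an interruptible sleep keeps its lock, so nobody else can take it until the process is continued.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record for this sketch. */
struct stop_result {
    int busy_while_stopped;  /* 1 if the lock was unavailable while the holder was stopped */
    int free_after_continue; /* 1 if the lock was obtainable after the holder resumed and exited */
};

/* Returns 0 on success, -1 if the demo could not be set up. */
int stopped_holder_demo(struct stop_result *out)
{
    char path[] = "/tmp/stoplockXXXXXX";
    int ready[2], release[2], fd, status, rc = -1;
    pid_t pid;
    char c = 'x';

    assert(out != NULL);
    out->busy_while_stopped = 0;
    out->free_after_continue = 0;

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    if (pipe(ready) == -1 || pipe(release) == -1 || (pid = fork()) == -1) {
        unlink(path);
        return -1;
    }

    if (pid == 0) {
        /* Child: take the lock on its own open file description. */
        int cfd = open(path, O_RDWR);
        close(fd);
        close(ready[0]);
        close(release[1]);
        if (cfd == -1 || flock(cfd, LOCK_EX) == -1)
            _exit(1);
        if (write(ready[1], &c, 1) != 1)
            _exit(1);
        /* Interruptible sleep with the lock held; a stop signal suspends us here. */
        while (read(release[0], &c, 1) == -1 && errno == EINTR)
            continue;
        _exit(0); /* exiting closes cfd and thereby drops the lock */
    }

    /* Parent: wait until the child holds the lock, then stop it. */
    close(ready[1]);
    close(release[0]);
    if (read(ready[0], &c, 1) == 1 && kill(pid, SIGSTOP) == 0 &&
        waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status)) {
        /* The holder is suspended, yet the lock is still taken. */
        if (flock(fd, LOCK_EX | LOCK_NB) == -1 && errno == EWOULDBLOCK)
            out->busy_while_stopped = 1;
        rc = 0;
    }

    /* Resume the child and let it finish. */
    kill(pid, SIGCONT);
    if (write(release[1], &c, 1) != 1)
        rc = -1;
    if (waitpid(pid, &status, 0) != pid)
        rc = -1;
    if (rc == 0 && flock(fd, LOCK_EX | LOCK_NB) == 0)
        out->free_after_continue = 1;

    close(ready[0]);
    close(release[1]);
    close(fd);
    unlink(path);
    return rc;
}
```

Calling stopped_holder_demo(&r) from a small main() and printing both fields is enough to try it. In the kernel case the difference is that the blocked party may be the very process needed to send SIGCONT, which is what turns a held lock into a deadlock.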