From owner-freebsd-hackers  Sat Jun 12 21:39:42 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (Postfix) with ESMTP id 52D9314D7B
	for <freebsd-hackers@FreeBSD.ORG>; Sat, 12 Jun 1999 21:39:39 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id VAA65304;
	Sat, 12 Jun 1999 21:39:25 -0700 (PDT)
	(envelope-from dillon)
Date: Sat, 12 Jun 1999 21:39:25 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199906130439.VAA65304@apollo.backplane.com>
To: hgoldste@bbs.mpcs.com (Howard Goldstein), dyson@iquest.net,
	freebsd-hackers@FreeBSD.ORG, "John S. Dyson" <toor@dyson.iquest.net>
Subject: Re: problem for the VM gurus
References:  <199906091233.HAA00173@dyson.iquest.net>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

    Interesting.  It's an overlapping same-process deadlock with mmap/write.
    This bug also hits NFS, though in a slightly different way, and also
    occurs with mmap/write when two processes are mmap'ing two files and
    write()ing the other descriptor using the map as a buffer.

    I see a three-stage solution:

    * We change the API for the VM pager *getpages() code.

	At the moment the caller busies all pages being passed to getpages()
	and expects the primary page (but not any of the others) to be
	returned busied.  I also believe that some of the code assumes that
	the page will not be unbusied at all for the duration of the
	operation (though vm_fault was hacked to handle the situation where
	it might have been).

	This API is screwing up NFS.  It would also make it very difficult
	to implement general VFS deadlock avoidance properly, or to
	implement a proper fix for the specific case being discussed in
	this thread.

	I recommend changing the API such that *ALL* passed pages are
	unbusied prior to return.  The caller of getpages() must then
	look the page up in the VM object again.  Always.  vm_fault already
	does this, in fact.  We would clean up the code and document it to
	this effect.

	This change would allow us to immediately fix the self-referential
	deadlocks and I think it would also allow me to fix a similar bug
	in NFS trivially.
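
	A rough sketch of the proposed contract, written as a toy
	user-space model rather than real kernel code.  Every name here
	(toy_page, toy_object, toy_getpages, toy_fault_in) is a hypothetical
	stand-in, not the actual FreeBSD VM API.  It only illustrates the
	rule: getpages() unbusies everything it was handed, and the caller
	re-looks-up the page it wants instead of reusing its old pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; NOT the real vm_page / vm_object. */
struct toy_page {
    size_t pindex;   /* page index within the object */
    bool   busy;     /* "busied" by someone */
    bool   valid;    /* contents have been read in */
};

#define TOY_OBJ_PAGES 8

struct toy_object {
    struct toy_page *slots[TOY_OBJ_PAGES];   /* NULL means not resident */
};

/* Hypothetical lookup: returns the resident page at pindex, or NULL. */
static struct toy_page *
toy_page_lookup(struct toy_object *obj, size_t pindex)
{
    if (pindex >= TOY_OBJ_PAGES)
        return NULL;
    return obj->slots[pindex];
}

/*
 * Proposed contract: the caller passes in busied pages, and ALL of them,
 * including the primary page, are unbusied before return.
 */
static int
toy_getpages(struct toy_page **pages, int count)
{
    for (int i = 0; i < count; i++) {
        assert(pages[i]->busy);     /* caller busied every page */
        pages[i]->valid = true;     /* pretend the I/O completed */
        pages[i]->busy = false;     /* unbusy everything, no exceptions */
    }
    return 0;
}

/*
 * Caller side: busy a run of pages, call getpages, then look the wanted
 * page up again.  Pointers held across the call are never reused.
 */
static struct toy_page *
toy_fault_in(struct toy_object *obj, size_t first, int count, size_t wanted)
{
    struct toy_page *run[TOY_OBJ_PAGES];
    int n = 0;

    for (int i = 0; i < count && first + (size_t)i < TOY_OBJ_PAGES; i++) {
        struct toy_page *p = toy_page_lookup(obj, first + (size_t)i);
        if (p == NULL || p->busy)
            break;
        p->busy = true;
        run[n++] = p;
    }
    if (n > 0 && toy_getpages(run, n) != 0)
        return NULL;

    /* Always re-lookup after getpages returns. */
    struct toy_page *m = toy_page_lookup(obj, wanted);
    if (m == NULL || !m->valid)
        return NULL;
    return m;
}
```

	In a real kernel the page could have been freed or replaced while
	it was unbusied, which is exactly why the re-lookup is mandatory;
	the toy model has no concurrency and only shows the calling shape.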

    * We hack a fix to deal with the mmap/write case.

	A permanent vnode locking fix is many months away because core
	decided to ask Kirk to fix it, which was news to me at the time.
	However, I agree with the idea of having Kirk fix vnode locking.

	But since this sort of permanent fix is months away, we really need
	an interim solution to the mmap/write deadlock case.

	The easiest interim solution is to break write atomicity.  That is,
	unlock the vnode if the backing store of the uio being written is
	(A) vnode-pager-backed and (B) not all in-core.
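
	One possible reading of that test, as a standalone sketch.  The
	types and the function name are hypothetical, and treating the
	check per page (any vnode-backed page that is not resident) is an
	assumption about how conditions (A) and (B) would be combined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one page of the user's write buffer. */
enum sk_backing { SK_BACKING_ANON, SK_BACKING_VNODE };

struct sk_buf_page {
    enum sk_backing backing;   /* what pager backs this page */
    bool            resident;  /* true if the page is in-core */
};

/*
 * Returns true when the write should give up the vnode lock (and with it
 * atomicity): some part of the buffer is vnode-pager-backed and would
 * have to be faulted in while the lock is held.
 */
static bool
sk_must_unlock_vnode(const struct sk_buf_page *pages, size_t npages)
{
    assert(pages != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++) {
        if (pages[i].backing == SK_BACKING_VNODE && !pages[i].resident)
            return true;
    }
    return false;
}
```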

	This will generally fix all known deadlock situations but at the
	cost of write atomicity in certain cases.  We can use the same hack
	that the pipe code uses and only guarantee write atomicity for small
	block sizes.  We would do this by wiring (and faulting, if
	necessary) the first N pages of the uio prior to locking the vnode.

	We cannot wire all the pages of the uio since the user may specify
	a very large buffer - megabytes or gigabytes.
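
	The bounded pre-wiring idea can be sketched as simple arithmetic.
	The page size, the value of N (SK_MAX_PREWIRE) and all names below
	are assumptions chosen for illustration; the message does not say
	what N should be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE   4096u   /* assumed page size */
#define SK_MAX_PREWIRE 16u     /* hypothetical N: max pages wired up front */

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t
sk_pages_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    /* The range must not wrap around the address space. */
    assert(addr <= UINTPTR_MAX - (len - 1));
    uintptr_t first = addr / SK_PAGE_SIZE;
    uintptr_t last = (addr + len - 1) / SK_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

struct sk_write_plan {
    size_t prewire_pages;   /* pages to wire before taking the vnode lock */
    int    atomic;          /* nonzero if the whole buffer fits in N pages */
};

/*
 * Wire at most N pages before locking the vnode.  Only writes whose
 * buffer fits entirely within those N pages keep the atomicity guarantee.
 */
static struct sk_write_plan
sk_plan_write(uintptr_t addr, size_t len)
{
    struct sk_write_plan plan;
    size_t span = sk_pages_spanned(addr, len);

    plan.prewire_pages = span < SK_MAX_PREWIRE ? span : SK_MAX_PREWIRE;
    plan.atomic = (span <= SK_MAX_PREWIRE);
    return plan;
}
```

	With these assumed values, an 8K page-aligned buffer is fully
	wired and stays atomic, while a 1MB buffer gets only its first 16
	pages wired and loses the guarantee.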

    * Stage 3:  Permanent fix is committed by generally fixing vnode locks
      and VFS layering.

	... which may be 6 months away if Kirk agrees to do a complete
	rewrite of the vnode locking algorithms.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

