Date: Sat, 12 Jun 1999 21:39:25 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199906130439.VAA65304@apollo.backplane.com>
To: hgoldste@bbs.mpcs.com (Howard Goldstein), dyson@iquest.net, freebsd-hackers@FreeBSD.ORG, "John S. Dyson"
Subject: Re: problem for the VM gurus
References: <199906091233.HAA00173@dyson.iquest.net>

    Interesting. It's an overlapping same-process deadlock with mmap/write.

    This bug also hits NFS, though in a slightly different way. It also
    occurs with mmap/write when two processes each mmap() one of two files
    and write() to the other file's descriptor, using the mapping as the
    buffer.

    I see a three-stage solution:

    * Stage 1: We change the API for the VM pager *getpages() code.

      At the moment the caller busies all pages being passed to getpages()
      and expects the primary page (but not any of the others) to be
      returned busied. I also believe that some of the code assumes that
      the page will not be unbusied at all for the duration of the
      operation (though vm_fault was hacked to handle the situation where
      it might have been).

      This API is screwing up NFS. It would also make it very difficult to
      properly implement general VFS deadlock avoidance, or a proper fix
      for the specific case being discussed in this thread.

      I recommend changing the API such that *ALL* passed pages are
      unbusied prior to return. The caller of getpages() must then look the
      page up in the VM again. Always. vm_fault already does this, in fact.
      We would clean up the code and document it to this effect. This
      change would allow us to immediately fix the self-referential
      deadlocks, and I think it would also allow me to fix a similar bug in
      NFS trivially.

    * Stage 2: We hack a fix to deal with the mmap/write case.

      A permanent vnode locking fix is many months away because core
      decided to ask Kirk to fix it, which was news to me at the time.
      However, I agree with the idea of having Kirk fix vnode locking. But
      since this sort of permanent fix is months away, we really need an
      interim solution to the mmap/write deadlock case.

      The easiest interim solution is to break write atomicity. That is,
      unlock the vnode if the backing store of the uio being written is
      (A) vnode-pager-backed and (B) not entirely in core. This will
      generally fix all known deadlock situations, but at the cost of write
      atomicity in certain cases.

      We can use the same hack that the pipe code uses and only guarantee
      write atomicity for small block sizes. We would do this by wiring
      (and faulting in, if necessary) the first N pages of the uio prior to
      locking the vnode. We cannot wire all the pages of the uio, since the
      user may specify a very large buffer (megabytes or gigabytes).

    * Stage 3: The permanent fix is committed, generally fixing vnode locks
      and VFS layering ... which may be 6 months out if Kirk agrees to do a
      complete rewrite of the vnode locking algorithms.

					-Matt
					Matthew Dillon
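The Stage 2 wiring policy can be illustrated with a small planning function. This is a minimal user-space sketch under stated assumptions, not FreeBSD kernel code: the 4096-byte page size, the cap SKETCH_WIRE_PAGES standing in for "N", and the names struct write_plan, pages_spanned() and plan_write() are all hypothetical. It only computes how many pages of the uio buffer would be wired before the vnode lock is taken, and whether the whole write falls inside the atomicity guarantee, loosely like pipe writes that are only guaranteed atomic up to a size limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_WIRE_PAGES 16u   /* stands in for "the first N pages" */

/* Plan for one write(): what to wire before locking the vnode. */
struct write_plan {
    size_t pages_spanned;   /* pages touched by [addr, addr + len) */
    size_t pages_to_wire;   /* min(pages_spanned, SKETCH_WIRE_PAGES) */
    bool   atomic;          /* true if the whole buffer is wired up front */
};

/* Count the pages covered by a user buffer (overflow at the top of
 * the address space is ignored in this sketch). */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / SKETCH_PAGE_SIZE;
    uintptr_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/* Wire at most SKETCH_WIRE_PAGES pages; the write is only guaranteed
 * atomic when every page it touches was wired before the vnode lock. */
static struct write_plan plan_write(uintptr_t addr, size_t len)
{
    struct write_plan p;

    p.pages_spanned = pages_spanned(addr, len);
    p.pages_to_wire = p.pages_spanned < SKETCH_WIRE_PAGES
        ? p.pages_spanned : SKETCH_WIRE_PAGES;
    p.atomic = p.pages_spanned <= SKETCH_WIRE_PAGES;
    assert(p.pages_to_wire <= p.pages_spanned);  /* sanity check on the cap */
    return p;
}
```

For example, a 1 GiB buffer spans 262144 pages of 4096 bytes, so only the first 16 would be wired and the write would lose its atomicity guarantee, while a buffer touching 16 pages or fewer is wired in full and stays atomic. Note that a misaligned buffer can touch one more page than its length alone suggests.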