Date: Sun, 13 Jun 1999 23:59:34 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: David Greenman <dg@root.com>
Cc: dyson@iquest.net, freebsd-hackers@FreeBSD.ORG
Subject: Re: problem for the VM gurus
Message-ID: <199906140659.XAA06727@apollo.backplane.com>
References: <199906132321.QAA26382@implode.root.com>
:> VM lookup the page again. Always. vm_fault already does this,
:> in fact. We would clean up the code and document it to this effect.
:>
:> This change would allow us to immediately fix the self-referential
:> deadlocks and I think it would also allow me to fix a similar bug
:> in NFS trivially.
:
: I should point out here that the process of looking up the pages is a
:significant amount of the overhead of the routines involved. Although
:doing this for just one page is probably sufficiently in the noise as to
:not be a concern.
    It would be for only one page and, besides, vm_fault *already*
    re-looks up the page ( to see if it was ripped out from under the
    caller ), so the overhead of the change would be very near zero.
:> The easiest interim solution is to break write atomicity. That is,
:> unlock the vnode if the backing store of the uio being written is
:> (A) vnode-pager-backed and (B) not all in-core.
:
: Uh, I don't think you can safely do that. I thought one of the reasons
:for locking a vnode for writes is so that the file metadata doesn't change
:underneath you while the write is in progress, but perhaps I'm wrong about
:that.
:
:-DG
:
:David Greenman
    The problem can be distilled into the fact that we currently hold an
    exclusive lock *through* a uiomove that might incur read I/O due to
    pages not being entirely in core. The problem does *not* occur when
    we are blocked on meta-data I/O ( such as a BMAP operation ) since
    meta-data cannot be mmap'd. Under current circumstances we already
    lose read atomicity on the source during the write(), but we do not
    lose write() atomicity.
    The simple solution is to give up or downgrade the lock on the
    destination when blocked within the uiomove. We can pre-fault
    the first two pages of the uio to guarantee a minimum write-atomicity
    I/O size. I suppose this could be extended to pre-faulting the
    first N pages of the uio, where N is chosen to be reasonably large - like
    64K - but we could not guarantee arbitrary write atomicity because the
    user might decide to write a very large mmap'd buffer ( e.g. megabytes or
    gigabytes ), and obviously wiring that many pages just won't work.
    The more complex solution is to implement a separate range lock for
    I/O that is independent of the vnode lock. This solution would also
    require deadlock detection and restart handling. Atomicity would be
    maintained from the point of view of the processes running on the
    machine, but not from the point of view of the physical storage. Since
    write atomicity is already not maintained from the point of view of the
    physical storage, I don't think this would present a problem. Due to the
    complexity, however, it could not be used as an interim solution. It
    would have to be a permanent solution for the programming time to be
    worth it. Doing range-based deadlock detection and restart handling
    properly is not trivial; it is something that usually only databases
    need to do.
-Matt
Matthew Dillon
<dillon@backplane.com>
