Date: Wed, 21 Apr 1999 10:20:24 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Alfred Perlstein <bright@rush.net>
Cc: Bob Bishop <rb@gid.co.uk>, Wilko Bulte <wilko@yedi.iaf.nl>, current@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject: Re: Alright, who's the smart alleck that fixed NFS this last week? :) , WAS: Re: solid NFS patch #6 avail for -current - need testers files)
Message-ID: <199904211720.KAA06806@apollo.backplane.com>
References: <Pine.BSF.3.96.990421101558.11384i-100000@cygnus.rush.net>
:2 questions I had:
:
:1) you said you disabled partial writes that were causing these
:mmap() problems; they were causing problems because NFS had to
:muck with the structures directly in order to do zero copy?
: so if our NFS implementation didn't do that it probably wouldn't be
:an issue. I know it's a good thing for speed and definitely
:essential, but I'm not sure I understand NFS and the FS getting out
:of sync.
The problem w/ the partial writes has to do with cache coherency
between the server, the client's VFS subsystem ( read() and write() ),
and the client's VM subsystem ( mmap() ). NFS implemented the notion
of unaligned valid and dirty ranges using struct buf's b_validoff,
b_validend, b_dirtyoff, and b_dirtyend fields in order to keep track
of partial writes without having to read-in the rest of the buffer. The
implementation was very fragile and failed to address a number of
combination situations that would occur with mmap(), read(), and write().
This in turn led to a series of problems and, further, led to the
situation where we would fix unrelated bugs in the VM system and cause
NFS to break.
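To illustrate one reason this bookkeeping was brittle: a single offset/end pair such as b_validoff/b_validend can describe only one contiguous region. The sketch below is a hypothetical, simplified model (struct range_buf and range_buf_note_write are made-up names, not FreeBSD kernel code) showing that a second write which neither overlaps nor abuts the recorded range cannot be represented, so the caller would have to flush or read in the gap first. It does not attempt to reproduce the mmap() interactions described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a buffer that tracks partial validity with a
 * single [validoff, validend) range, in the spirit of the old
 * b_validoff/b_validend fields.  Not the actual FreeBSD code.
 */
struct range_buf {
    size_t size;      /* buffer size in bytes, e.g. 8192 */
    size_t validoff;  /* start of the valid range */
    size_t validend;  /* end of the valid range (exclusive); == validoff means empty */
};

/*
 * Try to record a write of [off, off + len) as valid data without reading
 * the rest of the buffer from the server.  Only one contiguous range can
 * be described, so a disjoint write is refused (returns false) and the
 * recorded range is left unchanged.
 */
static bool
range_buf_note_write(struct range_buf *bp, size_t off, size_t len)
{
    size_t end = off + len;

    assert(end <= bp->size);
    if (bp->validoff == bp->validend) {     /* nothing valid yet */
        bp->validoff = off;
        bp->validend = end;
        return true;
    }
    if (end < bp->validoff || off > bp->validend)
        return false;                       /* disjoint: the hole cannot be described */
    if (off < bp->validoff)
        bp->validoff = off;
    if (end > bp->validend)
        bp->validend = end;
    return true;
}
```

For example, 2-byte writes at offsets 10 and 12 merge into [10, 14), but a later write at offset 100 is refused and the recorded range stays [10, 14).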
I finally gave up on it. What NFS does now is optimize only two write
situations: (1) when a write covers the entire buffer, e.g. an 8K+
write on an 8K boundary. And (2) piecemeal writes in the write-append
case. Both cases allow us to mark the buffer as essentially being fully
valid without having to mess with valid and dirty ranges. We use
buf->b_bcount to handle the file EOF case and resize it rather than try
to use b_validoff/b_validend. Thus, b_validoff/b_validend have been
completely removed.
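The two optimized cases can be expressed as a small decision function. This is a hypothetical sketch under stated assumptions: the buffer covers [boff, boff + bsize), `head_valid` says whether the bytes of the buffer before the write are already valid, and `bcount` stands in for b_bcount. None of these names are the real nfs_bio.c interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a write() against one buffer-sized block. */
enum write_case {
    WRITE_FULL_BUFFER,   /* case 1: write covers the whole buffer */
    WRITE_APPEND,        /* case 2: write appends at EOF */
    WRITE_READ_MERGE     /* anything else: buffer must be fully valid first */
};

/*
 *   boff, bsize - file offset and nominal size of the buffer (e.g. 8192)
 *   filesize    - file size (EOF) before the write
 *   uoff, n     - the write() covers [uoff, uoff + n)
 *   head_valid  - buffer already holds valid data for [boff, uoff)
 *   bcount      - out: valid byte count (b_bcount) after the write
 */
static enum write_case
classify_write(uint64_t boff, uint64_t bsize, uint64_t filesize,
    uint64_t uoff, uint64_t n, bool head_valid, uint64_t *bcount)
{
    uint64_t bend = boff + bsize;
    uint64_t woff = uoff > boff ? uoff : boff;          /* clip the write */
    uint64_t wend = uoff + n < bend ? uoff + n : bend;  /* to this buffer */

    assert(n > 0 && woff < wend);   /* the write must touch this buffer */

    if (woff == boff && wend == bend) {
        *bcount = bsize;            /* every byte is overwritten: no read-in */
        return WRITE_FULL_BUFFER;
    }
    if (woff == filesize && (woff == boff || head_valid)) {
        *bcount = wend - boff;      /* nothing valid lies past the new EOF */
        return WRITE_APPEND;
    }
    return WRITE_READ_MERGE;
}
```

With an 8K buffer at offset 8192 and a file that currently ends at 9000, a 100-byte append at 9000 into a buffer whose head is already valid yields a b_bcount of 908 (9100 - 8192); the same write into a buffer whose head is not valid, or a 2-byte write at offset 10 of a large file, falls through to the read-merge-write path.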
b_dirtyoff/b_dirtyend have been left intact in order to allow us to
support piecemeal write RPCs. This is different from the piecemeal
write optimizations we were doing before. In this case, we are able
to support piecemeal writes in the middle of the file that go
into *PRELOADED* buffers, i.e. a read-merge-write case. The original
code attempted to do piecemeal writes without the read-before, resulting
in the partially invalid, partially dirty buffer. Now we only allow
piecemeal writes to occur in fully-valid buffers. While we could
theoretically discard the dirty range and simply writeback the entire
buffer when a modification is made to part of it, we keep the dirty range
in order to *only* write the portion of the buffer that the explicit
write() covered. This is done for cache coherency reasons.
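A rough model of the read-merge-write path with a dirty range kept on a fully valid buffer. Assumptions, all hypothetical: struct dbuf and struct wrpc stand in for struct buf and a WRITE RPC, and when a new write is not contiguous with the existing dirty range the old range is written out first, so the dirty range never covers bytes that no write() touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BSIZE 8192   /* hypothetical buffer size */

/* Hypothetical stand-in for a fully valid buffer plus b_dirtyoff/b_dirtyend. */
struct dbuf {
    uint64_t boff;               /* file offset where the buffer starts */
    char     data[BSIZE];        /* fully valid (preloaded) contents */
    size_t   dirtyoff, dirtyend; /* dirty range [dirtyoff, dirtyend); empty if equal */
};

/* What a hypothetical WRITE RPC would carry: file offset and length. */
struct wrpc {
    uint64_t off;
    size_t   len;
};

/* Write out only the dirty range, if any.  Returns 1 if an RPC was issued. */
static int
dbuf_flush(struct dbuf *bp, struct wrpc *rpc)
{
    if (bp->dirtyoff == bp->dirtyend)
        return 0;
    rpc->off = bp->boff + bp->dirtyoff;
    rpc->len = bp->dirtyend - bp->dirtyoff;
    bp->dirtyoff = bp->dirtyend = 0;
    return 1;
}

/* Merge a write() into the buffer and extend the dirty range.  A write that
 * is not contiguous with the current dirty range flushes that range first.
 * Returns the number of RPCs issued (0 or 1). */
static int
dbuf_write(struct dbuf *bp, size_t off, const char *src, size_t len,
    struct wrpc *rpc)
{
    size_t end = off + len;
    int sent = 0;

    assert(len > 0 && end <= BSIZE);
    if (bp->dirtyoff != bp->dirtyend &&
        (end < bp->dirtyoff || off > bp->dirtyend))
        sent = dbuf_flush(bp, rpc);
    memcpy(bp->data + off, src, len);
    if (bp->dirtyoff == bp->dirtyend) {
        bp->dirtyoff = off;
        bp->dirtyend = end;
    } else {
        if (off < bp->dirtyoff)
            bp->dirtyoff = off;
        if (end > bp->dirtyend)
            bp->dirtyend = end;
    }
    return sent;
}
```

Two abutting 2-byte writes at buffer offsets 10 and 12 accumulate into one dirty range [10, 14); a later write at offset 100 first pushes out that 4-byte range, then starts a new one.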
For example, take the situation where two different client machines
do a seek/write to different portions of the same server-backed NFS file,
where the two areas abut each other. Say one client writes 2 bytes at
seek offset 10 and the second client writes 2 bytes at seek offset 12.
As long as the areas are not overlapping, we want this type of operation
to work properly and not scramble the data on the server even if the
clients' idea of the state of the data is not coherent.
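A toy simulation of the two-client example, assuming a 16-byte file, that both clients cached the same initial copy, and that each client writes its buffer back once. `writeback` and `two_clients` are hypothetical helpers. Writing back the whole buffer lets client B's stale copy overwrite client A's bytes; writing back only the dirty range preserves both.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FSIZE 16   /* size of the toy "file" on the server */

/* Send a client's cached copy to the server: the whole buffer, or only
 * the bytes [off, off + len) that the client's write() actually covered. */
static void
writeback(char *server, const char *client, size_t off, size_t len, bool whole)
{
    assert(off + len <= FSIZE);
    if (whole)
        memcpy(server, client, FSIZE);
    else
        memcpy(server + off, client + off, len);
}

/* Run the two-client scenario and copy the final server contents to out. */
static void
two_clients(bool whole, char out[FSIZE + 1])
{
    char server[FSIZE + 1], a[FSIZE + 1], b[FSIZE + 1];

    memset(server, '.', FSIZE);
    server[FSIZE] = '\0';
    memcpy(a, server, sizeof a);          /* both clients cache the same copy */
    memcpy(b, server, sizeof b);

    memcpy(a + 10, "AA", 2);              /* client A writes 2 bytes at offset 10 */
    memcpy(b + 12, "BB", 2);              /* client B writes 2 bytes at offset 12 */

    writeback(server, a, 10, 2, whole);
    writeback(server, b, 12, 2, whole);   /* B's copy never saw A's bytes */
    memcpy(out, server, FSIZE + 1);
}
```

With whole-buffer writeback the server ends up with only "BB" at offset 12; with dirty-range writeback it holds "AABB" at offsets 10 through 13.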
:2) at BAFUG 2 or 3 months ago I *cough* attempted to keep up with you
:and Julian talking about VM issues. :) Something you guys brought up
:was problems with mmap() + read()/write() not staying in sync and requiring
:an msync() to correctly synchronize. I really didn't understand how this
:could happen, except recently I figured that my first question could be
:the answer. Does this problem only happen on NFS mounted dirs? Is it
:fixed?
:
:thanks again,
:-Alfred
This should not be an issue any more for either UFS or NFS. If people
find that it is an issue, there's a bug somewhere that needs to be
addressed. This *was* an issue for NFS prior to the patch set.
-Matt
Matthew Dillon
<dillon@backplane.com>
