Date: Wed, 21 Apr 1999 10:20:24 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Alfred Perlstein <bright@rush.net>
Cc: Bob Bishop <rb@gid.co.uk>, Wilko Bulte <wilko@yedi.iaf.nl>, current@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject: Re: Alright, who's the smart alleck that fixed NFS this last week? :) , WAS: Re: solid NFS patch #6 avail for -current - need testers files)
Message-ID: <199904211720.KAA06806@apollo.backplane.com>
References: <Pine.BSF.3.96.990421101558.11384i-100000@cygnus.rush.net>
:2 questions I had:
:
:1) you said you disabled partial writes that were causing these
:mmap() problems, they were causing problems because NFS had to
:muck with the structures directly in order to do zero copy?
: so if our NFS implementation didn't do that it wouldn't be
:an issue probably. I know it's a good thing for speed and definitely
:essential, but i'm not sure i understand NFS and the FS getting out
:of sync.

The problem w/ the partial writes has to do with cache coherency between the server, the client's VFS subsystem (read() and write()), and the client's VM subsystem (mmap()).

NFS implemented the notion of unaligned valid and dirty ranges using struct buf's b_validoff, b_validend, b_dirtyoff, and b_dirtyend fields in order to keep track of partial writes without having to read in the rest of the buffer. The implementation was very fragile and failed to address a number of combinations of mmap(), read(), and write() that would occur in practice. This in turn led to a series of problems and, further, led to the situation where we would fix unrelated bugs in the VM system and cause NFS to break. I finally gave up on it.

What NFS does now is optimize only two write situations: (1) when a write covers the entire buffer, e.g. an 8K+ write on an 8K boundary, and (2) piecemeal writes in the write-append case. Both cases allow us to mark the buffer as essentially fully valid without having to mess with valid and dirty ranges. We use buf->b_bcount to handle the file EOF case and resize it rather than try to use b_validoff/b_validend. Thus, b_validoff/b_validend have been completely removed.

b_dirtyoff/b_dirtyend have been left intact in order to allow us to support piecemeal write RPCs. This is different from the piecemeal write optimizations we were doing before. In this case, we are able to support piecemeal writes in the middle of the file that go into *PRELOADED* buffers, that is, a read-merge-write case.
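The two optimized write cases, plus the read-merge-write case that b_dirtyoff/b_dirtyend still support, can be sketched in C as a small classifier and a dirty-range update. This is a minimal sketch under stated assumptions, not the FreeBSD NFS client code: struct sketch_buf, classify_write(), note_dirty() and SKETCH_BSIZE are hypothetical names, and only the b_bcount, b_dirtyoff and b_dirtyend field names come from the message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of an NFS client buffer. Only the
 * field names b_bcount, b_dirtyoff and b_dirtyend come from the
 * discussion; this is NOT FreeBSD's struct buf.
 */
#define SKETCH_BSIZE 8192          /* assumed NFS block size (8K) */

struct sketch_buf {
    long b_offset;    /* file offset of buffer start, multiple of SKETCH_BSIZE */
    int  b_bcount;    /* bytes valid in the buffer; trimmed at file EOF */
    int  b_dirtyoff;  /* start of dirty range within the buffer */
    int  b_dirtyend;  /* end of dirty range (exclusive); 0 means clean */
    bool fully_valid; /* buffer contents (up to b_bcount) match the server */
};

enum write_kind {
    WRITE_FULL,       /* case (1): write covers the entire buffer */
    WRITE_APPEND,     /* case (2): piecemeal write-append at EOF */
    WRITE_MERGE,      /* piecemeal write into a preloaded, fully valid buffer */
    WRITE_NEEDS_READ  /* buffer must be read in before the write is merged */
};

/* Classify a write of len bytes at file offset off into buffer bp. */
enum write_kind classify_write(const struct sketch_buf *bp, long off, int len,
                               long filesize)
{
    int on = (int)(off - bp->b_offset);  /* offset within the buffer */

    if (on == 0 && len >= SKETCH_BSIZE)
        return WRITE_FULL;
    /* Everything before the write is already valid and nothing follows EOF. */
    if (off == filesize && on == bp->b_bcount)
        return WRITE_APPEND;
    if (bp->fully_valid)
        return WRITE_MERGE;
    return WRITE_NEEDS_READ;
}

/*
 * Record a write of len bytes at buffer offset on. Returns false when the
 * new range would neither overlap nor abut the existing dirty range; this
 * sketch then expects the caller to flush the old range first, so the
 * unmodified gap between the two ranges is never sent to the server.
 */
bool note_dirty(struct sketch_buf *bp, int on, int len)
{
    int end = on + len;

    assert(on >= 0 && len > 0 && end <= SKETCH_BSIZE);
    if (bp->b_dirtyend > 0 && (on > bp->b_dirtyend || end < bp->b_dirtyoff))
        return false;                   /* non-contiguous: flush first */
    if (bp->b_dirtyend == 0 || on < bp->b_dirtyoff)
        bp->b_dirtyoff = on;
    if (end > bp->b_dirtyend)
        bp->b_dirtyend = end;
    if (end > bp->b_bcount)
        bp->b_bcount = end;             /* buffer grows at EOF */
    return true;
}
```

Refusing to merge non-contiguous ranges is a choice made in this sketch: it keeps the dirty range an exact union of bytes the application actually wrote, so a later write RPC covers only those bytes.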
The original code attempted to do piecemeal writes without the read-before, resulting in a partially invalid, partially dirty buffer. Now we only allow piecemeal writes to occur in fully-valid buffers.

While we could theoretically discard the dirty range and simply write back the entire buffer when part of it is modified, we keep the dirty range in order to write *only* the portion of the buffer that the explicit write() covered. This is done for cache coherency reasons. For example, take the situation where two different client machines do a seek/write to different portions of the same server-backed NFS file, where the two areas abut each other. Say one client writes 2 bytes at seek offset 10 and the second client writes 2 bytes at seek offset 12. As long as the areas do not overlap, we want this type of operation to work properly and not scramble the data on the server, even if each client's idea of the state of the data is not coherent.

:2) at BAFUG 2 or 3 months ago I, *cough* attempted to keep up with you
:and Julian talking about VM issues. :) Something you guys brought up
:was problems with mmap() + read()/write() not staying in sync and requiring
:an msync() to correctly synchronize. I really didn't understand how this
:could happen except recently I figured that my first question could be
:the answer. Does this problem only happen on NFS mounted dirs? Is it
:fixed?
:
:thanks again,
:-Alfred

This should not be an issue any more for either UFS or NFS. If people find that it is an issue, there's a bug somewhere that needs to be addressed. This *was* an issue for NFS prior to the patch set.

-Matt
Matthew Dillon <dillon@backplane.com>
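A minimal C simulation of the abutting-writes example (one client writing 2 bytes at offset 10, another writing 2 bytes at offset 12) shows why only the dirty range is pushed to the server. The 16-byte file, its initial contents, and the helper names are hypothetical; each client is assumed to have cached the whole file before either write, and "flushing" is modeled as a memcpy into the server's copy.

```c
#include <assert.h>
#include <string.h>

#define FILE_SIZE 16   /* hypothetical tiny file, cached whole by each client */

struct client_cache {
    char data[FILE_SIZE];
    int dirtyoff, dirtyend;   /* dirty range; dirtyend == 0 means clean */
};

/* Client reads the whole file into its cache. */
static void client_read(struct client_cache *c, const char *server)
{
    memcpy(c->data, server, FILE_SIZE);
    c->dirtyoff = c->dirtyend = 0;
}

/* Client modifies its cached copy and extends its dirty range. */
static void client_write(struct client_cache *c, int off, const char *src, int len)
{
    assert(off >= 0 && len > 0 && off + len <= FILE_SIZE);
    memcpy(c->data + off, src, len);
    if (c->dirtyend == 0 || off < c->dirtyoff)
        c->dirtyoff = off;
    if (off + len > c->dirtyend)
        c->dirtyend = off + len;
}

/* Push only the bytes this client actually wrote. */
static void flush_dirty_range(const struct client_cache *c, char *server)
{
    memcpy(server + c->dirtyoff, c->data + c->dirtyoff, c->dirtyend - c->dirtyoff);
}

/* Push the entire cached buffer, including stale bytes. */
static void flush_whole_buffer(const struct client_cache *c, char *server)
{
    memcpy(server, c->data, FILE_SIZE);
}

/* Run the two-client scenario; server receives the final file contents. */
void run_scenario(int whole_buffer, char server[FILE_SIZE])
{
    struct client_cache a, b;

    memcpy(server, "0123456789abcdef", FILE_SIZE);
    client_read(&a, server);
    client_read(&b, server);
    client_write(&a, 10, "AA", 2);   /* client A: 2 bytes at offset 10 */
    client_write(&b, 12, "BB", 2);   /* client B: 2 bytes at offset 12 */
    if (whole_buffer) {
        flush_whole_buffer(&a, server);
        flush_whole_buffer(&b, server);   /* B's stale bytes 10-11 undo A */
    } else {
        flush_dirty_range(&a, server);
        flush_dirty_range(&b, server);
    }
}
```

With dirty-range flushing the server ends up holding "0123456789AABBef"; with whole-buffer flushing it holds "0123456789abBBef", and client A's write is silently lost even though the two writes never overlapped.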