From owner-freebsd-hackers  Wed Apr 21 10:23: 5 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id 480D514E23; Wed, 21 Apr 1999 10:22:54 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id KAA06806;
	Wed, 21 Apr 1999 10:20:24 -0700 (PDT)
	(envelope-from dillon)
Date: Wed, 21 Apr 1999 10:20:24 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199904211720.KAA06806@apollo.backplane.com>
To: Alfred Perlstein <bright@rush.net>
Cc: Bob Bishop <rb@gid.co.uk>, Wilko Bulte <wilko@yedi.iaf.nl>,
	current@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject: Re: Alright, who's the smart alleck that fixed NFS this last week? :) , WAS: Re: solid NFS patch #6 avail for -current - need testers  files)
References:  <Pine.BSF.3.96.990421101558.11384i-100000@cygnus.rush.net>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:2 questions I had:
:
:1) you said you disabled partial writes that were causing these 
:mmap() problems, they were causing problems because NFS had to
:muck with the structures directly in order to do zero copy?
:   so if our NFS implementation didn't do that it wouldn't be
:an issue probably.  I know it's a good thing for speed and definitely
:essential, but i'm not sure i understand NFS and the FS getting out
:of sync.

    The problem w/ the partial writes has to do with cache coherency
    between the server, the client's VFS subsystem ( read() and write() ),
    and the client's VM subsystem ( mmap() ).  NFS implemented the notion 
    of unaligned valid and dirty ranges using struct buf's b_validoff, 
    b_validend, b_dirtyoff, and b_dirtyend fields in order to keep track 
    of partial writes without having to read-in the rest of the buffer.  The 
    implementation was very fragile and failed to address a number of
    combination situations that would occur with mmap(), read(), and write().
    This in turn led to a series of problems and, further, led to the 
    situation where we would fix unrelated bugs in the VM system and cause
    NFS to break.

    I finally gave up on it.  What NFS does now is optimize only two write
    situations:  (1) when a write covers the entire buffer, e.g. an 8K+
    write on an 8K boundary, and (2) piecemeal writes in the write-append
    case.  Both cases allow us to mark the buffer as essentially being fully
    valid without having to mess with valid and dirty ranges.  We use 
    buf->b_bcount to handle the file EOF case and resize it rather than try
    to use b_validoff/b_validend.  Thus, b_validoff/b_validend have been
    completely removed.  
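
    A minimal sketch of the two optimized cases, written as a toy
    user-space model and not taken from the kernel sources.  The names
    MODEL_BSIZE, struct model_buf, its 'loaded' flag, and classify_write()
    are all hypothetical; only the idea of using b_bcount to mark the
    buffer's EOF comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BSIZE 8192        /* assumed buffer size (the "8K" case) */

enum write_path {
    WRITE_FULL_BUFFER,  /* case (1): write covers the whole buffer */
    WRITE_APPEND,       /* case (2): write starts exactly at the buffer's EOF */
    WRITE_READ_FIRST    /* anything else: load the buffer before merging */
};

/* Hypothetical stand-in for the few struct buf fields that matter here. */
struct model_buf {
    size_t bcount;      /* bytes in use; stands in for b_bcount at file EOF */
    int    loaded;      /* nonzero if [0, bcount) already holds valid data */
};

/* Decide how a write of 'len' bytes at buffer offset 'off' is handled. */
enum write_path
classify_write(const struct model_buf *bp, size_t off, size_t len)
{
    assert(off <= MODEL_BSIZE && len <= MODEL_BSIZE - off);

    /* The write replaces every byte, so nothing old needs to be read. */
    if (off == 0 && len == MODEL_BSIZE)
        return WRITE_FULL_BUFFER;

    /* Appending at EOF: everything below 'off' is already valid (or the
     * buffer is empty), so the buffer can simply be grown. */
    if (off == bp->bcount && (bp->loaded || bp->bcount == 0))
        return WRITE_APPEND;

    /* A write into the middle of an unloaded buffer needs a read first. */
    return WRITE_READ_FIRST;
}
```

    In this model neither of the first two paths needs a valid range:
    the buffer is either completely overwritten or valid up to its
    (resized) byte count.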

    b_dirtyoff/b_dirtyend have been left intact in order to allow us to 
    support piecemeal write RPCs.  This is different from the piecemeal 
    write optimizations we were doing before.  In this case, we are able
    to support piecemeal writes in the middle of the file that go
    into *PRELOADED* buffers.  That is, a read-merge-write case.  The original
    code attempted to do piecemeal writes without the read-before, resulting
    in a partially invalid, partially dirty buffer.  Now we only allow 
    piecemeal writes to occur in fully-valid buffers.  While we could 
    theoretically discard the dirty range and simply write back the entire
    buffer when a modification is made to part of it, we keep the dirty range
    in order to *only* write the portion of the buffer that the explicit 
    write() covered.  This is done for cache coherency reasons. 
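
    The dirty-range bookkeeping can be sketched roughly as follows.  This
    is a simplified user-space model, not the kernel code: struct
    dirty_buf, dirty_buf_write(), and dirty_buf_flush() are hypothetical
    names, and the buffer is assumed to be fully valid (preloaded) before
    any write is merged into it.  Refusing a write that is not contiguous
    with the existing dirty range is a choice made for this sketch, so
    that the range never grows to cover bytes no write() touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BSIZE 8192    /* assumed buffer size */

/* Hypothetical model of a fully valid buffer with a single dirty range. */
struct dirty_buf {
    unsigned char data[MODEL_BSIZE];
    size_t dirtyoff;        /* start of dirty range */
    size_t dirtyend;        /* end of dirty range; equal to start if clean */
};

/*
 * Merge a write into the buffer.  Returns 0 on success, or -1 if the write
 * does not touch or abut the existing dirty range, in which case the
 * caller is expected to flush first and retry.
 */
int
dirty_buf_write(struct dirty_buf *bp, size_t off, const void *src, size_t len)
{
    int clean = (bp->dirtyoff == bp->dirtyend);

    assert(off <= MODEL_BSIZE && len <= MODEL_BSIZE - off);
    if (len == 0)
        return 0;
    if (!clean && (off > bp->dirtyend || off + len < bp->dirtyoff))
        return -1;

    memcpy(bp->data + off, src, len);
    if (clean) {
        bp->dirtyoff = off;
        bp->dirtyend = off + len;
    } else {
        if (off < bp->dirtyoff)
            bp->dirtyoff = off;
        if (off + len > bp->dirtyend)
            bp->dirtyend = off + len;
    }
    return 0;
}

/* Write back only the dirty bytes to 'server'; returns the bytes sent. */
size_t
dirty_buf_flush(struct dirty_buf *bp, unsigned char *server)
{
    size_t n = bp->dirtyend - bp->dirtyoff;

    memcpy(server + bp->dirtyoff, bp->data + bp->dirtyoff, n);
    bp->dirtyoff = bp->dirtyend = 0;
    return n;
}
```

    Two abutting 2-byte writes at offsets 10 and 12 leave a dirty range
    of [10, 14) in this model, and the flush sends exactly those 4 bytes.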

    For example, take the situation where two different client machines
    do a seek/write to different portions of the same server-backed NFS file,
    where the two areas abut each other.  Say one client writes 2 bytes at
    seek offset 10 and the second client writes 2 bytes at seek offset 12.
    As long as the areas are not overlapping, we want this type of operation
    to work properly and not scramble the data on the server even if the
    client's idea of the state of the data is not coherent.
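
    To make the two-client example concrete, here is a small simulation.
    It is only an illustration under stated assumptions: struct client,
    client_load(), client_write(), flush_dirty(), and flush_whole() are
    hypothetical names, the "server" is a plain byte array, and each
    client is assumed to make a single write between load and flush.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILE_SIZE 16    /* tiny file so the result is easy to inspect */

/* Hypothetical client: a cached copy of the file plus one dirty range. */
struct client {
    unsigned char cache[FILE_SIZE];
    size_t dirtyoff;
    size_t dirtyend;
};

/* Preload the client's cache from the server (the read-before step). */
void
client_load(struct client *c, const unsigned char *server)
{
    memcpy(c->cache, server, FILE_SIZE);
    c->dirtyoff = c->dirtyend = 0;
}

/* Seek/write into the cached copy and remember what was modified. */
void
client_write(struct client *c, size_t off, const void *src, size_t len)
{
    assert(off <= FILE_SIZE && len <= FILE_SIZE - off);
    memcpy(c->cache + off, src, len);
    c->dirtyoff = off;          /* single write per client in this model */
    c->dirtyend = off + len;
}

/* Write back only the bytes the explicit write covered. */
void
flush_dirty(const struct client *c, unsigned char *server)
{
    memcpy(server + c->dirtyoff, c->cache + c->dirtyoff,
        c->dirtyend - c->dirtyoff);
}

/* Write back the entire cached buffer, stale bytes included. */
void
flush_whole(const struct client *c, unsigned char *server)
{
    memcpy(server, c->cache, FILE_SIZE);
}
```

    If both clients load the file, client A writes "AA" at offset 10,
    and client B writes "BB" at offset 12, then flush_dirty() from both
    leaves "AABB" at offset 10 on the server.  With flush_whole(), B's
    stale copy of bytes 10-11 overwrites A's data.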

:2) at BAFUG 2 or 3 months ago I, *cough* attempted to keep up with you
:and Julian talking about VM issues. :)  Something you guys brought up
:was problems with mmap() + read()/write() not staying in sync and requiring
:an msync() to correctly synchronize.  I really didn't understand how this 
:could happen except recently I figured that my first question could be
:the answer.  Does this problem only happen on NFS mounted dirs?  Is it
:fixed?
:
:thanks again,
:-Alfred

    This should not be an issue any more for either UFS or NFS.  If people
    find that it is an issue, there's a bug somewhere that needs to be
    addressed.  This *was* an issue for NFS prior to the patch set.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

