Date: Fri, 12 Sep 1997 22:09:35 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: current@freebsd.org
Subject: NFS client locking
Message-ID: <199709122209.PAA29030@usr08.primenet.com>
I would like to discuss NFS client locking implementation details. This document constitutes a design rationale.

NFS client locking requires the proxying of locks across the network.

I believe NFS client locking needs to assert the lock locally first; then, if that is successful, assert it against the server; and, if that is successful, return success.

If the lock conflicts with a local process that also has the file open, the lock will be denied without generating wire traffic. If the lock conflicts with a lock held by a process on another machine, then the remote lock request will be denied.

If the lock is denied by the remote machine, the local lock must be deasserted.

Deasserting the local lock is fraught with peril. If the local lock has been coalesced, it may have upgraded or downgraded the lock during the coalesce. If the local lock has been coalesced, it may have overlapped with other locks. If an overlapping, upgraded, or downgraded lock region is removed, then the previous lock, which was legitimately granted, will be destroyed.

Therefore, to assert a lock, the client machine must:

    IF local_assert_uncoalesced_lock() == FAIL
        return FAIL
    ELSE
        IF remote_assert_lock() == FAIL
            local_deassert_uncoalesced_lock()
            return FAIL
        ENDIF
        local_coalesce_lock()
        return SUCCESS
    ENDIF

To deassert a lock, it must:

    local_decoalesce_lock()
    IF remote_deassert_lock() == FAIL
        local_coalesce_lock()
        return FAIL
    ENDIF
    local_deassert_uncoalesced_lock()
    return SUCCESS

In other words, delayed coalescing and delayed deletion.

In order to implement this, the common locking code must move out of the per FS VOP_ADVLOCK() and into the system calls/VFS framework layer.

In order to move the common locking code up, the access to per FS data structures must be removed. Specifically, the lock list must be removed from the inode and placed into the vnode, which is a filesystem independent opaque object.

In both cases, the common locking code must respect uncoalesced locks as if they had been coalesced.
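The assert and deassert sequences above can be illustrated with a minimal C sketch. This is not the code referred to in this message; every name in it (struct lk, struct lk_list, lk_veto_fn, lk_assert, lk_deassert, lk_allow, lk_deny, LK_MAX) is hypothetical. It assumes a fixed-size lock table standing in for a list hung off the vnode, and it reduces coalescing to clearing a "pending" flag, so real range merging and splitting is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lk_type { LK_SHARED, LK_EXCL };

struct lk {                      /* hypothetical lock record */
    bool in_use;                 /* slot holds a lock */
    bool pending;                /* uncoalesced: waiting on the veto */
    int owner;
    long start, end;             /* inclusive byte range */
    enum lk_type type;
};

#define LK_MAX 16
struct lk_list { struct lk slot[LK_MAX]; };  /* assumed to hang off the vnode */

/* Stand-in for the per-FS veto: true means "request allowed". */
typedef bool (*lk_veto_fn)(const struct lk *req, bool assert_op);

static bool lk_allow(const struct lk *r, bool a) { (void)r; (void)a; return true; }
static bool lk_deny(const struct lk *r, bool a)  { (void)r; (void)a; return false; }

/* Pending and coalesced entries are treated alike when checking conflicts. */
static bool lk_conflicts(const struct lk_list *ll, const struct lk *req)
{
    for (size_t i = 0; i < LK_MAX; i++) {
        const struct lk *l = &ll->slot[i];
        if (!l->in_use || l->owner == req->owner)
            continue;
        if (l->end < req->start || req->end < l->start)
            continue;                       /* ranges do not overlap */
        if (l->type == LK_EXCL || req->type == LK_EXCL)
            return true;
    }
    return false;
}

/* Returns the granted lock, or NULL if denied locally or by the veto. */
static struct lk *lk_assert(struct lk_list *ll, struct lk req, lk_veto_fn veto)
{
    struct lk *l = NULL;

    if (lk_conflicts(ll, &req))
        return NULL;                        /* denied without wire traffic */
    for (size_t i = 0; i < LK_MAX && l == NULL; i++)
        if (!ll->slot[i].in_use)
            l = &ll->slot[i];
    if (l == NULL)
        return NULL;                        /* table full */
    *l = req;
    l->in_use = true;
    l->pending = true;                      /* asserted locally, uncoalesced */
    if (!veto(l, true)) {
        l->in_use = false;                  /* drop only the pending entry */
        return NULL;
    }
    l->pending = false;                     /* delayed "coalesce" */
    return l;
}

static bool lk_deassert(struct lk *l, lk_veto_fn veto)
{
    assert(l->in_use && !l->pending);
    l->pending = true;                      /* decoalesce */
    if (!veto(l, false)) {
        l->pending = false;                 /* recoalesce; lock still held */
        return false;
    }
    l->in_use = false;                      /* delayed deletion */
    return true;
}
```

Because a vetoed request only ever removes its own pending entry, a lock the same owner was granted earlier survives the rollback, which is the property the delayed coalescing is meant to provide.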
The common code must examine both the uncoalesced and the coalesced locks. In the case of a lock demotion or promotion, both the uncoalesced and coalesced locks must be respected, and the higher restriction enforced; this is equivalent to the conflicting requestor coming in either before a lock demotion or after a lock promotion, either of which cases it must be capable of handling anyway.

In order to handle the initial "ELSE" case in the pseudocode, remote_assert_lock() (implemented by the NFS specific VOP_ADVLOCK() call) must perform the proxy on behalf of the local system.

For this to work properly, the VOP_ADVLOCK() function must provide a veto-based interface. That is, it must support the idea of returning "request allowed" or "request denied" to the common locking code. For NFS, this result will be the server's acceptance or refusal of the proxied lock request.

For all local FS's, this means VOP_ADVLOCK() will simply return "true", with the exception of multiplexing FS layers.

For multiplexing FS layers (which combine more than one FS), the multiplexing layer must reconcile failures. Consider the case of a union mount of two NFS filesystems. When the lock request is made to the union FS, the union FS must make a request for each underlying FS in the union. This is, effectively:

    for (fsp = fs1; fsp != NULL; fsp = fsp->next) {
        IF fsp->VOP_ADVLOCK(assert) == FAIL
            for (fsp2 = fs1; fsp2 != fsp; fsp2 = fsp2->next) {
                fsp2->VOP_ADVLOCK(deassert)
            }
            return FAIL
        ENDIF
    }
    return SUCCESS

(Yes, union FS's are not a linked list; this is pseudo-code.)

It seems that these changes are required for NFS client locking to operate.

Further benefits:

While the NFS client locking code is under development, local client locks will be enforced against each other, even if they are not yet enforced against clients on other machines.
Moving the locking code out of the per FS code means only one locking implementation needs to be debugged, instead of one per FS type.
Moving the locking code to a common area means less total code, which means fewer things that can go wrong.
Moving to a veto based interface simplifies the FS code, and the code necessary to implement an entirely new FS.
The change is in keeping with the spirit of the Heidemann thesis on which the existing code is modelled.

The code is already implemented, and in the FreeBSD core team's (Doug Rabson's) possession.

Comments?

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.
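For illustration only, the veto-based interface and the union mount rollback described in this message might be sketched in C as follows. This is not the implementation mentioned above; struct fs_layer, layer_advlock, union_assert, and the refuse and held fields are hypothetical names. As in the pseudocode, the layers are modelled as a linked list, and the refuse flag is an assumed stand-in for an NFS server denying the proxied request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lk_op { LK_ASSERT, LK_DEASSERT };

struct fs_layer {                        /* hypothetical underlying FS */
    bool (*advlock)(struct fs_layer *fs, enum lk_op op);  /* true = allowed */
    bool refuse;                         /* stand-in for a server refusal */
    int held;                            /* locks asserted on this layer */
    struct fs_layer *next;
};

/* Generic layer: vetoes asserts when told to, otherwise tracks the count. */
static bool layer_advlock(struct fs_layer *fs, enum lk_op op)
{
    if (op == LK_ASSERT) {
        if (fs->refuse)
            return false;                /* "request denied" */
        fs->held++;
    } else {
        assert(fs->held > 0);
        fs->held--;
    }
    return true;                         /* "request allowed" */
}

/* Assert on every layer; on a veto, undo the layers already asserted. */
static bool union_assert(struct fs_layer *first)
{
    for (struct fs_layer *fsp = first; fsp != NULL; fsp = fsp->next) {
        if (!fsp->advlock(fsp, LK_ASSERT)) {
            for (struct fs_layer *fsp2 = first; fsp2 != fsp; fsp2 = fsp2->next)
                (void)fsp2->advlock(fsp2, LK_DEASSERT);
            return false;
        }
    }
    return true;
}
```

A layer that vetoes part way through the walk leaves every earlier layer with the same lock count it had before the request, so the caller sees the request either fully granted or fully denied.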