Date:      Wed, 20 May 2009 11:07:53 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Doug Rabson <dfr@rabson.org>
Cc:        svn-src-head@FreeBSD.org, Rick Macklem <rmacklem@FreeBSD.org>, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject:   Re: svn commit: r192256 - head/sys/fs/nfsserver
Message-ID:  <Pine.GSO.4.63.0905201020520.14017@muncher.cs.uoguelph.ca>
In-Reply-To: <E45903EF-81D4-482D-88B1-B763440D8962@rabson.org>
References:  <200905171933.n4HJXmC0037587@svn.freebsd.org> <8ECF61A0-AFE1-4320-B0AA-2216C268A921@rabson.org> <E45903EF-81D4-482D-88B1-B763440D8962@rabson.org>



On Wed, 20 May 2009, Doug Rabson wrote:

> Thinking about this for a few more minutes, I think you probably want to 
> allocate a sysid for each client and then for each lock_owner of that client 
> allocate a 'pid'. The value doesn't have to be a process identifier but it 
> does have to allow different lock owners from the same client to be 
> distinguished.
>
Why do they need to be distinguished? The nfsv4 state subsystem handles
all conflicts between them, so they are just "nfsv4 locks".

An nfsv4 lockowner is a ClientID + up to 1024 bytes of opaque name and it
might not persist in the server beyond the point where no locks are
held and the associated OpenOwner no longer has any Opens. After this,
the same lockowner could be "re-incarnated" (i.e. the server creates a
new state data structure with the same ClientID + up to 1024 bytes)
when the client chooses to do more locking on it. If a pid is generated
sequentially, this re-incarnation would end up with a different
pid although it is the same lockowner. (To ensure this doesn't happen,
the server would have to hold onto the lockowner state structure "forever"
and that obviously isn't practical.) Or a pid could be a 32-bit checksum
on the ClientID + up to 1024 bytes instead of sequential assignment. In
that case the re-incarnation would get the same pid, but it wouldn't be
guaranteed to be unique across all different lockowners.
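
To make the two options concrete, here is a rough userland sketch, not
code from the nfsv4 server. The struct layout, the function names and the
choice of 32-bit FNV-1a as the "checksum" are all assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCKOWNER_NAME_MAX 1024	/* "up to 1024 bytes" of opaque name */

/* Hypothetical lockowner identity: ClientID + opaque owner name. */
struct lockowner_id {
	uint64_t	clientid;
	size_t		name_len;
	unsigned char	name[LOCKOWNER_NAME_MAX];
};

/* Fold len bytes into a running 32-bit FNV-1a hash. */
static uint32_t
fnv1a_update(uint32_t h, const unsigned char *p, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return (h);
}

/*
 * Option 1: sequential assignment. Unique while the counter has not
 * wrapped, but a re-incarnated lockowner gets a new value.
 */
static uint32_t next_seq_pid = 1;

uint32_t
lockowner_pid_sequential(void)
{
	return (next_seq_pid++);
}

/*
 * Option 2: checksum of ClientID + name. A re-incarnated lockowner
 * gets the same value, but two different lockowners can collide.
 */
uint32_t
lockowner_pid_checksum(const struct lockowner_id *lo)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */

	assert(lo->name_len <= LOCKOWNER_NAME_MAX);
	h = fnv1a_update(h, (const unsigned char *)&lo->clientid,
	    sizeof(lo->clientid));
	h = fnv1a_update(h, lo->name, lo->name_len);
	return (h);
}
```

Calling lockowner_pid_checksum() twice on the same ClientID + name gives
the same value both times, whereas lockowner_pid_sequential() never does;
neither property alone gives a stable and unique identifier.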

As such, the most an assigned pid could be is a "hint" that the lockowner
is different/same. Is there some benefit to this over "held by nfsv4",
which is what using one <l_sysid, l_pid> tuple gives you?

> You probably also want to record locks in the local lock manager on the 
> client. In NLM, I use a different range of sysids starting at 0x100000 for 
> this. This lets you do lock recovery after a server restart by asking the 
> local lock manager to enumerate locks for the right sysid.
>
The lock state all lives in the nfsv4 client (some associated with a
delegation and assigned locally, the rest tied to an associated Open) with 
the "up to 1024 byte" owner names generated by the client, etc. 
Maintaining the rather complex relationship between Opens (with 
Openowners) and their associated locks is probably the "most interesting"
part of implementing nfsv4. Then you must recover the locks, maintaining
that relationship. (The relationship is established via the 
open_to_lock_owner case of the Lock Op.) The recovery code uses client 
side data structures that reflect the open/lock relationship. The lock
manager wouldn't be able to provide that information and I think there
would be little gained by trying to make the major modifications that
would be required so that it could do so.

> On 20 May 2009, at 09:26, Doug Rabson wrote:
>
>> This is incorrect. A sysid of zero is reserved for local locks on local 
>> filesystems. You need to allocate a sysid when the client is created and it 
>> needs to not conflict with the sysids used by NLM. I suggest adding a 
>> function to nlm_prot_impl.c to return the next available sysid (and bump 
>> the counter).
>> 
Ok, when I looked at the code, l_sysid only seemed to be used when 
F_REMOTE is set, so I didn't realize that l_sysid == 0 was reserved
for local locks. I'll look at what you suggest and send you an nlm
patch for review.
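
A minimal sketch of the kind of allocator being suggested, written as
standalone C11 rather than kernel code. The names sysid_alloc() and
SYSID_LOCAL and the starting value of 1 are assumptions; the only facts
taken from this thread are that 0 is reserved for local locks and that
the counter gets bumped on each allocation. It does not try to avoid the
separate range starting at 0x100000 that NLM uses for client-side locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SYSID_LOCAL	0u	/* reserved: local locks on local filesystems */

/* Hypothetical shared counter; starting at 1 is an assumption. */
static atomic_uint next_sysid = 1;

/*
 * Return the next unused sysid and bump the counter. Wraparound is
 * not handled here beyond the assertion; a real version would need to
 * deal with it and would use the kernel's own locking.
 */
uint32_t
sysid_alloc(void)
{
	uint32_t id = atomic_fetch_add(&next_sysid, 1);

	assert(id != SYSID_LOCAL);
	return (id);
}
```

An nfsv4 client would call something like this once when it is created
and then use the returned value as l_sysid for all of its locks.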

rick



