Date: Wed, 6 Mar 2024 12:46:16 +0100 (CET)
From: Ronald Klop <ronald-lists@klop.ws>
To: Rick Macklem <rick.macklem@gmail.com>
Cc: rmacklem@freebsd.org, Garrett Wollman <wollman@bimajority.org>, stable@freebsd.org
Subject: Re: 13-stable NFS server hang
Message-ID: <1277770972.4770.1709725576695@localhost>
In-Reply-To: <CAM5tNy45ovMMrGHx0_tKUPC3in5WfjrtbBpv23k99CuxFxY21w@mail.gmail.com>
References: <CAM5tNy6v3N-uiULGA0vb_2s0GK1atRh6TYNDGfYMK0PkP46BbQ@mail.gmail.com>
 <1020651467.1592.1709280020993@localhost>
 <CAM5tNy4ras1NN+LC7=gpyFqEefHpWCrSV-_aSyn-D6Kt8Fvw6Q@mail.gmail.com>
 <1608503215.4731.1709633602802@localhost>
 <CAM5tNy7W6tZxiTRWoyx=CafAA9SE_fgrW3mjGRY1+J=89QYZ+g@mail.gmail.com>
 <CAM5tNy45ovMMrGHx0_tKUPC3in5WfjrtbBpv23k99CuxFxY21w@mail.gmail.com>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tuesday, 5 March 2024 15:43
To: Ronald Klop <ronald-lists@klop.ws>
CC: rmacklem@freebsd.org, Garrett Wollman <wollman@bimajority.org>, stable@freebsd.org
Subject: Re: 13-stable NFS server hang
>
> On Tue, Mar 5, 2024 at 6:34AM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Tue, Mar 5, 2024 at 2:13AM Ronald Klop <ronald-lists@klop.ws> wrote:
> > >
> > > From: Rick Macklem <rick.macklem@gmail.com>
> > > Date: Friday, 1 March 2024 15:23
> > > To: Ronald Klop <ronald-lists@klop.ws>
> > > CC: Garrett Wollman <wollman@bimajority.org>, stable@freebsd.org, rmacklem@freebsd.org
> > > Subject: Re: 13-stable NFS server hang
> > >
> > > On Fri, Mar 1, 2024 at 12:00AM Ronald Klop <ronald-lists@klop.ws> wrote:
> > > >
> > > > Interesting read.
> > > >
> > > > Would it be possible to separate locking for admin actions, like a
> > > > client mounting an fs, from traffic flowing for file operations?
> > > Well, the NFS server does not really have any concept of a mount.
> > > What I am referring to is the ClientID maintained for NFSv4 mounts,
> > > which all the open/lock/session/layout state hangs off of.
> > >
> > > For most cases, this state information can safely be accessed/modified
> > > via a mutex, but there are three exceptions:
> > > - creating a new ClientID (which is done by the ExchangeID operation),
> > >   which typically happens when an NFS client does a mount.
> > > - delegation Recall (which only happens when delegations are enabled).
> > >   This is one of the reasons delegations are not enabled by default on
> > >   the FreeBSD server.
> > > - DestroyClientID, which is typically done by an NFS client during
> > >   dismount.
> > > For these cases, it is just too difficult to do them without sleeping.
> > > As such, there is a sleep lock which the nfsd threads normally acquire
> > > shared when doing NFSv4 operations, but for the above cases the lock is
> > > acquired exclusive.
> > > - I had to give the exclusive lock priority over shared lock acquisition
> > >   (it is a custom locking mechanism with assorted weirdnesses) because
> > >   without that, someone reported that new mounts took up to half an hour
> > >   to occur. (The exclusive locker waited for 30min before all the other
> > >   nfsd threads were not busy.)
> > > Because of this priority, once an nfsd thread requests the exclusive
> > > lock, all other nfsd threads executing NFSv4 RPCs block after releasing
> > > their shared lock, until the exclusive locker releases the exclusive
> > > lock.
> > >
> > > In summary, NFSv4 has certain advantages over NFSv3, but it comes
> > > with a lot of state complexity. It just is not feasible to manipulate
> > > all that state with only mutex locking.
> > >
> > > rick
> > >
> > > >
> > > > Like ongoing file operations could have a read-only view/copy of the
> > > > mount table. Only new operations will have to wait.
> > > > But the mount never needs to wait for ongoing operations before
> > > > locking the structure.
> > > >
> > > > Just a thought in the morning.
> > > >
> > > > Regards,
> > > > Ronald.
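[The lock with these semantics is the nfsv4_lock() that shows up in the
kstacks quoted further down. The following is only a minimal sketch of the
behavior Rick describes (a waiting exclusive locker blocks new shared
acquisitions, so it cannot be starved), written with POSIX primitives and
invented names; it is not the FreeBSD implementation.]

    /*
     * Illustration only, not the FreeBSD nfsv4_lock() code: a
     * shared/exclusive sleep lock where a queued exclusive request
     * has priority over new shared acquisitions.
     */
    #include <pthread.h>

    struct sxlock {
            pthread_mutex_t mtx;
            pthread_cond_t  cv;
            int             shared;         /* current shared holders */
            int             exclwant;       /* waiting exclusive lockers */
            int             excl;           /* 1 while exclusively held */
    };

    #define SXLOCK_INIT \
        { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

    static void
    sxlock_shared(struct sxlock *sxp)
    {
            pthread_mutex_lock(&sxp->mtx);
            /* Yield to any queued exclusive request: the "priority". */
            while (sxp->excl != 0 || sxp->exclwant > 0)
                    pthread_cond_wait(&sxp->cv, &sxp->mtx);
            sxp->shared++;
            pthread_mutex_unlock(&sxp->mtx);
    }

    static void
    sxlock_shared_release(struct sxlock *sxp)
    {
            pthread_mutex_lock(&sxp->mtx);
            if (--sxp->shared == 0)
                    pthread_cond_broadcast(&sxp->cv);
            pthread_mutex_unlock(&sxp->mtx);
    }

    static void
    sxlock_excl(struct sxlock *sxp)
    {
            pthread_mutex_lock(&sxp->mtx);
            sxp->exclwant++;        /* new shared requests now block */
            while (sxp->excl != 0 || sxp->shared > 0)
                    pthread_cond_wait(&sxp->cv, &sxp->mtx);
            sxp->exclwant--;
            sxp->excl = 1;
            pthread_mutex_unlock(&sxp->mtx);
    }

    static void
    sxlock_excl_release(struct sxlock *sxp)
    {
            pthread_mutex_lock(&sxp->mtx);
            sxp->excl = 0;
            pthread_cond_broadcast(&sxp->cv);
            pthread_mutex_unlock(&sxp->mtx);
    }

[With these semantics the hang pattern reported below falls out directly: as
soon as one thread enters sxlock_excl() (e.g. an ExchangeID creating a
ClientID), exclwant is nonzero, and every other nfsd thread's next
sxlock_shared() sleeps until the exclusive holder is done.]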
> > > >
> > > > From: Rick Macklem <rick.macklem@gmail.com>
> > > > Date: 1 March 2024 00:31
> > > > To: Garrett Wollman <wollman@bimajority.org>
> > > > CC: stable@freebsd.org, rmacklem@freebsd.org
> > > > Subject: Re: 13-stable NFS server hang
> > > >
> > > > On Wed, Feb 28, 2024 at 4:04PM Rick Macklem wrote:
> > > > >
> > > > > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman wrote:
> > > > > >
> > > > > > Hi, all,
> > > > > >
> > > > > > We've had some complaints of NFS hanging at unpredictable
> > > > > > intervals. Our NFS servers are running a 13-stable from last
> > > > > > December, and tonight I sat in front of the monitor watching
> > > > > > `nfsstat -dW`. I was able to clearly see that there were periods
> > > > > > when NFS activity would drop *instantly* from 30,000 ops/s to
> > > > > > flat zero, which would last for about 25 seconds before resuming
> > > > > > exactly as it was before.
> > > > > >
> > > > > > I wrote a little awk script to watch for this happening and run
> > > > > > `procstat -k` on the nfsd process, and I saw that all but two of
> > > > > > the service threads were idle. The three nfsd threads that had
> > > > > > non-idle kstacks were:
> > > > > >
> > > > > > PID TID COMM TDNAME KSTACK
> > > > > > 997 108481 nfsd nfsd: master mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> > > > > > 997 960918 nfsd nfsd: service mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > > > > > 997 962232 nfsd nfsd: service mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > > > > >
> > > > > > I'm suspicious of two things: first, the copy_file_range RPC;
> > > > > > second, the "master" nfsd thread is actually servicing an RPC
> > > > > > which requires obtaining a lock. The "master" getting stuck while
> > > > > > performing client RPCs is, I believe, the reason NFS service
> > > > > > grinds to a halt when a client tries to write into a near-full
> > > > > > filesystem, so this problem would be more evidence that the
> > > > > > dispatching function should not be mixed with actual operations.
> > > > > > I don't know what the clients are doing, but is it possible that
> > > > > > nfsrvd_copy_file_range is holding a lock that is needed by one or
> > > > > > both of the other two threads?
> > > > > >
> > > > > > Near-term I could change nfsrvd_copy_file_range to just
> > > > > > unconditionally return NFSERR_NOTSUP and force the clients to
> > > > > > fall back, but I figured I would ask if anyone else has seen this.
> > > > > I have attached a little patch that should limit the server's Copy
> > > > > size to vfs.nfsd.maxcopyrange (default of 10Mbytes).
> > > > > Hopefully this makes sure that the Copy does not take too long.
> > > > >
> > > > > You could try this instead of disabling Copy. It would be nice to
> > > > > know if this is sufficient? (If not, I'll probably add a sysctl to
> > > > > disable Copy.)
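[Rick's patch is an attachment and does not appear in the archived body. As a
hedged sketch of the clipping idea only, not the actual patch, and with
nfs_maxcopyrange as an assumed name for the variable behind the
vfs.nfsd.maxcopyrange sysctl:]

    /*
     * Sketch of the Copy-size clip; not Rick's actual patch.  Only the
     * vfs.nfsd.maxcopyrange sysctl and its 10Mbyte default come from
     * the mail above; the identifier names here are assumptions.
     */
    #include <stdint.h>

    static uint64_t nfs_maxcopyrange = 10 * 1024 * 1024;   /* 10Mbytes */

    static uint64_t
    copy_cliplen(uint64_t len)
    {
            /*
             * Bound the bytes handled by a single Copy RPC.  A short
             * Copy is legal in NFSv4.2: the reply carries the number of
             * bytes actually copied, and the client issues further Copy
             * RPCs for the remainder.
             */
            if (len > nfs_maxcopyrange)
                    len = nfs_maxcopyrange;
            return (len);
    }

[That short-copy behavior is why the patched test below shows many more Copy
RPCs of 10Mbytes each, rather than any failures.]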
> > > > I did a quick test without/with this patch, where I copied a 1Gbyte
> > > > file.
> > > >
> > > > Without this patch, the Copy RPCs mostly replied in just under 1sec
> > > > (which is what the flag requests), but took over 4sec for one of the
> > > > Copy operations. This implies that one Read/Write of 1Mbyte on the
> > > > server took over 3 seconds.
> > > > I noticed the first Copy did over 600Mbytes, but the rest did about
> > > > 100Mbytes each, and it was one of these 100Mbyte Copy operations that
> > > > took over 4sec.
> > > >
> > > > With the patch, there were a lot more Copy RPCs (as expected) of
> > > > 10Mbytes each, and they took a consistent 0.25-0.3sec to reply. (This
> > > > is a test of a local mount on an old laptop, so nowhere near a server
> > > > hardware config.)
> > > >
> > > > So, the patch might be sufficient?
> > > >
> > > > It would be nice to avoid disabling Copy, since Copy avoids reading
> > > > the data into the client and then writing it back to the server.
> > > >
> > > > I will probably commit both patches (10Mbyte clip of Copy size and
> > > > disabling Copy) to main soon, since I cannot say if clipping the size
> > > > of the Copy will always be sufficient.
> > > >
> > > > Please let us know how trying these patches goes, rick
> > > >
> > > > >
> > > > > rick
> > > > >
> > > > > >
> > > > > > -GAWollman
> > > >
> > > > ________________________________
> > >
> > > ________________________________
> > >
> > > Hi Rick,
> > >
> > > You are much more into the NFS code than I am, so excuse me if what I'm
> > > saying does not make sense.
> > >
> > > I was reading nfsrvd_compound(), which calls nfsrvd_copy_file_range()
> > > via the nfsrv4_ops2 structure.
> > > nfsrvd_compound() holds a lock or refcount on nfsv4rootfs_lock during
> > > the whole operation, which is why nfsrv_setclient() is waiting in this
> > > specific case of "NFS server hang".
> > >
> > > But I don't see what is being modified in the nfsdstate after the IO
> > > operation ends, or why the IO operation itself needs the lock on the
> > > nfsdstate. IMHO the in-progress IOs will happen anyway, regardless of
> > > the nfsdstate. Changes to the nfsdstate during an IO operation would
> > > not affect the ongoing IO operation.
> > > Wouldn't it be possible to lock nfsv4rootfs_lock, do checks on or
> > > modify the nfsdstate as needed, unlock, and then do the IO operation?
> > > That would remove a lot of the possible lock contention during
> > > (u)mount.
> > > Otherwise, if we do modify the nfsdstate after the IO operation, isn't
> > > it possible to relock nfsv4rootfs_lock after the IO operation finishes?
> > Well, there are a couple of reasons. Every implementation has design
> > tradeoffs:
> > 1 - An NFSv4 RPC is a compound, which can be a pretty arbitrary list of
> >     operations.
> >     As such, the NFSv4 server does not know if an open/byte-range lock
> >     is coming after the operation it is currently performing, since the
> >     implementation does not pre-parse the entire compound. (I had a
> >     discussion w.r.t. pre-parsing with one of the main Linux knfsd
> >     server maintainers, and he noted that he was not aware of any extant
> >     server that did pre-parse the compound. Although it would be useful
> >     for the server to have the ordered list of operations before
> >     commencing the RPC, we both agreed it was too hard to implement.)
> >     --> It could possibly unlock/relock later, but see #2. Also, if
> >     relocking took a long time, it would result in the compound RPC
> >     taking too long (see below).
> > 2 - For NFSv4.1/4.2, almost all RPCs are handled by a session. One
> >     non-arbitrary part of almost all NFSv4.1/4.2 RPCs is that the
> >     Sequence operation (the one that handles the session) must come
> >     first.
> >     Session(s) are associated with the ClientID, which means the
> >     ClientID and the session must not go away while the compound RPC is
> >     in progress.
> >     - This is ensured by the shared lock on the ClientID (that
> >       nfsv4rootfs_lock).
> > Since 99.99% of operations can be done with the shared lock, I do not
> > think there is a lot of contention.
> >
> > Although there is nothing wired down in the RFCs, there is an
> > understanding in the NFSv4 community that a server should reply to an
> > RPC in a reasonable time, typically assumed to be 1-2sec. If the server
> > does this, then a delay for the rare case of a new ClientID shouldn't be
> > a big problem?
> > (There is also delegation recall, which is one reason why delegations
> > are not enabled by default.)
> >
> > Btw, the RFC does define an asynchronous Copy, where the operation
> > replies as soon as the copy is started and the server notifies the
> > client of completion later. I have not implemented this, because it
> > introduces complexities that I do not want to deal with.
> > For example, what happens when the server crashes/reboots while the copy
> > is in progress? The file is left in a non-deterministic state, depending
> > on what the client does when it does not receive the completion notify.
>
> Oh, I should also note that the "shared lock" is actually called a
> reference count in the code and is there to ensure that the
> ClientID/Session does not go away during execution of the compound.
>
> The problem in this case (which I should revisit) was that I could not
> figure out how to safely add a new ClientID while other nfsd threads were
> in progress performing other RPCs. Due to retries etc., there might be
> another RPC in progress using the ClientID.
>
> One thing to note here is that the identity of the ClientID is not known
> until the Sequence operation has been performed. (And there is cruft for
> NFSv4.0, since it does not have a Sequence operation.)
> As such, the RPC must be in progress before the ClientID is known.
>
> rick
>
> >
> > I hope this makes any sense, and thanks for all your work on the NFS
> > code.
> >
> > Regards,
> > Ronald.

Hi Rick,

Thanks for the elaborate answer.

Would it make sense to have the current RPC/compound hold a lock on its own
ClientID/session, but not on the whole nfsdstate (nfsv4rootfs_lock)? (A rough
sketch of this idea follows below.)

So concurrent requests, like a new mount creating a new ClientID, can go on
in parallel, but removing or modifying the locked ClientID will wait for the
lock.

Or am I thinking too simple still?

Regards,
Ronald.
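[To make the proposal above concrete, a hedged sketch; all names are
hypothetical and this is not the FreeBSD nfsdstate code. Once Sequence has
identified the ClientID, the compound could pin just that client with a
reference count, so an unrelated ExchangeID need not wait. The check-then-
increment race that this sketch deliberately ignores is exactly the kind of
difficulty Rick describes with adding/destroying a ClientID safely.]

    /*
     * Hypothetical per-ClientID pinning.  Not FreeBSD code: the hard
     * parts (making the destroying check and the reference bump atomic
     * against DestroyClientID, and handling NFSv4.0, which has no
     * Sequence operation) are deliberately left out.
     */
    #include <stdatomic.h>

    struct clientid {
            atomic_int  refs;        /* compounds using this client */
            atomic_bool destroying;  /* DestroyClientID in progress */
    };

    /* Called after Sequence has located the client for this compound. */
    static int
    clientid_enter(struct clientid *cp)
    {
            if (atomic_load(&cp->destroying))
                    return (-1);    /* client going away; fail the RPC */
            atomic_fetch_add(&cp->refs, 1); /* racy vs. destroy: see above */
            return (0);
    }

    static void
    clientid_exit(struct clientid *cp)
    {
            atomic_fetch_sub(&cp->refs, 1);
    }

    /* DestroyClientID would set destroying and wait for refs to drain. */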