From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 08:05:29 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 80A50106564A for ; Fri, 21 Sep 2012 08:05:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 83B718FC12 for ; Fri, 21 Sep 2012 08:05:27 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8L85Thp099520; Fri, 21 Sep 2012 11:05:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q8L85Gae022597; Fri, 21 Sep 2012 11:05:16 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8L85GtC022596; Fri, 21 Sep 2012 11:05:16 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 21 Sep 2012 11:05:16 +0300 From: Konstantin Belousov To: Rick Macklem Message-ID: <20120921080516.GC37286@deviant.kiev.zoral.com.ua> References: <20120919061659.GS37286@deviant.kiev.zoral.com.ua> <1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="7iqzzmjEMnnYslDD" Content-Disposition: inline In-Reply-To: <1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 08:05:29 -0000 --7iqzzmjEMnnYslDD Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote: > Konstantin Belousov wrote: > > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote: > > > Konstantin Belousov wrote: > > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote: > > > > > Konstantin Belousov wrote: > > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote: > > > > > > > Hi, > > > > > > > > > > > > > > There is a simple patch at: > > > > > > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > > > > > > that can be applied to a kernel + mountd, so that the new > > > > > > > nfsd can be suspended by mountd while the exports are being > > > > > > > reloaded. It adds a new "-S" flag to mountd to enable this. > > > > > > > (This avoids the long standing bug where clients receive > > > > > > > ESTALE > > > > > > > replies to RPCs while mountd is reloading exports.) > > > > > > > > > > > > This looks simple, but also somewhat worrisome. What would > > > > > > happen > > > > > > if the mountd crashes after nfsd suspension is requested, but > > > > > > before > > > > > > resume was performed ? > > > > > > > > > > > > Might be, mountd should check for suspended nfsd on start and > > > > > > unsuspend > > > > > > it, if some flag is specified ? > > > > > Well, I think that happens with the patch as it stands. > > > > > > > > > > suspend is done if the "-S" option is specified, but that is a > > > > > no op > > > > > if it is already suspended. The resume is done no matter what > > > > > flags > > > > > are provided, so mountd will always try and do a "resume". > > > > > --> get_exportlist() is always called when mountd is started up > > > > > and > > > > > it does the resume unconditionally when it completes. > > > > > If mountd repeatedly crashes before completing > > > > > get_exportlist() > > > > > when it is started up, the exports will be all messed up, so > > > > > having the nfsd threads suspended doesn't seem so bad for > > > > > this > > > > > case (which hopefully never happens;-). > > > > > > > > > > Both suspend and resume are just no ops for unpatched kernels. > > > > > > > > > > Maybe the comment in front of "resume" should explicitly explain > > > > > this, instead of saying resume is harmless to do under all > > > > > conditions? > > > > > > > > > > Thanks for looking at it, rick > > > > I see. > > > > > > > > My another note is that there is no any protection against > > > > parallel > > > > instances of suspend/resume happen. For instance, one thread could > > > > set > > > > suspend_nfsd =3D 1 and be descheduled, while another executes resume > > > > code sequence meantime. Then it would see suspend_nfsd !=3D 0, while > > > > nfsv4rootfs_lock not held, and tries to unlock it. It seems that > > > > nfsv4_unlock would silently exit. The suspending thread resumes, > > > > and obtains the lock. You end up with suspend_nfsd =3D=3D 0 but lock > > > > held. > > > Yes. I had assumed that mountd would be the only thing using these > > > syscalls > > > and it is single threaded. (The syscalls can only be done by root > > > for the > > > obvious reasons.;-) > > > > > > Maybe the following untested version of the syscalls would be > > > better, since > > > they would allow multiple concurrent calls to either suspend or > > > resume. > > > (There would still be an indeterminate case if one thread called > > > resume > > > concurrently with another few calling suspend, but that is > > > unavoidable, > > > I think?) > > > > > > Again, thanks for the comments, rick > > > --- untested version of syscalls --- > > > } else if ((uap->flag & NFSSVC_SUSPENDNFSD) !=3D 0) { > > > NFSLOCKV4ROOTMUTEX(); > > > if (suspend_nfsd =3D=3D 0) { > > > /* Lock out all nfsd threads */ > > > igotlock =3D 0; > > > while (igotlock =3D=3D 0 && suspend_nfsd =3D=3D 0) { > > > igotlock =3D nfsv4_lock(&nfsv4rootfs_lock, 1, > > > NULL, NFSV4ROOTLOCKMUTEXPTR, NULL); > > > } > > > suspend_nfsd =3D 1; > > > } > > > NFSUNLOCKV4ROOTMUTEX(); > > > error =3D 0; > > > } else if ((uap->flag & NFSSVC_RESUMENFSD) !=3D 0) { > > > NFSLOCKV4ROOTMUTEX(); > > > if (suspend_nfsd !=3D 0) { > > > nfsv4_unlock(&nfsv4rootfs_lock, 0); > > > suspend_nfsd =3D 0; > > > } > > > NFSUNLOCKV4ROOTMUTEX(); > > > error =3D 0; > > > } > >=20 > > From the cursory look, this variant is an improvement, mostly by > > taking > > the interlock before testing suspend_nfsd, and using the while loop. > >=20 > > Is it possible to also make the sleep for the lock interruptible ? > > So that blocked mountd could be killed by a signal ? > Well, it would require some coding. An extra argument to nfsv4_lock() > to indicate to do so and then either the caller would have to check > for a pending termination signal when it returns 0 (indicates didn't get > lock) or a new return value to indicate EINTR. The latter would require > all the calls to it to be changed to recognize the new 3rd return case. > Because there are a lot of these calls, I'd tend towards just having the > caller check for a pending signal. >=20 > Not sure if it would make much difference though. The only time it > would get stuck in nfsv4_lock() is if the nfsd threads are all wedged > and in that case having mountd wedged too probably doesn't make much > difference, since the NFS service is toast in that case anyhow. >=20 > If you think it is worth doing, I can add that. I basically see this > as a "stop-gap" fix until such time as something like nfse is done, > but since I haven't the time to look at nfse right now, I have no > idea when/if that might happen. Ok, please go ahead with the patch. Having the patch even in its current form is obviously better then not to have it. If the wedged mountd appears to be annoying enough for me, I would do the change. Thanks. --7iqzzmjEMnnYslDD Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlBcH7wACgkQC3+MBN1Mb4hrPwCdH6HrPJL/FeYl2hofEkPEB299 ISQAn2OuFrZuC0lpmL/lFF1xen2APSs1 =UIog -----END PGP SIGNATURE----- --7iqzzmjEMnnYslDD--