Date: Sat, 1 Sep 2012 19:57:50 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Attila Bogár <attila.bogar@linguamatics.com>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: NFS: rpcsec_gss with Linux clients
Message-ID: <817398955.1415204.1346543870350.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <5040DABD.20001@linguamatics.com>

Attila Bogar wrote:
> Hi,
>
> In the wireshark trace I see that, during an NFS mount, Linux opens two
> TCP connections.
> Linux creates the GSS context on one TCP connection, sends a DESTROY
> (destroying the rpcsec context),
> but immediately (without waiting for the DESTROY reply) reuses the
> context on the other TCP connection.
>
> I don't know who is guilty, BSD or Linux (or both), as I haven't
> spent time reading the RFCs.
>
This certainly sounds bogus. I can see an argument for 2 TCP connections
for trunking, but since a security context should only be destroyed when
the client is done with it, doing a DESTROY doesn't make sense?
(There is something in the RPC header called a "handle". It identifies the
security context, and it would be nice to check the wireshark trace to
see if it is the same as the one being used on the other connection?)

> This is very difficult to reproduce if the server is very fast. You
> have to use an extremely fast client.
> With a Linux virtual machine I couldn't reproduce it. Even printf's in
> the BSD kernel destroy the balance and everything suddenly starts to
> work because of the timing. This is a quantum bug.
>
> Look at /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
>
> In svc_rpc_gss()
> case RPCSEC_GSS_DESTROY:
>
> svc_rpc_gss_validate returns FALSE during the DESTROY.
>
> I don't quite know why, but during the destroy, within
> svc_rpc_gss_validate(), gss_verify_mic() returns maj_stat =
> GSS_S_DEFECTIVE_TOKEN, no matter what heimdal version I use.
>
That would indicate the encrypted checksum isn't correct. It might be
using an algorithm only supported by the newer RPCSEC_GSS_V3?

> As a consequence, client->cl_state is marked CLIENT_STALE;
>
For a DESTROY that fails validation, I'm not sure if marking the context
stale makes sense. (I can see an argument for and against doing this.)
I've attached a small patch which disables setting client->cl_state to
CLIENT_STALE for this case, which you could try, to see if it helps?

> I think client locking should have been used at this point.
>
> In the meantime the next TCP connection's NFS PUTROOTFH request is
> being processed in the kernel.
>
> And this is the point where the problem may or may not happen.
> In svc_rpc_gss() at the beginning svc_rpc_gss_timeout_clients() is
> called.
> If it's called before svc_rpc_gss_validate() has marked the cl_state
> CLIENT_STALE, the Linux client survives.
>
> Here is my patch for review. This is my first ever kernel patch.
>
> I'm going to open a PR...
>
I'd suggest contacting the Linux folks first to see if they are willing
to look at the wireshark trace or know of an issue/fix, because it really
sounds like a Linux client issue.

> Constructive comments are welcome.
>
> Thanks,
> Attila
>
> --- /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c.orig 2012-08-30 23:34:00.000000000 +0100
> +++ /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c 2012-08-31 15:59:40.000000000 +0100
> @@ -565,7 +565,8 @@
>       */
>      client->cl_state = CLIENT_NEW;
>      client->cl_locked = FALSE;
> -    client->cl_expiration = time_uptime + 5*60;
> +    /* we are now more cautious */
> +    client->cl_expiration = time_uptime + 4*60;
>
Waiting 4 minutes instead of 5 shouldn't have any real effect, although
it might avoid the problem for your case w.r.t. timing.

>      return (client);
>  }
> @@ -930,7 +931,11 @@
>      if (cred_lifetime == GSS_C_INDEFINITE)
>          cred_lifetime = time_uptime + 24*60*60;
>
> -    client->cl_expiration = time_uptime + cred_lifetime;
> +    /*
> +     * we are now more cautious
> +     * 12 sec is just an adhoc hack value
> +     */
> +    client->cl_expiration = time_uptime + cred_lifetime - 12;
>
This time is usually the TGT lifetime (12->24hrs), so subtracting 12 sec
from it doesn't really make any sense.
(I will note that the calculation of cred_lifetime for the
GSS_C_INDEFINITE case looks incorrect, since time_uptime gets added
twice, but I doubt that's relevant to your problem, since it is set to
more than 24hrs.)

>      /*
>       * Fill in cred details in the rawcred structure.
> @@ -990,7 +995,7 @@
>      gss_buffer_desc rpcbuf, checksum;
>      OM_uint32 maj_stat, min_stat;
>      gss_qop_t qop_state;
> -    int32_t rpchdr[128 / sizeof(int32_t)];
> +    int32_t rpchdr[2048 / sizeof(int32_t)];
>      int32_t *buf;
>
>      rpc_gss_log_debug("in svc_rpc_gss_validate()");
> @@ -1024,7 +1029,12 @@
>      if (maj_stat != GSS_S_COMPLETE) {
>          rpc_gss_log_status("gss_verify_mic", client->cl_mech,
>              maj_stat, min_stat);
> -        client->cl_state = CLIENT_STALE;
> +        /*
> +         * Linux nfs-utils>=1.2.3 is re-using GSS context
> +         * on other TCP NFS connection after it DESTROYED it
> +         * The garbage collector will remove client at cl_expiration
> +         */
> +        /* client->cl_state = CLIENT_STALE; */
>          return (FALSE);
>      }
>
If this helps, please try the attached patch which does the same thing,
but only for the DESTROY case.
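
To make the double addition in the GSS_C_INDEFINITE case concrete, here is a small user-space sketch. The function names, the INDEFINITE stand-in and the sample uptime are all made up for illustration; this is not the kernel code, just the same arithmetic written two ways.

```c
#include <stdint.h>

/* Stand-in for GSS_C_INDEFINITE (an assumption for this sketch). */
#define INDEFINITE UINT32_MAX

/*
 * Expiration as the quoted code computes it: in the indefinite case
 * the lifetime already contains uptime, and uptime is added again.
 */
static int64_t
expiration_as_written(int64_t uptime, uint32_t cred_lifetime)
{
	int64_t lifetime = cred_lifetime;

	if (cred_lifetime == INDEFINITE)
		lifetime = uptime + 24 * 60 * 60;
	return (uptime + lifetime);
}

/* Expiration with the lifetime kept relative to "now". */
static int64_t
expiration_relative(int64_t uptime, uint32_t cred_lifetime)
{
	int64_t lifetime = cred_lifetime;

	if (cred_lifetime == INDEFINITE)
		lifetime = 24 * 60 * 60;
	return (uptime + lifetime);
}
```

With a hypothetical uptime of 1000 seconds, expiration_as_written(1000, INDEFINITE) gives 88400 and expiration_relative(1000, INDEFINITE) gives 87400. The gap is exactly the uptime, so the context only lives longer than intended, which is why this should not matter for the DESTROY problem.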

rick

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

Attachment: rpcsec-destroy.patch

--- rpc/rpcsec_gss/svc_rpcsec_gss.c.sav	2012-09-01 19:20:35.000000000 -0400
+++ rpc/rpcsec_gss/svc_rpcsec_gss.c	2012-09-01 19:24:15.000000000 -0400
@@ -984,7 +984,7 @@ svc_rpc_gss_accept_sec_context(struct sv
 
 static bool_t
 svc_rpc_gss_validate(struct svc_rpc_gss_client *client, struct rpc_msg *msg,
-    gss_qop_t *qop)
+    gss_qop_t *qop, rpc_gss_proc_t gcproc)
 {
 	struct opaque_auth	*oa;
 	gss_buffer_desc		 rpcbuf, checksum;
@@ -1024,7 +1024,8 @@ svc_rpc_gss_validate(struct svc_rpc_gss_
 	if (maj_stat != GSS_S_COMPLETE) {
 		rpc_gss_log_status("gss_verify_mic", client->cl_mech,
 		    maj_stat, min_stat);
-		client->cl_state = CLIENT_STALE;
+		if (gcproc != RPCSEC_GSS_DESTROY)
+			client->cl_state = CLIENT_STALE;
 		return (FALSE);
 	}
 
@@ -1358,7 +1359,7 @@ svc_rpc_gss(struct svc_req *rqst, struct
 			break;
 		}
 
-		if (!svc_rpc_gss_validate(client, msg, &qop)) {
+		if (!svc_rpc_gss_validate(client, msg, &qop, gc.gc_proc)) {
 			result = RPCSEC_GSS_CREDPROBLEM;
 			break;
 		}
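
For anyone who wants to reason about the change without building a kernel, the behaviour the attached patch aims for can be modelled in user space roughly as below. This is only a sketch: the struct, the enums and the mic_ok flag are simplified stand-ins (assumptions, not the kernel definitions), and mic_ok merely represents gss_verify_mic() returning GSS_S_COMPLETE.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum client_state { CLIENT_NEW, CLIENT_ESTABLISHED, CLIENT_STALE };
enum gss_proc { PROC_DATA, PROC_INIT, PROC_CONTINUE_INIT, PROC_DESTROY };

struct client {
	enum client_state cl_state;
};

/*
 * Model of the validation step. mic_ok stands in for
 * gss_verify_mic() reporting GSS_S_COMPLETE.
 */
static bool
validate(struct client *client, bool mic_ok, enum gss_proc proc)
{
	if (!mic_ok) {
		/*
		 * A DESTROY that fails verification does not mark the
		 * context stale, so a request arriving on another
		 * connection can still use it.
		 */
		if (proc != PROC_DESTROY)
			client->cl_state = CLIENT_STALE;
		return (false);
	}
	return (true);
}
```

In this model a failed DESTROY returns false but leaves cl_state alone, while any other failed request still marks the client CLIENT_STALE, which is the only difference from the unpatched behaviour.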