Date:      Thu, 9 Jan 2020 03:24:10 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        Richard P Mackerras <mack63richard@gmail.com>, Adam McDougall <mcdouga9@egr.msu.edu>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: nfs lockd errors after NetApp software upgrade.
Message-ID:  <YQBPR0101MB1427FF31676F6C4C641CA933DD390@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <EE3DA2CA-9567-49F1-A71E-ABC706AA568E@cs.huji.ac.il>
References:  <EBC4AD74-EC62-4C67-AB93-1AA91F662AAC@cs.huji.ac.il> <YQBPR0101MB1427411AFE335E869B9CF022DD530@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <0121E289-D2AE-44BA-ADAC-4814CAEE676F@cs.huji.ac.il> <CAGfybS-3Rvs57=oGFEfii_9a=aWxPr6dEq1Y1LqHbLXK1ZKmXA@mail.gmail.com> <YQBPR0101MB1427F9BE658B9A46C7E08335DD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <854B6E5A-C6BC-44B3-A656-FC9B8EF19881@cs.huji.ac.il> <YQBPR0101MB1427F445F1F1EAF382E5131ADD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <8770BD0D-4B72-431A-B4F5-A29D4DBA03B1@cs.huji.ac.il> <b1182bbf-fd0b-a23d-1cc4-ddf9513bcb2e@egr.msu.edu> <YQBPR0101MB1427CE52BBA32A888443BFB4DD2D0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <8A78F67B-C244-45CF-B9BF-D7062669B33B@cs.huji.ac.il> <YQBPR0101MB1427C9D4CF8918F10B6FD400DD2C0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <AE8F5D6B-E7DA-4AB9-B909-7D362A6A406B@cs.huji.ac.il> <YQBPR0101MB14276E7F9C127374C3E36952DD2F0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <a33ad299-9ec6-0dc9-0926-32f20cb130c5@egr.msu.edu> <CAGfybS-a6n=Pkz8iBPj7BQ3=DbFoZRFENmy2wK3B=HzHm5dVWg@mail.gmail.com> <YQBPR0101MB142781B3EF4F85A1A6ED2AE5DD290@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>, <EE3DA2CA-9567-49F1-A71E-ABC706AA568E@cs.huji.ac.il>


The attached patch changes the xid to be a global for all "connections" for
the krpc UDP client.

You could try it if you'd like. It passed a trivial test, but I don't know
why that "misfeature" comment is there or what it means, so I don't know if
this breaks that.

I can't think of why "xid" would have been per-connection (especially since a
connection is a questionable concept for UDP), except that this might have
originated in a userland library and carried into the kernel during porting.
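
For readers without a kernel tree handy, here is a minimal userland sketch of
the idea, not the patch itself. It assumes C11 <stdatomic.h> in place of the
kernel's atomic_cmpset_32() and atomic_fetchadd_32(), which are what the
attached patch uses. The names xid_seed(), xid_next() and struct client_handle
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* One counter shared by every client handle; 0 means "not seeded yet". */
static _Atomic uint32_t rpc_xid = 0;

/* Hypothetical stand-in for a per-"connection" client structure. */
struct client_handle {
	int id;
};

/*
 * Seed the shared counter. Only a call made while the counter is still 0
 * has any effect, because the compare-and-swap fails otherwise. (If the
 * counter ever wraps around to 0, a later call would seed it again.)
 */
static void
xid_seed(uint32_t seed)
{
	uint32_t expected = 0;

	atomic_compare_exchange_strong(&rpc_xid, &expected, seed);
}

/*
 * Hand out the next xid. The handle is deliberately not consulted: no
 * handle keeps a private counter, so two handles cannot issue the same
 * xid until the 32-bit counter wraps.
 */
static uint32_t
xid_next(const struct client_handle *h)
{
	assert(h != NULL);
	return (atomic_fetch_add(&rpc_xid, 1));
}
```

With this shape, two handles that share a socket draw from one sequence, so
their xids interleave rather than overlap.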

rick

________________________________________
From: Daniel Braniss <danny@cs.huji.ac.il>
Sent: Wednesday, January 8, 2020 12:08 PM
To: Rick Macklem
Cc: Richard P Mackerras; Adam McDougall; freebsd-stable@freebsd.org
Subject: Re: nfs lockd errors after NetApp software upgrade.

Top-posting NetApp's reply:
…
Here you can see transaction ID (0x5e15f77a) being used over port 886 and the
NFS server successfully responds.

    Frame    Time                 Source          Destination     Proto  XID                      Port  Info
    4480695  2020-01-08 12:20:54  132.65.116.111  132.65.60.56    NLM    0x5e15f77a (1578497914)  886   V4 UNLOCK Call (Reply In 4480696) FH:0x54b075a0 svid:13629 pos:0-0
    4480696  2020-01-08 12:20:54  132.65.60.56    132.65.116.111  NLM    0x5e15f77a (1578497914)  4045  V4 UNLOCK Reply (Call In 4480695)

Here you see that 2 minutes later the client uses the same transaction ID
(0x5e15f77a) and the same port again, but the file handle is different, so
the client is unlocking a different file.

    Frame    Time                 Source          Destination     Proto  XID                      Port  Info
    4591136  2020-01-08 12:22:54  132.65.116.111  132.65.60.56    NLM    0x5e15f77a (1578497914)  886   [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4592588  2020-01-08 12:22:57  132.65.116.111  132.65.60.56    NLM    0x5e15f77a (1578497914)  886   [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4598862  2020-01-08 12:23:03  132.65.116.111  132.65.60.56    NLM    0x5e15f77a (1578497914)  886   [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4608871  2020-01-08 12:23:21  132.65.116.111  132.65.60.56    NLM    0x5e15f77a (1578497914)  886   [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4635984  2020-01-08 12:23:59  132.65.116.111  132.65.60.56    NLM    0x5e15f77a (1578497914)  886   [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0

Transaction ID reuse is also seen for a number of other transaction IDs
starting at the same time.

Within ONTAP 9.3 we have changed the way our Replay-Cache tracks requests by
including a checksum of the RPC request. Both in this and earlier releases
ONTAP would cache the call in frame 4480695, but starting in 9.3 we then
cache the checksum as part of that.

When the client sends the request in frame 4591136 it uses the same
transaction ID (0x5e15f77a) and same port again. Here the problem is that we
already hold a checksum in cache for the “same transaction”
 …
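
The check described above can be sketched as a tiny replay cache. This is an
illustration under stated assumptions, not ONTAP's implementation: the cache
is keyed only on the transaction ID and port that the text names, the
checksum is a plain 32-bit FNV-1a hash standing in for whatever ONTAP
computes, and every identifier (drc_lookup, drc_checksum, struct drc_entry,
DRC_SLOTS) is hypothetical. What the server does once it detects a mismatch
is not stated in the reply, so the sketch only reports it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum drc_result { DRC_NEW, DRC_REPLAY, DRC_MISMATCH };

struct drc_entry {
	int	 used;
	uint32_t xid;	/* RPC transaction ID */
	uint16_t port;	/* client source port */
	uint32_t sum;	/* checksum of the request as first seen */
};

#define	DRC_SLOTS	64
static struct drc_entry drc[DRC_SLOTS];

/* Assumed checksum: 32-bit FNV-1a over the request bytes. */
static uint32_t
drc_checksum(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Look up (xid, port). A hit with an equal checksum is a genuine
 * retransmission; a hit with a different checksum is a reused xid
 * carrying a different request. A miss records the request.
 */
static enum drc_result
drc_lookup(uint32_t xid, uint16_t port, const void *req, size_t len)
{
	uint32_t sum = drc_checksum(req, len);
	struct drc_entry *slot = NULL;

	for (size_t i = 0; i < DRC_SLOTS; i++) {
		struct drc_entry *e = &drc[i];

		if (!e->used) {
			if (slot == NULL)
				slot = e;	/* remember first free slot */
			continue;
		}
		if (e->xid == xid && e->port == port)
			return (e->sum == sum ? DRC_REPLAY : DRC_MISMATCH);
	}
	assert(slot != NULL);	/* the sketch has no eviction policy */
	slot->used = 1;
	slot->xid = xid;
	slot->port = port;
	slot->sum = sum;
	return (DRC_NEW);
}
```

Fed the frames above, the first UNLOCK (FH:0x54b075a0) would be recorded as
new, and the later UNLOCK with the same xid and port but FH:0xb14b75a8 would
come back as a mismatch rather than a replay.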

This seems to be happening after the client did not receive the response and
retransmitted the request.

danny


On 24 Dec 2019, at 5:02, Rick Macklem <rmacklem@uoguelph.ca> wrote:

Richard P Mackerras wrote:
Hi,

We had some bully type workloads emerge when we moved a lot of block
storage from old XIV to new all flash 3PAR. I wonder if your IMAP issue
might have emerged just because suddenly there was the opportunity with all
flash. QOS is good on 9.x ONTAP. If anyone says it's not then they last
looked on 8.x. So I suggest you QOS the IMAP workload.

Nobody should be using UDP with NFS unless they have a very specific set
of circumstances. TCP was a real step forward.
Well, I can't argue with this, considering I did the first working
implementation of NFS over TCP. It was actually Mike Karels that suggested I
try doing so. There's a paper in a very old Usenix Conference Proceedings,
but it is so old that it isn't on the Usenix web page (around 1988 in Denver,
if I recall). I don't even have a copy myself, although I was the author.

Now, having said that, I must note that the Network Lock Manager (NLM) and
Network Status Monitor (NSM) were not NFS. They were separate stateful
protocols (poorly designed imho) that Sun never published.

NFS as Sun designed it (NFSv2 and NFSv3) was a "stateless server" protocol,
so that it could work reliably without server crash recovery.
However, the NLM was inherently stateful, since it was dealing with file
locks.

So, you can't really lump the NLM with NFS (and you should avoid use of the
NLM over any transport imho).

NFSv4 tackled the difficult problem of having a "stateful server" and crash
recovery, which resulted in a much more complex protocol (compare the size of
RFC-1813 vs RFC-5661 to get some idea of this).

rick

Cheers

Richard
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


Attachment: xid.patch

--- rpc/clnt_dg.c.sav	2020-01-08 14:20:34.193993000 -0800
+++ rpc/clnt_dg.c	2020-01-08 14:46:03.213393000 -0800
@@ -94,6 +94,8 @@ static struct clnt_ops clnt_dg_ops = {
 	.cl_control =	clnt_dg_control
 };
 
+static volatile uint32_t rpc_xid = 0;
+
 /*
  * A pending RPC request which awaits a reply. Requests which have
  * received their reply will have cr_xid set to zero and cr_mrep to
@@ -193,6 +195,7 @@ clnt_dg_create(
 	struct __rpc_sockinfo si;
 	XDR xdrs;
 	int error;
+	uint32_t newxid;
 
 	if (svcaddr == NULL) {
 		rpc_createerr.cf_stat = RPC_UNKNOWNADDR;
@@ -245,8 +248,9 @@ clnt_dg_create(
 	cu->cu_sent = 0;
 	cu->cu_cwnd_wait = FALSE;
 	(void) getmicrotime(&now);
-	cu->cu_xid = __RPC_GETXID(&now);
-	call_msg.rm_xid = cu->cu_xid;
+	newxid = __RPC_GETXID(&now);
+	atomic_cmpset_32(&rpc_xid, 0, newxid);
+	call_msg.rm_xid = atomic_fetchadd_32(&rpc_xid, 1);
 	call_msg.rm_call.cb_prog = program;
 	call_msg.rm_call.cb_vers = version;
 	xdrmem_create(&xdrs, cu->cu_mcallc, MCALL_MSG_SIZE, XDR_ENCODE);
@@ -418,8 +422,7 @@ clnt_dg_call(
 call_again:
 	mtx_assert(&cs->cs_lock, MA_OWNED);
 
-	cu->cu_xid++;
-	xid = cu->cu_xid;
+	xid = atomic_fetchadd_32(&rpc_xid, 1);
 
 send_again:
 	mtx_unlock(&cs->cs_lock);
@@ -865,14 +868,16 @@ clnt_dg_control(CLIENT *cl, u_int request, void *info)
 		(void) memcpy(&cu->cu_raddr, addr, addr->sa_len);
 		break;
 	case CLGET_XID:
-		*(uint32_t *)info = cu->cu_xid;
+		*(uint32_t *)info = rpc_xid;
 		break;
 
+#ifdef notnow
 	case CLSET_XID:
 		/* This will set the xid of the NEXT call */
 		/* decrement by 1 as clnt_dg_call() increments once */
 		cu->cu_xid = *(uint32_t *)info - 1;
 		break;
+#endif
 
 	case CLGET_VERS:
 		/*


