Date:      Wed, 8 Jan 2020 19:08:07 +0200
From:      Daniel Braniss <danny@cs.huji.ac.il>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Richard P Mackerras <mack63richard@gmail.com>, Adam McDougall <mcdouga9@egr.msu.edu>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: nfs lockd errors after NetApp software upgrade.
Message-ID:  <EE3DA2CA-9567-49F1-A71E-ABC706AA568E@cs.huji.ac.il>
In-Reply-To: <YQBPR0101MB142781B3EF4F85A1A6ED2AE5DD290@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
References:  <EBC4AD74-EC62-4C67-AB93-1AA91F662AAC@cs.huji.ac.il> <YQBPR0101MB1427411AFE335E869B9CF022DD530@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <0121E289-D2AE-44BA-ADAC-4814CAEE676F@cs.huji.ac.il> <CAGfybS-3Rvs57=oGFEfii_9a=aWxPr6dEq1Y1LqHbLXK1ZKmXA@mail.gmail.com> <YQBPR0101MB1427F9BE658B9A46C7E08335DD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <854B6E5A-C6BC-44B3-A656-FC9B8EF19881@cs.huji.ac.il> <YQBPR0101MB1427F445F1F1EAF382E5131ADD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <8770BD0D-4B72-431A-B4F5-A29D4DBA03B1@cs.huji.ac.il> <b1182bbf-fd0b-a23d-1cc4-ddf9513bcb2e@egr.msu.edu> <YQBPR0101MB1427CE52BBA32A888443BFB4DD2D0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <8A78F67B-C244-45CF-B9BF-D7062669B33B@cs.huji.ac.il> <YQBPR0101MB1427C9D4CF8918F10B6FD400DD2C0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <AE8F5D6B-E7DA-4AB9-B909-7D362A6A406B@cs.huji.ac.il> <YQBPR0101MB14276E7F9C127374C3E36952DD2F0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <a33ad299-9ec6-0dc9-0926-32f20cb130c5@egr.msu.edu> <CAGfybS-a6n=Pkz8iBPj7BQ3=DbFoZRFENmy2wK3B=HzHm5dVWg@mail.gmail.com> <YQBPR0101MB142781B3EF4F85A1A6ED2AE5DD290@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>

top posting NetAPP reply:
…
Here you can see transaction ID (0x5e15f77a) being used over port 886
and the NFS server successfully responds.

    4480695  2020-01-08 12:20:54  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)   886  V4 UNLOCK Call (Reply In 4480696) FH:0x54b075a0 svid:13629 pos:0-0
    4480696  2020-01-08 12:20:54  132.65.60.56    132.65.116.111  NLM  0x5e15f77a (1578497914)  4045  V4 UNLOCK Reply (Call In 4480695)

Here you see that 2 minutes later the client uses the same transaction
ID (0x5e15f77a) and the same port again, but the file handle is
different, so the client is unlocking a different file.

    4591136  2020-01-08 12:22:54  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)   886  [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4592588  2020-01-08 12:22:57  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)   886  [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4598862  2020-01-08 12:23:03  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)   886  [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4608871  2020-01-08 12:23:21  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)   886  [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
    4635984  2020-01-08 12:23:59  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)   886  [RPC retransmission of #4480695]V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0

Transaction ID reuse is also seen for a number of other transaction IDs
starting at the same time.

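As a rough illustration only (this is not NetApp or Wireshark tooling),
the reuse pattern in the trace could be checked with a few lines of
Python. The tuples are hand-copied from the frames above; the function
name find_xid_reuse and the record layout are assumptions made for the
example.

```python
from collections import defaultdict

# Hand-copied subset of the trace: (frame, client, port, xid, file handle).
CALLS = [
    (4480695, "132.65.116.111", 886, 0x5e15f77a, 0x54b075a0),
    (4591136, "132.65.116.111", 886, 0x5e15f77a, 0xb14b75a8),
    (4592588, "132.65.116.111", 886, 0x5e15f77a, 0xb14b75a8),
]


def find_xid_reuse(calls):
    """Return {(client, port, xid): sorted distinct file handles} for every
    key that was seen with more than one file handle."""
    handles = defaultdict(set)
    for _frame, client, port, xid, fh in calls:
        handles[(client, port, xid)].add(fh)
    return {key: sorted(fhs) for key, fhs in handles.items() if len(fhs) > 1}


if __name__ == "__main__":
    for (client, port, xid), fhs in find_xid_reuse(CALLS).items():
        shown = ", ".join(hex(fh) for fh in fhs)
        print(f"{client}:{port} xid {xid:#x} used with file handles {shown}")
```

A key that appears with more than one file handle is a candidate for
the reuse described above; a plain retransmission would carry the same
file handle every time and would not be reported.
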
Within ONTAP 9.3 we have changed the way our Replay-Cache tracks
requests by including a checksum of the RPC request. In both this and
earlier releases ONTAP would cache the call in frame 4480695, but
starting in 9.3 we then cache the checksum as part of that.

When the client sends the request in frame 4591136 it uses the same
transaction ID (0x5e15f77a) and same port again. Here the problem is
that we already hold a checksum in cache for the “same transaction”
 …

this seems to be happening after the client did not receive the response
and re-transmits the request.
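
To make the replay-cache behaviour concrete, here is a minimal sketch,
assuming a cache keyed on (client address, port, transaction ID) that
also stores a checksum of the request. It is not ONTAP code: the class
ReplayCache, its method names and the use of CRC32 as the checksum are
all assumptions, and what the server actually does on a checksum
mismatch is not stated above, so the sketch only reports it.

```python
import zlib


class ReplayCache:
    """Toy duplicate-request cache; an illustration, not ONTAP's design."""

    def __init__(self):
        # (client, port, xid) -> (checksum of request bytes, cached reply)
        self._entries = {}

    def store(self, client, port, xid, request, reply):
        """Remember the reply together with a checksum of the request."""
        self._entries[(client, port, xid)] = (zlib.crc32(request), reply)

    def lookup(self, client, port, xid, request):
        """Return ("miss", None), ("hit", reply) or ("mismatch", None)."""
        entry = self._entries.get((client, port, xid))
        if entry is None:
            return "miss", None
        checksum, reply = entry
        if checksum == zlib.crc32(request):
            # Same key and same request bytes: a true retransmission.
            return "hit", reply
        # Same key but different request bytes: the XID was reused.
        return "mismatch", None


if __name__ == "__main__":
    cache = ReplayCache()
    key = ("132.65.116.111", 886, 0x5e15f77a)
    cache.store(*key, b"UNLOCK fh=A", b"granted")
    print(cache.lookup(*key, b"UNLOCK fh=A")[0])
    print(cache.lookup(*key, b"UNLOCK fh=B")[0])
```

In this sketch a retransmission of the identical request is answered
from the cache, while a new request that reuses the transaction ID and
port with a different file handle is flagged as a mismatch.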

danny


> On 24 Dec 2019, at 5:02, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> Richard P Mackerras wrote:
>> Hi,
>>
>> We had some bully type workloads emerge when we moved a lot of block
>> storage from old XIV to new all flash 3PAR. I wonder if your IMAP issue
>> might have emerged just because suddenly there was the opportunity with all
>> flash. QOS is good on 9.x ONTAP. If anyone says it’s not then they last
>> looked on 8.x. So I suggest you QOS the IMAP workload.
>>
>> Nobody should be using UDP with NFS unless they have a very specific set
>> of circumstances. TCP was a real step forward.
> Well, I can't argue with this, considering I did the first working implementation
> of NFS over TCP. It was actually Mike Karels that suggested I try doing so,
> There's a paper in a very old Usenix Conference Proceedings, but it is so old
> that it isn't on the Usenix web page (around 1988 in Denver, if I recall).  I don't
> even have a copy myself, although I was the author.
>
> Now, having said that, I must note that the Network Lock Manager (NLM) and
> Network Status Monitor (NSM) were not NFS. They were separate stateful
> protocols (poorly designed imho) that Sun never published.
>
> NFS as Sun designed it (NFSv2 and NFSv3) were "stateless server" protocols,
> so that they could work reliably without server crash recovery.
> However, the NLM was inherently stateful, since it was dealing with file locks.
>
> So, you can't really lump the NLM with NFS (and you should avoid use of the
> NLM over any transport imho).
>
> NFSv4 tackled the difficult problem of having a "stateful server" and crash recovery,
> which resulted in a much more complex protocol (compare the size of RFC-1813
> vs RFC-5661 to get some idea of this).
>
> rick
>
> Cheers
>
> Richard
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"