Date:      Sat, 21 Dec 2019 09:32:05 +0200
From:      Daniel Braniss <danny@cs.huji.ac.il>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Adam McDougall <mcdouga9@egr.msu.edu>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: nfs lockd errors after NetApp software upgrade.
Message-ID:  <8A78F67B-C244-45CF-B9BF-D7062669B33B@cs.huji.ac.il>
In-Reply-To: <YQBPR0101MB1427CE52BBA32A888443BFB4DD2D0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
References:  <EBC4AD74-EC62-4C67-AB93-1AA91F662AAC@cs.huji.ac.il> <YQBPR0101MB1427411AFE335E869B9CF022DD530@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <0121E289-D2AE-44BA-ADAC-4814CAEE676F@cs.huji.ac.il> <CAGfybS-3Rvs57=oGFEfii_9a=aWxPr6dEq1Y1LqHbLXK1ZKmXA@mail.gmail.com> <YQBPR0101MB1427F9BE658B9A46C7E08335DD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <854B6E5A-C6BC-44B3-A656-FC9B8EF19881@cs.huji.ac.il> <YQBPR0101MB1427F445F1F1EAF382E5131ADD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <8770BD0D-4B72-431A-B4F5-A29D4DBA03B1@cs.huji.ac.il> <b1182bbf-fd0b-a23d-1cc4-ddf9513bcb2e@egr.msu.edu> <YQBPR0101MB1427CE52BBA32A888443BFB4DD2D0@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>



> On 20 Dec 2019, at 19:19, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> Adam McDougall wrote:
>> Try changing "bool_t do_tcp = FALSE;" to TRUE in
>> /usr/src/sys/nlm/nlm_prot_impl.c, recompile the kernel, and try again. I
>> think this makes it match Linux client behavior. I suspect I ran into
>> the same issue as you. I think I used nolockd as a workaround
>> temporarily. I can provide some more details if it works.
> If this fixes the problem, please let me know.
>
> I'm not sure I'd want to change the default, since it might break things for
> others, but I can definitely make it a tunable, so that people don't need to
> recompile a kernel to deal with it.
>

great! I was just about to see how it could be done (as a tunable), but I need to
check whether it can be changed at any time or only at boot time.
thanks.
btw, after several hours of analysing the traffic, it seems that NLM is currently using UDP.
danny
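
One way to cross-check the transport observation is to ask the server's rpcbind which transports the NLM service is registered on. The following is a minimal sketch, assuming the usual `rpcinfo -p` column layout (program, version, protocol, port, service). `nlm_transports` is a hypothetical helper name; 100021 is the RPC program number of the NLM (nlockmgr). It only shows what the server offers, not which transport the FreeBSD client actually chooses.

```sh
#!/bin/sh
# nlm_transports: read `rpcinfo -p` output on stdin and print each distinct
# transport (tcp/udp) on which the NLM (RPC program 100021) is registered.
# Hypothetical helper for illustration; it is not part of FreeBSD.
nlm_transports() {
    awk '$1 == 100021 { print $3 }' | sort -u
}
```

Possible usage: `rpcinfo -p <netapp-host> | nlm_transports`. If both tcp and udp are listed, the server side should be able to accept NLM over either transport.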


> rick
>
> On 12/19/19 9:21 AM, Daniel Braniss wrote:
>>
>>
>>> On 19 Dec 2019, at 16:09, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>
>>> Daniel Braniss wrote:
>>> [stuff snipped]
>>>> all mounts are nfsv3/tcp
>>> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know when
>>> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.
>> can the replay cache have any influence here? I seem to remember issues with it
>> from way back.
>>>
>>> To me, it looks like a network configuration issue.
>> that was/is my gut feeling too, but, as far as we can tell, nothing has changed in the network infrastructure.
>> the problems appeared after the NetAPP’s software was updated; it was working fine till then.
>>
>> the problems are also happening on freebsd 12.1
>>
>>> You could capture packets (maybe when a client first starts rpc.statd and rpc.lockd)
>>> and then look at them in wireshark. I'd disable startup of rpc.lockd and rpc.statd
>>> at boot for a test client and then run something like:
>>> # tcpdump -s 0 -w out.pcap host <netapp-host>
>>> - and then start rpc.statd and rpc.lockd
>>> Then I'd look at out.pcap in wireshark (much better at decoding this stuff than
>>> tcpdump). I'd look for things like different reply IP addresses from the Netapp,
>>> which might confuse this tired old NLM protocol Sun devised in the mid-1980s.
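
The capture steps above could be scripted roughly as follows. This is only a sketch for a FreeBSD test client on which rpc.lockd and rpc.statd were not started at boot. The names `nlm_capture`, `run`, and `DRY_RUN`, and the `netapp.example.org` placeholder, are hypothetical and not from the thread. By default the commands are only printed; nothing is executed unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch of the capture procedure: start tcpdump first, then start the
# daemons so that their first exchanges with the filer land in out.pcap.
# Hypothetical names: nlm_capture, run, DRY_RUN, netapp.example.org.

# run: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# nlm_capture HOST: capture all traffic to/from HOST, then start
# rpc.statd and rpc.lockd through their rc scripts.
nlm_capture() {
    host="${1:-netapp.example.org}"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ tcpdump -s 0 -w out.pcap host $host &"
    else
        tcpdump -s 0 -w out.pcap host "$host" &
        sleep 1   # give tcpdump a moment to start capturing
    fi
    run service statd onestart
    run service lockd onestart
}
```

To keep the daemons from starting at boot on the test client, something like `sysrc rpc_lockd_enable=NO rpc_statd_enable=NO` before rebooting should do. Then run `DRY_RUN=0 nlm_capture <netapp-host>` as root, stop tcpdump once the errors appear, and open out.pcap in wireshark.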
>>>
>> it’s going to be an interesting weekend :-(
>>
>>>> the error is also appearing on freebsd-11.2-stable, I’m now checking if it’s also
>>>> happening on 12.1
>>>> btw, the NetApp version is 9.3P17
>>> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
>>> try to implement it, because I knew the protocol was badly broken) and I avoid
>>> fiddling with it. As such, it won't have changed much since around FreeBSD 7.
>> and we haven’t had any issues with it for years, so you must have done something good
>>
>> cheers,
>>      danny
>>
>>>
>>> rick
>>>
>>> cheers,
>>>       danny
>>>
>>>> rick
>>>>
>>>> Cheers
>>>>
>>>> Richard
>>>> (NetApp admin)
>>>>
>>>> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <danny@cs.huji.ac.il> wrote:
>>>>
>>>>
>>>>> On 18 Dec 2019, at 16:55, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>>
>>>>> Daniel Braniss wrote:
>>>>>
>>>>>> Hi,
>>>>>> The server with the problems is running FreeBSD 11.1 stable, it was working fine for several months,
>>>>>> but after a software upgrade of our NetAPP server it’s reporting many lockd errors and becomes catatonic,
>>>>>> ...
>>>>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
>>>>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>>>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>>>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
>>>>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>>>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>>>>>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
>>>>>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance …
>>>>> Seems like their software upgrade didn't improve handling of NLM RPCs?
>>>>> Appears to be handling RPCs slowly and/or intermittently. Note that no one
>>>>> tests it with IPv6, so at least make sure you are still using IPv4 for the mounts and
>>>>> try and make sure IP broadcast works between client and Netapp. I think the NLM
>>>>> and NSM (rpc.statd) still use IP broadcast sometimes.
>>>>>
>>>> we are ipv4 - we have our own class c :-)
>>>>> Maybe the network guys can suggest more w.r.t. why, but as I've stated before,
>>>>> the NLM is a fundamentally broken protocol which was never published by Sun,
>>>>> so I suggest you avoid using it if at all possible.
>>>> well, at the moment the ball is in NetAPP’s court, and switching to NFSv4 right now is out of the question, it’s
>>>> a production server used by several thousand students.
>>>>
>>>>>
>>>>> - If the locks don't need to be seen by other clients, you can just use the "nolockd"
>>>>> mount option.
>>>>> or
>>>>> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>>>>> should support NFSv4.1, which is a much better protocol than NFSv4.0.
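
The two alternatives above map onto FreeBSD mount_nfs options roughly as in this sketch. `nfs_lock_opts` and the labels `local` and `shared` are hypothetical names chosen for illustration; the option strings use the existing `nfsv3`, `tcp`, `nolockd`, `nfsv4`, and `minorversion` mount options.

```sh
#!/bin/sh
# nfs_lock_opts: print mount options for one of the two alternatives.
# Hypothetical helper; "local" and "shared" are made-up labels.
#   local  - locks need not be seen by other clients: NFSv3 with nolockd,
#            so locking is done locally and the NLM is not used
#   shared - locks must be seen by other clients: NFSv4.1, where locking
#            is part of the NFS protocol itself
nfs_lock_opts() {
    case "$1" in
        local)  echo "nfsv3,tcp,nolockd" ;;
        shared) echo "nfsv4,minorversion=1" ;;
        *)      echo "usage: nfs_lock_opts local|shared" >&2; return 1 ;;
    esac
}
```

For example, `mount -t nfs -o "$(nfs_lock_opts local)" fr-06:/web/www /mnt` would mount the export from the log messages without lockd, with /mnt standing in for the real mount point.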
>>>>>
>>>>> Good luck with it, rick
>>>> thanks
>>>>      danny
>>>>
>>>>> …
>>>>> any ideas?
>>>>>
>>>>> thanks,
>>>>>     danny
>>>>>
>>>>> _______________________________________________
>>>>> freebsd-stable@freebsd.org mailing list
>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"



