Date:      Thu, 19 Dec 2019 14:09:37 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        Richard P Mackerras <mack63richard@gmail.com>, "stable@freebsd.org" <stable@freebsd.org>
Subject:   Re: nfs lockd errors after NetApp software upgrade.
Message-ID:  <YQBPR0101MB1427F445F1F1EAF382E5131ADD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <854B6E5A-C6BC-44B3-A656-FC9B8EF19881@cs.huji.ac.il>
References:  <EBC4AD74-EC62-4C67-AB93-1AA91F662AAC@cs.huji.ac.il> <YQBPR0101MB1427411AFE335E869B9CF022DD530@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM> <0121E289-D2AE-44BA-ADAC-4814CAEE676F@cs.huji.ac.il> <CAGfybS-3Rvs57=oGFEfii_9a=aWxPr6dEq1Y1LqHbLXK1ZKmXA@mail.gmail.com> <YQBPR0101MB1427F9BE658B9A46C7E08335DD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>, <854B6E5A-C6BC-44B3-A656-FC9B8EF19881@cs.huji.ac.il>

Daniel Braniss wrote:
[stuff snipped]
>all mounts are nfsv3/tcp
This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know when
the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.

To me, it looks like a network configuration issue.
You could capture packets (maybe when a client first starts rpc.statd and rpc.lockd)
and then look at them in wireshark. I'd disable startup of rpc.lockd and rpc.statd
at boot for a test client and then run something like:
# tcpdump -s 0 -w out.pcap host <netapp-host>
- and then start rpc.statd and rpc.lockd
Then I'd look at out.pcap in wireshark (much better at decoding this stuff than
tcpdump). I'd look for things like different reply IP addresses from the Netapp,
which might confuse this tired old NLM protocol Sun devised in the mid-1980s.
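Roughly, that test could be scripted as below. This is only a sketch under assumptions not stated above: a FreeBSD client, run as root, with rpc_statd_enable="NO" and rpc_lockd_enable="NO" in /etc/rc.conf so neither daemon starts at boot. The filer name fr-06 is taken from the kernel log quoted further down; the nlm_capture function, the SECS capture window and the DRY_RUN switch are hypothetical names made up for illustration. By default it only prints the commands it would run.

```sh
#!/bin/sh
# Sketch: capture the NSM/NLM exchange with the filer while rpc.statd and
# rpc.lockd start on a test client. Assumes FreeBSD, root, and
# rpc_statd_enable="NO" / rpc_lockd_enable="NO" in /etc/rc.conf.
# Prints the commands by default; set DRY_RUN=0 to actually run them.

run() {
    # Echo the command in dry-run mode, execute it otherwise.
    if [ "${DRY_RUN:-1}" != 0 ]; then
        echo "$*"
    else
        "$@"
    fi
}

nlm_capture() {
    netapp=${NETAPP:-fr-06}    # filer name from the kernel log
    pcap=${PCAP:-out.pcap}
    secs=${SECS:-30}           # hypothetical capture window, in seconds

    # -s 0 keeps whole packets, -w writes them raw for wireshark.
    if [ "${DRY_RUN:-1}" != 0 ]; then
        echo "tcpdump -s 0 -w $pcap host $netapp &"
    else
        tcpdump -s 0 -w "$pcap" host "$netapp" &
        tcpdump_pid=$!
        sleep 2                # give tcpdump time to open the interface
    fi

    # "onestart" runs an rc.d script even though rc.conf disables it.
    run service statd onestart
    run service lockd onestart
    run sleep "$secs"

    # Stop tcpdump so the capture file is flushed and closed.
    if [ "${DRY_RUN:-1}" = 0 ]; then
        kill "$tcpdump_pid"
        wait "$tcpdump_pid"
    fi
}

nlm_capture
```

Run it once as is to check the commands, then with DRY_RUN=0 (and NETAPP set to the filer's name) to do the capture, and open out.pcap in wireshark. statd is started before lockd, matching the order above.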

>the error is also appearing on freebsd-11.2-stable, I’m now checking if it’s also
>happening on 12.1
>btw, the NetApp version is 9.3P17
Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
try to implement it, because I knew the protocol was badly broken) and I avoid
fiddling with it. As such, it won't have changed much since around FreeBSD 7.

rick

cheers,
        danny

> rick
>
> Cheers
>
> Richard
> (NetApp admin)
>
> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <danny@cs.huji.ac.il> wrote:
>
>
>> On 18 Dec 2019, at 16:55, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>> Daniel Braniss wrote:
>>
>>> Hi,
>>> The server with the problems is running FreeBSD 11.1 stable, it was working fine for several months,
>>> but after a software upgrade of our NetAPP server it’s reporting many lockd errors and becomes catatonic,
>>> ...
>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
>>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance …
>> Seems like their software upgrade didn't improve handling of NLM RPCs?
>> Appears to be handling RPCs slowly and/or intermittently. Note that no one
>> tests it with IPv6, so at least make sure you are still using IPv4 for the mounts and
>> try and make sure IP broadcast works between client and Netapp. I think the NLM
>> and NSM (rpc.statd) still use IP broadcast sometimes.
>>
> we are ipv4 - we have our own class c :-)
>> Maybe the network guys can suggest more w.r.t. why, but as I've stated before,
>> the NLM is a fundamentally broken protocol which was never published by Sun,
>> so I suggest you avoid using it if at all possible.
> well, at the moment the ball is in NetAPP's court, and switching to NFSv4 at the moment is out of the question, it’s
> a production server used by several thousand students.
>
>>
>> - If the locks don't need to be seen by other clients, you can just use =
the "nolockd"=0A=
>>  mount option.=0A=
>> or=0A=
>> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp fi=
lers=0A=
>>  should support NFSv4.1, which is a much better protocol that NFSv4.0.=
=0A=
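For reference, the two options above might look like this in /etc/fstab on a FreeBSD client (use one line or the other, not both). This is an untested sketch: the export fr-06:/web/www comes from the log quoted below, the mount point /mnt/www is a placeholder, and with NFSv4 the path the filer exports may differ from the NFSv3 one. With NFSv4, locking is part of the protocol itself, so rpc.lockd and rpc.statd are not involved.

```
# Option 1: NFSv3 over TCP; locks stay local to this client (nolockd)
fr-06:/web/www  /mnt/www  nfs  rw,nfsv3,tcp,nolockd      0  0
# Option 2: NFSv4.1; locks are handled by the filer via NFSv4
fr-06:/web/www  /mnt/www  nfs  rw,nfsv4,minorversion=1   0  0
```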
>>
>> Good luck with it, rick
> thanks
>        danny
>
>> …
>> any ideas?
>>
>> thanks,
>>       danny
>>
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


