Date: Thu, 19 Dec 2019 16:21:16 +0200
From: Daniel Braniss <danny@cs.huji.ac.il>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Richard P Mackerras <mack63richard@gmail.com>, "stable@freebsd.org" <stable@freebsd.org>
Subject: Re: nfs lockd errors after NetApp software upgrade.
Message-ID: <8770BD0D-4B72-431A-B4F5-A29D4DBA03B1@cs.huji.ac.il>
In-Reply-To: <YQBPR0101MB1427F445F1F1EAF382E5131ADD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
References: <EBC4AD74-EC62-4C67-AB93-1AA91F662AAC@cs.huji.ac.il>
 <YQBPR0101MB1427411AFE335E869B9CF022DD530@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
 <0121E289-D2AE-44BA-ADAC-4814CAEE676F@cs.huji.ac.il>
 <CAGfybS-3Rvs57=oGFEfii_9a=aWxPr6dEq1Y1LqHbLXK1ZKmXA@mail.gmail.com>
 <YQBPR0101MB1427F9BE658B9A46C7E08335DD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
 <854B6E5A-C6BC-44B3-A656-FC9B8EF19881@cs.huji.ac.il>
 <YQBPR0101MB1427F445F1F1EAF382E5131ADD520@YQBPR0101MB1427.CANPRD01.PROD.OUTLOOK.COM>
> On 19 Dec 2019, at 16:09, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> Daniel Braniss wrote:
> [stuff snipped]
>> all mounts are nfsv3/tcp
> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know when
> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.

can the replay cache have any influence here? I seem to remember issues with it way back.

>
> To me, it looks like a network configuration issue.

that was/is my gut feeling too, but as far as we can tell, nothing has changed in the network infrastructure.
the problems appeared after the NetAPP's software was updated; it was working fine till then.

the problems are also happening on freebsd 12.1

> You could capture packets (maybe when a client first starts rpc.statd and rpc.lockd)
> and then look at them in wireshark. I'd disable startup of rpc.lockd and rpc.statd
> at boot for a test client and then run something like:
> # tcpdump -s 0 -w out.pcap host <netapp-host>
> - and then start rpc.statd and rpc.lockd
> Then I'd look at out.pcap in wireshark (much better at decoding this stuff than
> tcpdump). I'd look for things like different reply IP addresses from the Netapp,
> which might confuse this tired old NLM protocol Sun devised in the mid-1980s.
>

it's going to be an interesting weekend :-(

>> the error is also appearing on freebsd-11.2-stable, I'm now checking if it's also
>> happening on 12.1
>> btw, the NetApp version is 9.3P17
> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
> try to implement it, because I knew the protocol was badly broken) and I avoid
> fiddling with it. As such, it won't have changed much since around FreeBSD7.

and we haven't had any issues with it for years, so you must have done something good.

cheers,
danny

>
> rick
>
> cheers,
> danny
>
>> rick
>>
>> Cheers
>>
>> Richard
>> (NetApp admin)
>>
>> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <danny@cs.huji.ac.il> wrote:
>>
>>
>>> On 18 Dec 2019, at 16:55, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>
>>> Daniel Braniss wrote:
>>>
>>>> Hi,
>>>> The server with the problems is running FreeBSD 11.1 stable. It was working fine for several months,
>>>> but after a software upgrade of our NetAPP server it's reporting many lockd errors and becomes catatonic:
>>>> ...
>>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
>>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
>>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>>>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
>>>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance …
>>> Seems like their software upgrade didn't improve handling of NLM RPCs?
>>> Appears to be handling RPCs slowly and/or intermittently. Note that no one
>>> tests it with IPv6, so at least make sure you are still using IPv4 for the mounts and
>>> try and make sure IP broadcast works between client and Netapp. I think the NLM
>>> and NSM (rpc.statd) still use IP broadcast sometimes.
>>>
>> we are ipv4 - we have our own class c :-)
>>> Maybe the network guys can suggest more w.r.t. why, but as I've stated before,
>>> the NLM is a fundamentally broken protocol which was never published by Sun,
>>> so I suggest you avoid using it if at all possible.
>> well, at the moment the ball is in NetAPP's court, and switching to NFSv4 at the moment is out of the question; it's
>> a production server used by several thousand students.
>>
>>>
>>> - If the locks don't need to be seen by other clients, you can just use the "nolockd"
>>> mount option.
>>> or
>>> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>>> should support NFSv4.1, which is a much better protocol than NFSv4.0.
>>>
>>> Good luck with it, rick
>> thanks
>> danny
>>
>>> …
>>> any ideas?
>>>
>>> thanks,
>>> danny
>>>
>>> _______________________________________________
>>> freebsd-stable@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>>
>
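The packet-capture procedure Rick outlines in the thread could be scripted roughly as below. This is a minimal sketch, not something posted to the list: it assumes a FreeBSD test client with `rpc_statd_enable` and `rpc_lockd_enable` left off in `/etc/rc.conf`, starts the daemons through the stock `statd` and `lockd` rc.d scripts with `service ... onestart`, and defaults the filer name to `fr-06` only because that is the server named in the log excerpt. By default it just prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper for Rick's suggested capture on a FreeBSD test client
# where rpc.statd and rpc.lockd are NOT started at boot.
# Assumptions (not from the thread): run as root; the filer is "fr-06"
# (taken from the log excerpt); DRY_RUN=1 (the default) only prints commands.

NETAPP_HOST=${NETAPP_HOST:-fr-06}
PCAP_FILE=${PCAP_FILE:-out.pcap}
CAPTURE_SECS=${CAPTURE_SECS:-60}
DRY_RUN=${DRY_RUN:-1}

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

capture() {
    # Full-size packets (-s 0) written raw to a file (-w) for Wireshark.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ tcpdump -s 0 -w $PCAP_FILE host $NETAPP_HOST &"
    else
        tcpdump -s 0 -w "$PCAP_FILE" host "$NETAPP_HOST" &
        tcpdump_pid=$!
        sleep 2    # let tcpdump open the interface before the daemons start
    fi

    # Start the NSM (rpc.statd) before the NLM (rpc.lockd) so that their
    # first exchanges with the filer end up in the capture.
    run service statd onestart
    run service lockd onestart

    # Keep capturing for a while, then stop tcpdump so it flushes the file.
    run sleep "$CAPTURE_SECS"
    if [ "$DRY_RUN" != 1 ]; then
        kill "$tcpdump_pid"
        wait "$tcpdump_pid"
    fi
}

capture
```

Running it once as-is shows the four commands; running it as root with `DRY_RUN=0 NETAPP_HOST=<netapp-host>` performs the capture, after which `out.pcap` can be opened in Wireshark to look for things like replies coming from unexpected IP addresses.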
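For reference, the two workarounds Rick suggests might look like this as `/etc/fstab` entries on a FreeBSD client. This is a hedged sketch: the server path comes from the log excerpt, the mount point `/web/www` is hypothetical, and only one of the two lines would be used for a given mount.

```
# Option 1: stay on NFSv3 over TCP but skip the NLM ("nolockd");
# locks are then only visible on this client.
fr-06:/web/www  /web/www  nfs  rw,nfsv3,tcp,nolockd      0  0

# Option 2: NFSv4.1, where locking is part of the NFS protocol itself
# and rpc.lockd/rpc.statd are not involved.
#fr-06:/web/www /web/www  nfs  rw,nfsv4,minorversion=1   0  0
```

NFSv4 has its own prerequisites on FreeBSD (for example nfsuserd(8) for user and group name mapping), so the second line is a starting point rather than a drop-in change.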
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8770BD0D-4B72-431A-B4F5-A29D4DBA03B1>