From: Adam McDougall
To: freebsd-stable@freebsd.org
Subject: Re: nfs lockd errors after NetApp software upgrade.
Date: Thu, 19 Dec 2019 22:07:32 -0500

Try changing bool_t do_tcp = FALSE; to TRUE in
/usr/src/sys/nlm/nlm_prot_impl.c, recompile the kernel and try again. I
think this makes it match Linux client behavior.
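In case it helps, here is a rough sketch of that edit as a small shell helper. The function name enable_nlm_tcp is made up for illustration, and the sed pattern assumes the declaration appears in the file as quoted above, so check the result by hand. The rebuild steps in the comments are the usual buildkernel/installkernel sequence and assume a stock /usr/src with your normal kernel configuration.

```shell
#!/bin/sh
# Sketch only: flip the NLM client's do_tcp default from FALSE to TRUE.
# enable_nlm_tcp is a hypothetical helper name, not part of FreeBSD.
enable_nlm_tcp() {
    src="$1"
    tmp="${src}.tmp.$$"
    # Rewrite only the do_tcp declaration; tolerate variable whitespace.
    sed 's/\(bool_t[[:space:]]*do_tcp[[:space:]]*=[[:space:]]*\)FALSE;/\1TRUE;/' \
        "$src" > "$tmp" && mv "$tmp" "$src"
}

# Assumed usage on the client (run as root, keep a backup first):
#   cp /usr/src/sys/nlm/nlm_prot_impl.c /root/nlm_prot_impl.c.orig
#   enable_nlm_tcp /usr/src/sys/nlm/nlm_prot_impl.c
#   grep -n 'do_tcp =' /usr/src/sys/nlm/nlm_prot_impl.c
#   cd /usr/src && make buildkernel && make installkernel
#   shutdown -r now
```

The helper writes to a temporary file and renames it rather than relying on sed -i, whose syntax differs between FreeBSD and GNU sed.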
I suspect I ran into the same issue as you. I do think I used nolockd as a
workaround temporarily. I can provide some more details if it works.

On 12/19/19 9:21 AM, Daniel Braniss wrote:
>
>
>> On 19 Dec 2019, at 16:09, Rick Macklem wrote:
>>
>> Daniel Braniss wrote:
>> [stuff snipped]
>>> all mounts are nfsv3/tcp
>> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know when
>> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.
> can the replay cache have any influence here? I tend to remember way back issues
> with it,
>>
>> To me, it looks like a network configuration issue.
> that was/is my gut feeling too, but, as far as we can tell, nothing has changed in the network infrastructure,
> the problems appeared after the NetAPP’s software was updated, it was working fine till then.
>
> the problems are also happening on freebsd 12.1
>
>> You could capture packets (maybe when a client first starts rpc.statd and rpc.lockd)
>> and then look at them in wireshark. I'd disable startup of rpc.lockd and rpc.statd
>> at boot for a test client and then run something like:
>> # tcpdump -s 0 -w out.pcap host
>> - and then start rpc.statd and rpc.lockd
>> Then I'd look at out.pcap in wireshark (much better at decoding this stuff than
>> tcpdump). I'd look for things like different reply IP addresses from the Netapp,
>> which might confuse this tired old NLM protocol Sun devised in the mid-1980s.
>>
> it’s going to be an interesting week end :-(
>
>>> the error is also appearing on freebsd-11.2-stable, I’m now checking if it’s also
>>> happening on 12.1
>>> btw, the NetApp version is 9.3P17
>> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
>> try to implement it, because I knew the protocol was badly broken) and I avoid
>> fiddling with it. As such, it won't have changed much since around FreeBSD7.
> and we haven’t had any issues with it for years, so you must have done something good
>
> cheers,
> danny
>
>>
>> rick
>>
>> cheers,
>> danny
>>
>>> rick
>>>
>>> Cheers
>>>
>>> Richard
>>> (NetApp admin)
>>>
>>> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss wrote:
>>>
>>>
>>>> On 18 Dec 2019, at 16:55, Rick Macklem wrote:
>>>>
>>>> Daniel Braniss wrote:
>>>>
>>>>> Hi,
>>>>> The server with the problems is running FreeBSD 11.1 stable, it was working fine for several months,
>>>>> but after a software upgrade of our NetAPP server it’s reporting many lockd errors and becomes catatonic,
>>>>> ...
>>>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
>>>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
>>>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>>>>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
>>>>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance …
>>>> Seems like their software upgrade didn't improve handling of NLM RPCs?
>>>> Appears to be handling RPCs slowly and/or intermittently. Note that no one
>>>> tests it with IPv6, so at least make sure you are still using IPv4 for the mounts and
>>>> try and make sure IP broadcast works between client and Netapp. I think the NLM
>>>> and NSM (rpc.statd) still use IP broadcast sometimes.
>>>>
>>> we are ipv4 - we have our own class c :-)
>>>> Maybe the network guys can suggest more w.r.t.
>>>> why, but as I've stated before,
>>>> the NLM is a fundamentally broken protocol which was never published by Sun,
>>>> so I suggest you avoid using it if at all possible.
>>> well, at the moment the ball is in NetApp's court, and switching to NFSv4 at the moment is out of the question, it’s
>>> a production server used by several thousand students.
>>>
>>>>
>>>> - If the locks don't need to be seen by other clients, you can just use the "nolockd"
>>>> mount option.
>>>> or
>>>> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>>>> should support NFSv4.1, which is a much better protocol than NFSv4.0.
>>>>
>>>> Good luck with it, rick
>>> thanks
>>> danny
>>>
>>>> …
>>>> any ideas?
>>>>
>>>> thanks,
>>>> danny
>>>>
>>>> _______________________________________________
>>>> freebsd-stable@freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"