From: Rick Macklem
To: Lars Eggert
Cc: freebsd-current, Andre Oppermann
Date: Fri, 15 Mar 2013 10:08:40 -0400 (EDT)
Subject: Re: NewNFS vs. oldNFS for 10.0?
Message-ID: <1547734002.3937074.1363356520474.JavaMail.root@erie.cs.uoguelph.ca>
List-Id: Discussions about the use of FreeBSD-current

Lars Eggert wrote:
> Hi,
>
> this reminds me that I ran into an issue lately with the new NFS and
> locking for NFSv3 mounts on a client that ran -CURRENT and a server
> that ran -STABLE.
>
> When I ran "portmaster -a" on the client, which mounted /usr/ports and
> /usr/local, as well as the location of the respective sqlite databases
> over NFSv3, the client network stack became unresponsive on all
> interfaces for 30 or so seconds and e.g. SSH connections broke. The
> serial console remained active throughout, and the system didn't
> crash. About a minute after the wedgie I could SSH into the box again,
> too.
>
> The issue went away when I killed lockd on the client, but that caused
> the sqlite database to become corrupted over time. The workaround for
> me was to move to NFSv4, which has been working fine. (One more reason
> to make it the default...)
>
I've mentioned limitations w.r.t. the design of the NLM protocol
(rpc.lockd) before. Any time there is any kind of network topology
issue, it will run into difficulties. There may also be other issues.

However, since both the old and new client use the same rpc.lockd
in the same way (the new one just cribbed the code from the old one),
I think the same problem would exist for the old one.
As such, I don't believe this is a regression.

rick

> I'm not really sure how to debug this further, but would be willing to
> work with someone off-list who'd tell me what tests to run.
>
> Lars
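
On the question of what tests to run: the following is only a minimal sketch, not something proposed in this thread. It takes and releases an exclusive POSIX byte-range lock, which is the kind of lock sqlite relies on, and reports how long each cycle takes. On an NFSv3 mount without the nolockd option, such fcntl(2) locks are forwarded to rpc.lockd, so unusually slow cycles would point at the NLM path. The function name time_lock_cycle and the test file path are hypothetical; the path would need to be changed to a file on the NFS mount in question.

```python
import fcntl
import os
import tempfile
import time


def time_lock_cycle(path):
    """Take and release an exclusive POSIX lock on path.

    Returns the elapsed wall-clock time in seconds for the lock/unlock pair.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        start = time.monotonic()
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release it again
        return time.monotonic() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical location: replace with a file on the NFSv3 mount under test.
    test_path = os.path.join(tempfile.gettempdir(), "nlm-locktest.tmp")
    for attempt in range(5):
        elapsed = time_lock_cycle(test_path)
        print("attempt %d: %.3f s" % (attempt, elapsed))
```

Running the same script against a local file system, an NFSv3 mount, and an NFSv4 mount would give a rough comparison; it does not by itself explain the network stack stall described above.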