Date: Wed, 28 Aug 2013 20:02:59 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-fs <freebsd-fs@freebsd.org>
Message-ID: <1332572251.15040105.1377734579493.JavaMail.root@uoguelph.ca>
Subject: rpc.lockd kernel RPC over UDP patch for testing/review

Hi,

Doug White posted this to me via email some time ago (I hope he doesn't
mind me reposting it here):
> First, we have an installed client system doing heavy NFS lock traffic that occasionally
> experiences lockd lockups that require a system reboot to clear. Diagnosis of 
> the most recent hang identified corruption of one of the tracking variables
> (cu->cu_send specifically) in the congestion control in clnt_dg_call() as the culprit. 
> Since lockd only uses one thread, no congestion control is really necessary. We are
> going to make a local patch to avoid the if() that leads to the msleep() if 
> cu->threads = 1 so we don't run into that again, though the corruption of
> cu_send is still a bit troubling. The corruption might stem from repeated retries allowing 
> cu_send to grow without bound, or some other bizarre code path that causes underflow.

After inspecting the code, I found two places where cu_sent wasn't incremented when a
request was re-inserted in the send queue. (Doug called it cu_send, just to try to confuse
me. It worked for a while ;-) Since cu_sent is always decremented when a request is
dequeued, each re-insert without a matching increment leaves it lower than it should be,
so I think this could have resulted in a bogus cu_sent value.

The simple patch at:
 http://people.freebsd.org/~rmacklem/rpcudp.patch
adds increments for cu_sent for these two places.
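
To illustrate the accounting problem, here is a small userland toy model. It is
only a sketch and not the kernel code or the patch itself: the struct, the
function names and TOY_WINDOW_UNIT are all made up for the example, and the
real clnt_dg_call() logic is more involved. It just shows how a counter that is
always decremented on dequeue drifts if a re-insert skips the increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-request charge against the congestion window. */
#define TOY_WINDOW_UNIT 256

/* Toy stand-in for the client state; "sent" plays the role of cu_sent. */
struct toy_client {
	int sent;	/* window units charged for queued requests */
	int queued;	/* requests currently on the queue */
};

/* First insertion of a request: charge the window. */
static void
toy_enqueue(struct toy_client *c)
{
	c->queued++;
	c->sent += TOY_WINDOW_UNIT;
}

/* Removal of a request: the charge is always released. */
static void
toy_dequeue(struct toy_client *c)
{
	assert(c->queued > 0);
	c->queued--;
	c->sent -= TOY_WINDOW_UNIT;
}

/* Re-insert for a retry; charge == false models the missed increment. */
static void
toy_requeue(struct toy_client *c, bool charge)
{
	c->queued++;
	if (charge)
		c->sent += TOY_WINDOW_UNIT;
}

/*
 * Push one request through "retries" dequeue/re-insert cycles, then
 * complete it. Returns the final value of the counter, which should be 0.
 */
static int
toy_run(int retries, bool charge_on_requeue)
{
	struct toy_client c = { 0, 0 };
	int i;

	toy_enqueue(&c);
	for (i = 0; i < retries; i++) {
		toy_dequeue(&c);
		toy_requeue(&c, charge_on_requeue);
	}
	toy_dequeue(&c);
	return (c.sent);
}
```

With the increment on re-insert, toy_run(3, true) ends at 0. Without it,
toy_run(3, false) ends at -3 * TOY_WINDOW_UNIT, i.e. the counter loses one
unit per retry, which is the kind of drift I suspect in the real code.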

If anyone is using rpc.lockd and can test/review this patch, it would be appreciated.

Thanks, rick