Date:      Wed, 28 Aug 2013 20:02:59 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   rpc.lockd kernel RPC over UDP patch for testing/review
Message-ID:  <1332572251.15040105.1377734579493.JavaMail.root@uoguelph.ca>

Hi,

Doug White sent me this by email some time ago (I hope he doesn't
mind me reposting it here):
> First, we have an installed client system doing heavy NFS lock traffic that occasionally
> experiences lockd lockups that require a system reboot to clear. Diagnosis of 
> the most recent hang identified corruption of one of the tracking variables
> (cu->cu_send specifically) in the congestion control in clnt_dg_call() as the culprit. 
> Since lockd only uses one thread, no congestion control is really necessary. We are
> going to make a local patch to avoid the if() that leads to the msleep() if 
> cu->threads == 1 so we don't run into that again, though the corruption of
> cu_send is still a bit troubling. The corruption might stem from repeated retries allowing 
> cu_send to grow without bound, or some other bizarre code path that causes underflow.
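To make the failure mode described above concrete, here is a simplified userspace model (not the kernel code) of a congestion-window gate like the one guarding the msleep() in clnt_dg_call(). The names struct cwnd_model, cw_sent, cw_cwnd and the cwnd_* functions are hypothetical; they only loosely mirror cu_sent and the congestion window in clnt_dg.c, and where the kernel would sleep this model just reports that the send must wait.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of a send-side congestion window.
 * It only loosely mirrors cu_sent and the window in clnt_dg.c.
 */
struct cwnd_model {
	int cw_sent;	/* requests counted as in flight */
	int cw_cwnd;	/* window: max requests allowed in flight */
};

/*
 * Returns true if a new request may be sent now.  The "false" case
 * is where the real caller would msleep() until a reply frees space.
 */
static bool
cwnd_may_send(const struct cwnd_model *cw)
{
	return (cw->cw_sent < cw->cw_cwnd);
}

/* Account for a request being sent; only legal when the gate is open. */
static void
cwnd_on_send(struct cwnd_model *cw)
{
	assert(cwnd_may_send(cw));
	cw->cw_sent++;
}

/* Account for a request leaving flight (reply received or given up). */
static void
cwnd_on_done(struct cwnd_model *cw)
{
	cw->cw_sent--;
}
```

With a single caller, as in rpc.lockd, no other thread is around to decrement the counter while that caller waits, so a counter stuck at or above the window (for example one that grew without bound across retries) would keep the gate closed indefinitely in this model.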

After inspecting the code, I found two places where cu_sent (Doug called it cu_send just to
try and confuse me; it worked for a while ;-) was not incremented when a request was re-inserted
in the send queue. Since it is always decremented when a request is dequeued, each such
re-insertion leaves cu_sent out of step with the number of requests actually queued, and I think
this could have resulted in a bogus cu_sent value.

The simple patch at:
 http://people.freebsd.org/~rmacklem/rpcudp.patch
adds the missing cu_sent increments in those two places.
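The bookkeeping argument above can be illustrated with a small userspace model, assuming the simplified rule stated in the message: every dequeue decrements the counter, so every insertion, including a re-insertion on retry, must increment it. The struct sendq_model and the sq_* functions are hypothetical and are not the code from clnt_dg.c or from the patch; the fixed flag just toggles whether the re-insert path does the increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the send-queue accounting described above. */
struct sendq_model {
	int sent;	/* counter the congestion check trusts */
	int queued;	/* ground truth: requests actually in the queue */
};

/* First insertion of a request: counted correctly. */
static void
sq_insert(struct sendq_model *q)
{
	q->queued++;
	q->sent++;
}

/* Dequeue always decrements the counter. */
static void
sq_dequeue(struct sendq_model *q)
{
	assert(q->queued > 0);
	q->queued--;
	q->sent--;
}

/*
 * Re-insert a request (e.g. for a retry).  fixed == false models a
 * path that skips the increment; fixed == true adds it back.
 */
static void
sq_reinsert(struct sendq_model *q, bool fixed)
{
	q->queued++;
	if (fixed)
		q->sent++;
}

/* Push one request through n retries; return the final counter value. */
static int
sq_run_with_retries(int n, bool fixed)
{
	struct sendq_model q = { 0, 0 };

	sq_insert(&q);
	for (int i = 0; i < n; i++) {
		sq_dequeue(&q);		/* taken off the queue */
		sq_reinsert(&q, fixed);	/* put back for a retry */
	}
	sq_dequeue(&q);			/* final removal */
	return (q.sent);
}
```

In this model, sq_run_with_retries(3, false) ends at -3 while the fixed variant returns to 0, so the counter drifts by one per unbalanced re-insertion. How such drift shows up in the kernel depends on the real field's type and the surrounding logic, which this sketch does not assume; if the counter were unsigned, for instance, the drift would wrap to a very large value.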

If anyone using rpc.lockd can test or review this patch, it would be appreciated.

Thanks, rick


