Date: Wed, 28 Aug 2013 20:02:59 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: rpc.lockd kernel RPC over UDP patch for testing/review
Message-ID: <1332572251.15040105.1377734579493.JavaMail.root@uoguelph.ca>
Hi,

Doug White posted this to me via email some time ago (I hope he doesn't mind me reposting it here):

> First, we have an installed client system doing heavy NFS lock traffic that occasionally
> experiences lockd lockups that require a system reboot to clear. Diagnosis of
> the most recent hang identified corruption of one of the tracking variables
> (cu->cu_send specifically) in the congestion control in clnt_dg_call() as the culprit.
> Since lockd only uses one thread, no congestion control is really necessary. We are
> going to make a local patch to avoid the if() that leads to the msleep() if
> cu->threads = 1 so we don't run into that again, though the corruption of
> cu_send is still a bit troubling. The corruption might stem from repeated retries allowing
> cu_send to grow without bound, or some other bizarre code path that causes underflow.

After inspecting the code, I found two places where cu_sent (Doug called it cu_send just to try and confuse me. It worked for a while;-) wasn't incremented when a request was re-inserted into the send queue. Since it is always decremented when a request is dequeued, I think this could have resulted in a bogus cu_sent value: a request that is dequeued, re-inserted and then dequeued again is counted down twice but counted up only once.

The simple patch at:
http://people.freebsd.org/~rmacklem/rpcudp.patch
adds the missing cu_sent increments in these two places.

If anyone is using rpc.lockd and can test/review this patch, it would be appreciated.

Thanks, rick