From owner-freebsd-current@FreeBSD.ORG  Thu Feb  5 09:59:27 2004
Mime-Version: 1.0 (Apple Message framework v612)
Content-Transfer-Encoding: 7bit
Message-Id: <0703C4CC-5805-11D8-951F-000A95A9A574@nordahl.net>
Content-Type: text/plain; charset=US-ASCII; format=flowed
To: current@freebsd.org
From: Frode Nordahl <frode@nordahl.net>
Date: Thu, 5 Feb 2004 18:59:22 +0100
X-Mailer: Apple Mail (2.612)
Subject: Re: rpc.lockd(8) seg faults on 5.2-RELEASE

Hello,

I have an update on the rpc.lockd "hang" issue.

Whenever I see it hang, I kill it with kill -SEGV to get a core dump
before restarting it.

In one of the dumps I observed this:
(gdb) print *blockedlocklist_head->lh_first
$1 = {nfslocklist = {le_next = 0x8099000, le_prev = 0x8099000},
  filehandle = {
    fh_fsid = {val = {1074502253, -394432445}},
    fh_fid = {fid_len = 12, fid_reserved = 0,
      fid_data = "?\\@\0r?\202[\0\0\0\0\0\0\0"}},
  addr = 0x80751e0,
  client = {exclusive = 1, svid = 19869,
    oh = {n_len = 24,
      n_bytes = 0x8056520 "19869@mail7.powertech.no", '?' <repeats 176 times>...},
    l_offset = 0, l_len = 0},
  client_cookie = {n_len = 4,
    n_bytes = 0x8075290 "?\221K\001", '?' <repeats 28 times>, "udp6"},
  client_name = "mail7.powertech.no", '\0' <repeats 1005 times>,
  nsm_status = 0, status = 0, flags = 6, blocking = 0, locker = 0, fd = 0}
(gdb)

Looking at retry_blockingfilelocklist(), an entry like this in
blockedlocklist_head would most likely make it loop forever, since the
walk over the list would never reach a NULL le_next.  I reproduced this
behaviour in a small test program of my own as well.
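
For anyone who wants to reproduce the effect, here is a minimal sketch of how a corrupted queue(3) LIST entry can keep a traversal from terminating.  It is not the rpc.lockd code: struct demo_lock, walk_bounded() and corrupt_self_loop() are hypothetical names, and the self-referencing le_next is only an assumed form of corruption, since the dump does not show how the list was actually damaged.  The walk is capped at a step limit so the demonstration itself cannot hang.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for rpc.lockd's lock structure. */
struct demo_lock {
	LIST_ENTRY(demo_lock) link;	/* plays the role of nfslocklist */
	int svid;
};

LIST_HEAD(demo_head, demo_lock);

/*
 * Follow le_next from the head like a plain "while (elm != NULL)" loop,
 * but give up after `limit` steps.  Returns the number of entries
 * visited, or -1 if the limit was reached (list probably cyclic).
 */
static int
walk_bounded(struct demo_head *head, int limit)
{
	struct demo_lock *elm;
	int seen = 0;

	assert(limit > 0);
	for (elm = head->lh_first; elm != NULL; elm = elm->link.le_next) {
		if (seen == limit)
			return (-1);
		seen++;
	}
	return (seen);
}

/* Assumed corruption: make the entry's le_next point back at itself. */
static void
corrupt_self_loop(struct demo_lock *elm)
{
	elm->link.le_next = elm;
}
```

On a healthy two-entry list walk_bounded() returns 2.  After corrupt_self_loop() on the last entry it returns -1, which is the case where an unbounded loop would spin forever.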

But how did le_next end up equal to le_prev?

I also found this in send_granted(), at lockd_lock.c:2161:

         debuglog("About to send granted on blocked lock\n");
         sleep(1);
         debuglog("Blowing off return send\n");

Does anyone know what the sleep(1) is good for here?


Regards,
Frode