From owner-freebsd-stable@FreeBSD.ORG  Wed Mar  8 22:45:33 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0D67116A420
	for <freebsd-stable@freebsd.org>; Wed,  8 Mar 2006 22:45:33 +0000 (GMT)
	(envelope-from kris@obsecurity.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B6B7C43D45
	for <freebsd-stable@freebsd.org>; Wed,  8 Mar 2006 22:45:32 +0000 (GMT)
	(envelope-from kris@obsecurity.org)
Received: from obsecurity.dyndns.org (elvis.mu.org [192.203.228.196])
	by elvis.mu.org (Postfix) with ESMTP id A01721A4D7B;
	Wed,  8 Mar 2006 14:45:32 -0800 (PST)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
	id 73D84524AA; Wed,  8 Mar 2006 17:45:31 -0500 (EST)
Date: Wed, 8 Mar 2006 17:45:31 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Miguel Lopes Santos Ramos <miguel@anjos.strangled.net>
Message-ID: <20060308224531.GA53611@xor.obsecurity.org>
References: <20060308005138.GA49684@xor.obsecurity.org>
	<200603081401.k28E1Obv006775@compaq.anjos.strangled.net>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="VS++wcV0S1rZb1Fb"
Content-Disposition: inline
In-Reply-To: <200603081401.k28E1Obv006775@compaq.anjos.strangled.net>
User-Agent: Mutt/1.4.2.1i
Cc: kuriyama@imgsrc.co.jp, freebsd-stable@freebsd.org, kris@obsecurity.org
Subject: Re: rpc.lockd brokenness (2)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Mar 2006 22:45:33 -0000


--VS++wcV0S1rZb1Fb
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Mar 08, 2006 at 02:01:24PM +0000, Miguel Lopes Santos Ramos wrote:

> > I wonder if something else is going wrong and it's not rpc.lockd at
> > all.
>
> Oh, it's a locking problem alright. But perhaps not in rpc.lockd...

OK, I think I understand what is going on now...sort of.

> > It looks like this wasn't made using -s 0 - sorry if I wasn't
> > explicit.
>
> You must give all details to rookies...

Sorry.

> I've changed things a bit, but perhaps there's a test now which is more easily
> reproducible on other systems.
>=20
> The following tcpdumps were obtained by booting in single-user mode on the
> diskless machine and doing the following sequence for initialization:
>         # mount -u /
>         # /etc/rc.d/netif start
>         # /etc/rc.d/rpcbind start
>         # /etc/rc.d/nfsclient start
>         # /etc/rc.d/nfslocking start
>
> And then, with /var/run/cron.pid removed,
>         # /etc/rc.d/cron start
>         Starting cron.
>         # /etc/rc.d/cron stop
>         # /etc/rc.d/nfslocking stop
>         # /etc/rc.d/nfsclient stop
>         # /etc/rc.d/rpcbind stop
>         # reboot
>         see http://mega.ist.utl.pt/~mlsr/nfs-nofile.bin
>         Everything seemed to be ok, but /var/run/cron.pid was left locked on
>         the server.

This is intentional.  It's how pidfile_*() tests whether the process
is still running.  The intention is that if someone tries to open the
pidfile again while the first process is still running, the lock
acquisition will fail and we'll know the other process is still alive,
and therefore avoid starting a second instance.
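
To make the idea concrete, here is a minimal sketch of the same pattern
in Python on a local filesystem. It is only an analogue of what
pidfile_*() does, not libutil's code: the helper name
try_claim_pidfile() and the demo path are made up for illustration, and
it assumes a platform where flock() is available.

```python
import fcntl
import os
import tempfile


def try_claim_pidfile(path):
    """Open path and take an exclusive, non-blocking lock on it.

    Returns the open file descriptor on success, or None if another
    open descriptor already holds the lock (i.e. the first instance
    is still alive).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: do not start a second instance.
        os.close(fd)
        return None
    # We own the lock; record our pid. The lock lasts until fd is closed.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.pid")
    first = try_claim_pidfile(path)
    second = try_claim_pidfile(path)
    print("first instance holds lock:", first is not None)
    print("second instance refused:", second is None)
    os.close(first)
```

The lock stays held for as long as the first descriptor is open, which
is why the file appears "left locked" while the daemon is running.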

Your main problem seems to be that you're mounting the same /var via
NFS from multiple client machines.  This is basically a bad idea to
begin with because /var expects to be private to each machine (even if
locking worked as expected, you'd not be able to start cron on more
than one machine because it would fail as above).  Even if you solved
this there would be other similar problems.

In fact the diskless boot infrastructure in /etc will set up and use an
md(4) memory-disk /var for this purpose.

There is a (known) lockd bug here though, which you isolated:

> With /var/run/cron.pid still locked, on the first client, single-user, same
> initialization sequence
>         # lockf -k -t 1 /var/run/cron.pid echo ok
>         Hangs... always.

which is that lock requests through rpc.lockd cannot be cancelled, so
they'll hang until the operation succeeds or fails.  In this case
lockf does a blocking lock request and expects to cancel it with a
signal after the timer expires, but rpc.lockd doesn't know how to back
out lock requests so it just hangs forever or until something else
unlocks the file on the server.
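
For illustration, the "blocking request plus timer signal" pattern looks
roughly like this in Python. This is a sketch of the general technique,
not lockf's source; lock_with_timeout() and LockTimeout are hypothetical
names. On a local filesystem the signal interrupts the blocked flock()
and the call returns after the timeout; the bug described above is that
the same interruption does not back out a request that has gone through
rpc.lockd.

```python
import fcntl
import os
import signal
import tempfile


class LockTimeout(Exception):
    """Raised by the SIGALRM handler to abort a blocked lock request."""


def _on_alarm(signum, frame):
    # Raising here makes the interrupted flock() call fail instead of retrying.
    raise LockTimeout()


def lock_with_timeout(path, seconds):
    """Blocking exclusive lock, abandoned if `seconds` elapse first.

    Returns the locked fd, or None on timeout. Must be called from the
    main thread, because Python runs signal handlers only there.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until granted or interrupted
        return fd
    except LockTimeout:
        os.close(fd)
        return None
    finally:
        # Disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.lock")
    holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(holder, fcntl.LOCK_EX)
    print("timed out:", lock_with_timeout(path, 0.5) is None)
    os.close(holder)
```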

Kris
--VS++wcV0S1rZb1Fb
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (FreeBSD)

iD8DBQFED16KWry0BWjoQKURAtdkAKDOZ/hNxMPgL500so0t8Mtl0Oi01QCfXouN
huuWeT9TL2A9EkS3oIOWwlo=
=uOe4
-----END PGP SIGNATURE-----

--VS++wcV0S1rZb1Fb--