Date:      Thu, 9 Mar 2006 00:26:44 GMT
From:      Miguel Lopes Santos Ramos <miguel@anjos.strangled.net>
To:        kris@obsecurity.org
Cc:        kuriyama@imgsrc.co.jp, freebsd-stable@freebsd.org
Subject:   Re: rpc.lockd brokenness (2)
Message-ID:  <200603090026.k290Qihj002701@compaq.anjos.strangled.net>
In-Reply-To: <20060308224531.GA53611@xor.obsecurity.org>

> From: Kris Kennaway <kris@obsecurity.org>
> Subject: Re: rpc.lockd brokenness (2)
>
> This is intentional.  It's how pidfile_*() tests whether the process
> is still running.  The intention is that if someone tries to open the
> pidfile again while the first process is still running, the lock
> acquisition will fail and we'll know the other process is still alive,
> and therefore avoid starting a second instance.

No, no, you got me wrong. The pidfile is left locked after cron stopped
running (with /etc/rc.d/cron stop). This behaviour must be wrong.

> Your main problem seems to be that you're mounting the same /var via
> NFS from multiple client machines.  This is basically a bad idea to
> begin with because /var expects to be private to each machine (even if
> locking worked as expected, you'd not be able to start cron on more
> than one machine because it would fail as above).  Even if you solved
> this there would be other similar problems.

No, it's the whole filesystem tree for a single client, no one else uses
those files. The fact that I hung a third machine was an accident, I was
testing if cron.pid was still locked and I thought I had a window on the
server...

My single problem is locking. Actually, it worked well before I upgraded
this system to 6-STABLE. It's just for one laptop whose disk I don't want
to partition.

> In fact the diskless boot infrastructure in /etc will set up and use a
> md /var for this purpose.

Actually, they don't advise using an md /var, only /etc. Anyway, I don't use
that, because it's my only diskless machine. I have a single NFS mounted /
and an md /tmp. Nothing is shared with anyone else, not even /usr,
because it's my only amd64.

> There is a (known) lockd bug here though, which you isolated:
>

So, this really is bin/80389?
If so, I can tell Jun Kuriyama that his patch didn't change it.

> > With /var/run/cron.pid still locked, on the first client, single-user, same
> > initialization sequence
> >         # lockf -k -t 1 /var/run/cron.pid echo ok
> >         Hangs... always.
>
> which is that lock requests through rpc.lockd cannot be cancelled, so
> they'll hang until the operation succeeds or fails.  In this case
> lockf does a blocking lock request and expects to cancel it with a
> signal after the timer expires, but rpc.lockd doesn't know how to back
> out lock requests so it just hangs forever or until something else
> unlocks the file on the server.
>
> Kris
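
For illustration, the "blocking request cancelled by a timer signal" pattern
described above can be sketched as follows. This is a Python analogue on a
local filesystem, not lockf's actual source; the names LockTimeout and
lock_with_timeout are hypothetical. It uses flock() plus an interval timer,
and it only shows what the caller expects to happen. Per the explanation
above, over NFS the request through rpc.lockd is not backed out, so the
signal would not rescue the caller there.

```python
import fcntl
import os
import signal
import tempfile


class LockTimeout(Exception):
    """Raised by the SIGALRM handler to abandon a blocking lock request."""


def lock_with_timeout(fd, seconds):
    """Try a blocking exclusive flock(); give up when the timer fires.

    Returns True if the lock was acquired, False on timeout.
    Must be called from the main thread (signal handlers live there).
    """
    def on_alarm(signum, frame):
        raise LockTimeout()

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocking request
        return True
    except LockTimeout:
        return False
    finally:
        # Disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.lock")
    holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(holder, fcntl.LOCK_EX)      # someone already holds the lock
    waiter = os.open(path, os.O_RDWR)
    print(lock_with_timeout(waiter, 1.0))   # times out locally instead of hanging
```

On a local filesystem the signal interrupts the pending flock() and the call
returns; the hang reported in this thread is the case where that interruption
never takes effect.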

I am a bit disappointed. First, this problem didn't cause me trouble before
I went to 6-STABLE, now I must either disable cron or disable locking (which
I can't).
And I'm still not completely convinced. That problem, if I understand correctly,
existed before January...

There are three things...
- cron.pid shouldn't be locked after cron terminated. (this interaction was
fully saved as http://mega.ist.utl.pt/~mlsr/nfs-nofile.bin)
- cron shouldn't hang on startup just because the file is locked, since
pidfile_open opens it with O_NONBLOCK (unlike lockf).
- cron shouldn't hang in such a way that it is not killable... (and shouldn't
the open system call in lockf also be interruptible?)
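
To make the first two points concrete, here is a minimal sketch of the
behaviour I would expect from a pidfile lock. It is a Python analogue using
flock() with LOCK_NB on a local filesystem, not the pidfile.c code, and the
helper name try_lock_pidfile is hypothetical. A second attempt should fail
immediately rather than hang, and the lock should disappear once the holder
closes the file (or exits).

```python
import fcntl
import os
import tempfile


def try_lock_pidfile(path):
    """Return an fd holding an exclusive lock on path, or None if it is locked.

    The request is non-blocking, so a held lock produces an immediate
    failure instead of a wait.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another open file description holds the lock.
        os.close(fd)
        return None
    # We own the lock: record our pid in the file.
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.pid")
    first = try_lock_pidfile(path)
    print(first is not None)               # first instance gets the lock
    print(try_lock_pidfile(path) is None)  # second attempt fails at once
    os.close(first)                        # holder goes away, lock is released
    print(try_lock_pidfile(path) is not None)
```

What I observe over NFS differs on both counts: the lock outlives cron, and
the supposedly non-blocking attempt hangs.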

So, I'm led to believe that beyond that issue with rpc.lockd, which,
I understand, is an unresolved problem, there is now another problem,
perhaps with pidfile.c...

Thank you for all your time on this issue. I'm still going to try to chase
it, although I only have the knowledge to find it if it is in pidfile.c or
in cron. I understand too little of the interaction between the kernel and
the rest of NFS to chase it if it is somewhere else.

Miguel

