Date: Thu, 9 Mar 2006 02:07:33 GMT
From: Miguel Lopes Santos Ramos <miguel@anjos.strangled.net>
To: kris@obsecurity.org
Cc: kuriyama@imgsrc.co.jp, freebsd-stable@freebsd.org
Subject: Re: rpc.lockd brokenness (2)
Message-ID: <200603090207.k2927XLa003215@compaq.anjos.strangled.net>
In-Reply-To: <20060309005722.GA55432@xor.obsecurity.org>

> From: Kris Kennaway <kris@obsecurity.org>
> Subject: Re: rpc.lockd brokenness (2)
> [...]
> OK, I misunderstood. The rc.d script will signal cron to kill it,
> which should be closing the file descriptors and causing rpc.lockd to
> release the lock. Perhaps this part is broken. OK, I tested this
> with daemon -p, and it indeed seems to be broken:
>
> haessal# daemon -p pid_file sleep 100000
> haessal# kill -KILL `cat pid_file`
> haessal# ps -p `cat pid_file`
> PID TT STAT TIME COMMAND
> haessal# lockf -t 0 pid_file echo Yay
> lockf: pid_file: already locked

Well, your test is quite terse. Perhaps that result is more to be
expected with SIGKILL, but the same thing happens with SIGTERM.
On the other hand, what happens there is not so strange: neither
pidfile.c nor daemon.c has any signal handling, so that should probably
be expected.
Perhaps it's impossible for a lock to be released just because the
process that owns it died; those are the limitations of distributed
services...

But cron should call pidfile_remove in its signal handlers, and it
should have a signal handler for SIGTERM for this purpose. I must look
at the pre-pidfile cron.

[...]
> > - cron shouldn't hang on startup just because the file is locked, since
> > pidfile_open opens it with O_NONBLOCK (unlike lockf).
>
> I haven't been able to reproduce this, e.g. lockf -t 0 does O_NONBLOCK
> locking and works correctly when the file is already locked. Perhaps
> it's another locked file (not the pidfile) that was also leaked in the
> same way, and is being opened without O_NONBLOCK.
>
> > - cron shouldn't hang in such a way that it is not killable... (and should
> > not also the open system call in lockf be interruptible?)
>
> This is the bug (really: missing feature) that I described in my
> previous mail.

Shouldn't even a lock that is opened without O_NONBLOCK be
interruptible by a signal? I don't understand why or how these things
are unkillable. They made a system call, so they're supposed to be
inside the kernel; how can rpc.lockd, a user process, keep them there...

Another thing: I have a question that maybe you can answer.
I'm having trouble getting rid of the lock on cron.pid and, in the end,
that's why I can't boot normally. The lock persists even though the
file is not "physically" locked on the server.
I've tried stopping nfslocking on both sides and removing
/var/db/statd.status on both. Is there any other persistent storage
for this?

Miguel
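
For reference, here is a minimal C sketch of the non-blocking lock attempt discussed above. It assumes a local filesystem and plain flock(2); it is not the code in pidfile.c or lockf, and it does not exercise NFS locking through rpc.lockd. The helper name try_lock_pidfile is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libutil): open `path` and try to take
 * an exclusive flock(2) lock without blocking.
 *
 * Returns the open descriptor on success. Returns -1 on failure with
 * errno set; errno is EWOULDBLOCK when another open file description
 * already holds the lock.
 */
int
try_lock_pidfile(const char *path)
{
	int fd, saved_errno;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return (-1);

	/* LOCK_NB makes flock() fail immediately instead of sleeping. */
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		saved_errno = errno;	/* close() may overwrite errno */
		close(fd);
		errno = saved_errno;
		return (-1);
	}
	return (fd);
}
```

On a local filesystem the second attempt on an already locked file should return at once with EWOULDBLOCK, and closing the holder's descriptor (which also happens when the holder exits) releases the lock. Whether the same holds over NFS is exactly what is in question in this thread.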