Date: Wed, 8 Mar 2006 14:01:24 GMT
From: Miguel Lopes Santos Ramos <miguel@anjos.strangled.net>
To: kris@obsecurity.org
Cc: kuriyama@imgsrc.co.jp, freebsd-stable@freebsd.org
Subject: Re: rpc.lockd brokenness (2)
Message-ID: <200603081401.k28E1Obv006775@compaq.anjos.strangled.net>
In-Reply-To: <20060308005138.GA49684@xor.obsecurity.org>

> From: Kris Kennaway <kris@obsecurity.org>
> Subject: Re: rpc.lockd brokenness (2)
>
> I wonder if something else is going wrong and it's not rpc.lockd at
> all.

Oh, it's a locking problem alright. But perhaps not in rpc.lockd...

> It looks like this wasn't made using -s 0 - sorry if I wasn't
> explicit.

You must give all details to rookies...

I've changed things a bit, but perhaps there's a test now which is more
easily reproducible on other systems.

The following tcpdumps were obtained by booting in single-user mode on the
diskless machine and doing the following sequence for initialization:

# mount -u /
# /etc/rc.d/netif start
# /etc/rc.d/rpcbind start
# /etc/rc.d/nfsclient start
# /etc/rc.d/nfslocking start

And then, with /var/run/cron.pid removed,

# /etc/rc.d/cron start
Starting cron.
# /etc/rc.d/cron stop
# /etc/rc.d/nfslocking stop
# /etc/rc.d/nfsclient stop
# /etc/rc.d/rpcbind stop
# reboot

see http://mega.ist.utl.pt/~mlsr/nfs-nofile.bin

Everything seemed to be ok, but /var/run/cron.pid was left locked on the
server.

Then, with /var/run/cron.pid still locked,

# /etc/rc.d/cron start
... cron already running (pid=111)..
(something like that, which is ok)
# /etc/rc.d/cron stop
# reboot

see http://mega.ist.utl.pt/~mlsr/nfs-lockedpass.bin

The result of this test is ok, but when booting multiuser, cron still hangs
instead of saying it's already running. And when I checked, by accident on
a third machine, whether /var/run/cron.pid was still locked, with

# lockf -k -t 1 .../var/run/cron.pid echo ok

lockf hung on this third machine in spite of the -t 1 parameter, and it
remained unkillable.

With /var/run/cron.pid still locked, on the first client, single-user, same
initialization sequence:

# lockf -k -t 1 /var/run/cron.pid echo ok

Hangs... always.
see http://mega.ist.utl.pt/~mlsr/nfs-lockfhang.bin
(this tcpdump is quite big, perhaps it included loading the kernel)

Now, given this, since the hang also occurs with lockf, I tried another
test, on a different machine (the one that's called dual).
The tcpdump was done on the server:

tcpdump -s 0 -w nfs-other.bin host dual and udp port nfs

Now, two vts on the client. In the first, this sequence:

# mkdir test
# mount compaq:/x1 test
# touch test/lock-file ; lockf -k -t 1 test/lock-file sh
#

On the second vt,

# lockf -k -t 1 test/lock-file echo ok

It hung. I tried ^C; it still hung.

On the first vt,

# exit

On the second vt, lockf had returned to the prompt.

The tcpdump is at http://mega.ist.utl.pt/~mlsr/nfs-other.bin

The output of uname -a on the client (dual) is:

FreeBSD dual 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 7 18:03:35 WET 2006 root@dual:/usr/obj/usr/src/sys/DUAL i386

and on the server (compaq) is:

FreeBSD compaq 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #3: Tue Feb 14 13:04:11 WET 2006 root@dual:/usr/obj/usr/src/sys/COMPAQ i386

Please also try what I did: two vts on a client, both trying to lock the
same file on the server with lockf.

My description of the problem is becoming increasingly similar to what is
in pr bin/80something.

Greetings,
Miguel
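
For anyone who wants to script the two-vt lockf test above instead of typing it by hand, here is a minimal sketch of the behaviour I would expect from a healthy setup. It is not what lockf(1) does internally. It assumes a flock(2)-style exclusive lock and emulates the -t timeout by polling with a non-blocking lock attempt. The names try_lock and demo are made up for illustration.

```python
import errno
import fcntl
import os
import time


def try_lock(path, timeout=1.0, poll=0.05):
    """Try to take an exclusive flock() lock on path within `timeout` seconds.

    Returns the open file descriptor on success (close it to release the
    lock), or None if the lock is still held elsewhere when the timeout
    expires. The polling loop is an assumption made for this sketch.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except OSError as exc:
            # EAGAIN/EACCES mean "held by someone else"; anything else is real.
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                os.close(fd)
                raise
        if time.monotonic() >= deadline:
            os.close(fd)
            return None
        time.sleep(poll)


def demo(path, timeout=1.0):
    """Replay the two-vt sequence; returns (got_lock_while_held, got_lock_after)."""
    holder = try_lock(path, timeout)        # first vt: takes the lock
    if holder is None:
        raise RuntimeError("lock already held by someone else: " + path)
    contender = try_lock(path, timeout)     # second vt: should time out
    while_held = contender is not None
    if contender is not None:
        os.close(contender)
    os.close(holder)                        # "exit" on the first vt
    after = try_lock(path, timeout)         # now it should succeed
    after_release = after is not None
    if after is not None:
        os.close(after)
    return while_held, after_release
```

On a local file, demo("lock-file") should give up on the second attempt after about one second and succeed once the holder is closed. The hang described above would show up as the second try_lock call never returning. For a real NFS reproduction the holder and the contender should be separate processes (or separate clients), since the two descriptors here live in one process.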