Date:      Tue, 18 Mar 2003 09:22:38 -0800
From:      Steve Sizemore <steve@ls.berkeley.edu>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        current@freebsd.org
Subject:   Re: NFS file unlocking problem
Message-ID:  <20030318172237.GA320@math.berkeley.edu>
In-Reply-To: <3E76CC9A.BBAAED4A@mindspring.com>
References:  <Pine.LNX.4.44.0303171255310.15683-100000@mail.allcaps.org> <3E768C47.229C1DF0@mindspring.com> <20030318065716.GB99408@math.berkeley.edu> <3E76CC9A.BBAAED4A@mindspring.com>

On Mon, Mar 17, 2003 at 11:36:58PM -0800, Terry Lambert wrote:
> Steve Sizemore wrote:
> > useful. As it is, it's still interesting. I have no way of judging the
> > quality of the code in question, other than the empirical result that
> > it works in most cases.
> 
> Well, then you are stuck with the code you have that someone else
> wrote.  Hopefully that's not your problem, or you are in trouble.
> 8-).

Actually, maybe not, since it's a commercial program. If I could
demonstrate that it's their problem, I could put pressure on them to
fix it. However, at this point, I don't think that's the case.

> OK, then it isn't an intra-program deadlock, which is something.
> 
> It could still be inter-program, but if it is, it's not going to
> be easy to find; you will need to find someone who *is* a programmer.
> FWIW, this happens when:
> 
> 	Program 1	Program 2
> 	LOCK A
> 			LOCK B
> 	LOCK B (Waiting for Program 2)
> 			LOCK A (Waiting for Program 1 waiting for me)

I don't see how it could be "inter-program", since I've gone to great
lengths to simplify it to a single program failing on a brand new file.
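
For anyone following the thread, the Program 1 / Program 2 ordering
quoted above can be reproduced in a few lines. This is only a sketch,
not the code under discussion: it assumes Python threads and in-process
threading.Lock objects standing in for the two programs and the two
file locks, and the names attempt and opposite_order and the 0.2 second
timeout are made up for illustration. The timeout is there so the
demonstration reports the deadlock instead of hanging.

```python
import threading


def attempt(first, second, barrier, results, name, timeout):
    """Hold `first`, then try to take `second` while the peer does the reverse."""
    with first:
        barrier.wait()  # both sides now hold their first lock
        results[name] = second.acquire(timeout=timeout)
        barrier.wait()  # keep holding `first` until the peer has finished trying
        if results[name]:
            second.release()


def opposite_order(timeout=0.2):
    """Run two workers that take locks A and B in opposite order."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}
    workers = [
        threading.Thread(
            target=attempt,
            args=(lock_a, lock_b, barrier, results, "program1", timeout),
        ),
        threading.Thread(
            target=attempt,
            args=(lock_b, lock_a, barrier, results, "program2", timeout),
        ),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results  # True means the second lock was obtained


if __name__ == "__main__":
    print(opposite_order())
```

Neither side gets its second lock, because each one holds the lock the
other is waiting for. Without the timeout both would wait forever,
which is the deadlock in the diagram.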

> > > On the other hand, this is clearly a deadlock that requires an
> > > existing, conflicting lock -- IFF you are correct about the
> > > delayed locking behaviour.
> > 
> > Not sure I understand this.
> 
> If someone didn't already have it locked, your lock which waits for
> the region to be able to lock it would not need to wait: it would
> just give you the lock, and you wouldn't have the problem.

Oh, so that's what that meant. :-) But (see above) it's pretty clear
to me that nothing else could have it locked.
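
The point about needing an existing, conflicting lock can be checked on
a local file. The sketch below is an illustration under assumptions,
not the test program Terry attached: it uses a temporary local file
(so neither NFS nor rpc.lockd is involved), Python's fcntl.lockf for
POSIX record locks, and a forked child as the second process. The
function names are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return True if a non-blocking exclusive lock on `path` is granted."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False  # somebody else already holds a conflicting lock
    finally:
        os.close(fd)  # closing the descriptor drops this process's lock


def probe_from_child(path):
    """Fork a child and report whether the child can lock `path` right now."""
    pid = os.fork()
    if pid == 0:
        try:
            code = 0 if try_lock(path) else 1
        except BaseException:
            code = 2
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    fd, path = tempfile.mkstemp()
    try:
        # Brand new file, no other holder: the request is granted at once.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        while_held = probe_from_child(path)    # parent still holds the lock
        fcntl.lockf(fd, fcntl.LOCK_UN)
        after_unlock = probe_from_child(path)  # nobody holds it any more
        return while_held, after_unlock
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

On a local filesystem the child is refused only while the parent holds
the lock. A request on a brand new file that still waits therefore
suggests the lock manager believes there is another holder.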

> 
> You need to find out why it's waiting.  If it's waiting, it's
> waiting for somebody.  You need to know who that somebody is.
>
> Once you know that, you can go hit them over the head with a
> large baseball bat.  8-).
 
Yes. But that somebody is undoubtedly not a real person.

> I have attached the program to run on your Solaris box.  You
> may have to look in /usr/include/sys/fcntl.h to see the right
> name, if it complains about l_rsysid (might be l_sysid, or whatever).
> 
>
> 
> I'm attaching a test program to run on the server when the
> lock fails, using information from the trace to know the name
> of the file to enter, and the Ethereal decoded packet trace to
> know how to answer the other questions.

I'll try it today.

> But I think it may be as simple as you not telling us that you
> have multiple IP addresses configured on one of your machines?

No, but this might be an important clue. The FreeBSD host has multiple
(2) A Records in the DNS. In fact, I think that when it last worked,
it had only a single A Record. Also, I notice that there are two
rpc.lockd processes running on the FreeBSD server. I hadn't noticed
that before it started failing, but I didn't mention it, since
rpc.lockd does get invoked twice in rc.network. However, rpc.statd
also gets called twice, and there's only a single instance of it
running...

    root     399  0.0  0.1 263496 1000 ??  Is    9:11AM   0:00.00 /usr/sbin/rpc.sta
    root     402  0.0  0.1  1512 1156  ??  Ss    9:11AM   0:00.00 /usr/sbin/rpc.loc
    daemon   405  0.0  0.1  1484 1176  ??  I     9:11AM   0:00.00 /usr/sbin/rpc.loc


Does that indicate a problem?
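
On the A record point, one quick way to see how many addresses a name
maps to is to ask the system resolver. This is a minimal sketch: the
function name ipv4_addresses is made up, and getaddrinfo goes through
the host's normal lookup order, so it may reflect /etc/hosts as well as
the DNS.

```python
import socket


def ipv4_addresses(host):
    """Return the sorted, de-duplicated IPv4 addresses the resolver gives for `host`."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # A numeric literal keeps the example independent of DNS; substitute
    # the server's real name to see how many addresses it resolves to.
    print(ipv4_addresses("127.0.0.1"))
```

A name with two A records would show two entries in the returned list.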

> 
> If so, try:
> 
> 	sysctl -w net.inet.ip.check_interface=0

What does this do, just turn off checking? Can I do this on the
running system, or do I need to put it into sysctl.conf and reboot?
(BTW, from the man page -
  "The -w option has been deprecated and is silently ignored.")

Thanks.
Steve
-- 
Steve Sizemore <steve@ls.berkeley.edu>, (510) 642-8570
Unix System Manager
    Dept. of Mathematics and College of Letters and Science
    University of California, Berkeley

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message



