Date: Tue, 18 Mar 2003 09:22:38 -0800
From: Steve Sizemore <steve@ls.berkeley.edu>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: current@freebsd.org
Subject: Re: NFS file unlocking problem
Message-ID: <20030318172237.GA320@math.berkeley.edu>
In-Reply-To: <3E76CC9A.BBAAED4A@mindspring.com>
References: <Pine.LNX.4.44.0303171255310.15683-100000@mail.allcaps.org> <3E768C47.229C1DF0@mindspring.com> <20030318065716.GB99408@math.berkeley.edu> <3E76CC9A.BBAAED4A@mindspring.com>
On Mon, Mar 17, 2003 at 11:36:58PM -0800, Terry Lambert wrote:
> Steve Sizemore wrote:
> > useful. As it is, it's still interesting. I have no way of judging the
> > quality of the code in question, other than the empirical result that
> > it works in most cases.
>
> Well, then you are stuck with the code you have that someone else
> wrote. Hopefully that's not your problem, or you are in trouble.
> 8-).

Actually, maybe not, since it's a commercial program. If I could
demonstrate that it's their problem, I could put pressure on them to
fix it. However, at this point, I don't think that's the case.

> OK, then it isn't an intra-program deadlock, which is something.
>
> It could still be inter-program, but if it is, it's not going to
> be easy to find; you will need to find someone who *is* a programmer.
> FWIW, this happens when:
>
>   Program 1                         Program 2
>   LOCK A
>                                     LOCK B
>   LOCK B (Waiting for Program 2)
>                                     LOCK A (Waiting for Program 1
>                                             waiting for me)

I don't see how it could be "inter-program", since I've gone to great
lengths to simplify it to a single program failing on a brand new file.

> > > On the other hand, this is clearly a deadlock that requires an
> > > existing, conflicting lock -- IFF you are correct about the
> > > delayed locking behaviour.
> >
> > Not sure I understand this.
>
> If someone didn't already have it locked, your lock which waits for
> the region to be able to lock it would not need to wait: it would
> just give you the lock, and you wouldn't have the problem.

Oh, so that's what that meant. :-) But (see above) it's pretty clear to
me that nothing else could have it locked.

> You need to find out why it's waiting. If it's waiting, it's
> waiting for somebody. You need to know who that somebody is.
>
> Once you know that, you can go hit them over the head with a
> large baseball bat. 8-).

Yes. But that somebody is undoubtedly not a real person.

> I have attached the program to run on your Solaris box. You
> may have to look in /usr/include/sys/fcntl.h to see the right
> name, if it complains about l_rsysid (might be l_sysid, or whatever).
>
> I'm attaching a test program to run on the server when the
> lock fails, using information from the trace to know the name
> of the file to enter, and the Ethereal decoded packet trace to
> know how to answer the other questions.

I'll try it today.

> But I think it may be as simple as you not telling us that you
> have multiple IP addresses configured on one of your machines?

No, but this might be an important clue. The FreeBSD host has multiple
(2) A records in the DNS. In fact, I think that when it last worked, it
had only a single A record.

Also, I notice that there are two rpc.lockd processes running on the
FreeBSD server. I hadn't noticed that before it started failing, but I
didn't mention it, since rpc.lockd does get invoked twice in rc.network.
However, rpc.statd also gets called twice, and there's only a single
copy of it running...

root    399  0.0  0.1 263496 1000  ??  Is    9:11AM   0:00.00 /usr/sbin/rpc.sta
root    402  0.0  0.1   1512 1156  ??  Ss    9:11AM   0:00.00 /usr/sbin/rpc.loc
daemon  405  0.0  0.1   1484 1176  ??  I     9:11AM   0:00.00 /usr/sbin/rpc.loc

Does that indicate a problem?

>
> If so, try:
>
> sysctl -w net.inet.ip.check_interface=0

What does this do, just turn off checking? Can I do this on the running
system, or do I need to put it into sysctl.conf and reboot? (BTW, from
the man page: "The -w option has been deprecated and is silently
ignored.")

Thanks.

Steve
--
Steve Sizemore <steve@ls.berkeley.edu>, (510) 642-8570
Unix System Manager
Dept. of Mathematics and College of Letters and Science
University of California, Berkeley
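The test program Terry attached is not reproduced in this archive. The idea he describes, asking the server which process holds the lock the client is waiting on, can be sketched with the POSIX fcntl(F_GETLK) call. This is a minimal, hypothetical C sketch, not his program: query_lock() and print_lock() are names made up here, and only the portable struct flock fields are used (the Solaris-specific l_sysid/l_rsysid field he mentions is left out).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (NOT the program attached to the original mail):
 * ask the kernel whether a lock of the given type on the byte range
 * [start, start + len) of an already-open file would be blocked.
 * Returns 0 on success with the answer in *out, or -1 with errno set.
 * If no other process holds a conflicting lock, out->l_type is F_UNLCK.
 */
int query_lock(int fd, short type, off_t start, off_t len, struct flock *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof *out);
    out->l_type = type;          /* F_RDLCK or F_WRLCK */
    out->l_whence = SEEK_SET;
    out->l_start = start;
    out->l_len = len;            /* 0 means "through end of file" */
    return fcntl(fd, F_GETLK, out);
}

/*
 * Print one line describing the result of query_lock().
 * Only POSIX fields are used; without Solaris's extra system-id field,
 * the pid of a lock taken on behalf of a remote NFS client may not
 * name a process on this machine.
 */
void print_lock(FILE *fp, const char *name, const struct flock *fl)
{
    if (fl->l_type == F_UNLCK) {
        fprintf(fp, "%s: no conflicting lock\n", name);
        return;
    }
    fprintf(fp, "%s: %s lock held by pid %ld, start %lld, len %lld\n",
            name, fl->l_type == F_WRLCK ? "write" : "read",
            (long)fl->l_pid, (long long)fl->l_start, (long long)fl->l_len);
}
```

On the server, while a client is hung, one would open the file named in the packet trace, call query_lock(fd, F_WRLCK, 0, 0, &fl), and then print_lock(stdout, path, &fl). Two fcntl() properties shape this design: F_GETLK never reports locks held by the calling process itself, and closing any descriptor for a file releases all of that process's fcntl() locks on it, which is why the helper takes an already-open descriptor instead of opening and closing a path internally.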