From owner-freebsd-hackers Mon Mar 1 9:12:31 1999 Delivered-To: freebsd-hackers@freebsd.org Received: from mercury.inktomi.com (mercury.inktomi.com [209.1.32.126]) by hub.freebsd.org (Postfix) with ESMTP id BCE4715414 for ; Mon, 1 Mar 1999 09:11:22 -0800 (PST) (envelope-from jplevyak@inktomi.com) Received: from tsdev.inktomi.com (tsdev.inktomi.com [209.1.32.119]) by mercury.inktomi.com (8.9.1a/8.9.1) with SMTP id JAA00403; Mon, 1 Mar 1999 09:11:15 -0800 (PST) Received: by tsdev.inktomi.com (SMI-8.6/SMI-SVR4) id JAA22097; Mon, 1 Mar 1999 09:11:05 -0800 Message-ID: <19990301091105.B21935@tsdev.inktomi.com> Date: Mon, 1 Mar 1999 09:11:05 -0800 From: John Plevyak To: Terry Lambert , John Plevyak Cc: hackers@FreeBSD.ORG Subject: Re: lockf and kernel threads References: <19990224165840.A19033@proxydev.inktomi.com> <199902272043.NAA14007@usr09.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <199902272043.NAA14007@usr09.primenet.com>; from Terry Lambert on Sat, Feb 27, 1999 at 08:43:33PM +0000 Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, Feb 27, 1999 at 08:43:33PM +0000, Terry Lambert wrote: > > Ok, I found the problem with using kernel threads and fcntl( .. F_SETLK ..). > > The issue is two fold. > > > > 1. the locks are keyed off pid, and if the first thread to exit and close > > them is not the one that took out the lock the close will occur without > > the lock being removed. > > Probably the correct way to deal with this is to put the locks into > seperate collision domains for the keying. > > A very simple way to do this would be to take the NFS server locking > kernel patches from www.freebsd.org/~terry/, and then make rpid the > same as pid for unthreaded process, the same as the thread ID for > threaded processes, and then make the rpid value significant for > both hash bucket selection and locally asserted locks. > > This would result in the locks locking against each other. This is a good idea, and one I considered for the patch. Two problems with this: 1. the obvious choice of rpid is p_leader->p_pid. However, it is possible for the leader to terminate a short time (time to service a 'kill') before the peers. Depending on how overloaded the machine, NFS troubles etc, I might worry about PID recycling. 2. It requires a more substantial change to the locking and a modification of the proc struct. The reason is that currently file locking uses the proc pointer as the 'id' or uniqueness token. I wanted to see if something simpler could be done. Alternatively, if one can ensure that the p_leader does not die before the peers, then one can preserve the locking code and refraim from changing the proc strucure since one can simply pass p->p_leader down to the locking code. Since for unthreaded programs p == p->p_leader this yields very simple code for both cases. Moreover, this change enables other such threading problems to be fixed because logically 'process-type' state can be stored in p->p_leader->foo and logically 'thread-type' state can be stored in p->foo. (this is the change I chose) > > > > 2. the P_ADVLOCK flag is set in p_flags on a per processes basis, so > > we don't even check to see if locks need to be removed if the > > closing (struct proc*) is not the locking (struct proc*). > > POSIX is very explicit, that the first close on a file within a single > process results in the locks being deasserted. > > I think that your patches fail to address this issue, since in reality, > I believe that you want to treat each thread as if it were a seperate > process for the purpose of specifying POSIX close semantics. The patch does in fact handle this case. The first closing thread with pass p->p_leader into the VOP_ADVLOCK and cause the lock to be released. In the current system, only if the original thread which obtained the lock was the first to close it would the lock be released. > > The patches also have the coeslescence problem that made me seperate > the VOP_ADVLOCK code into an assert-veto/affirm-coelesce, instead > of what they do now. > > Basically, the problem is that when you assert a byte range lock on > a file (e.g., a read lock), and an overlapping lock is granted to > another thread in your process (e.g. a read lock with some overlap), > and one or the other of the locks (but not both) are deasserted, then > the overlapping region is no longer read locked for the thread that > did not release it's lock. > > In effect, the locks cast "shadows" onto a linear contention space > for each file, and shadows with the same indices are coelesced. > > The purpose in dealing with this for NFS server locking is that a > single process in user space proxies the locks for a large number > of clients, and locks from one client machine that overlap another > lock from the same client machine results in a condition similar > to multiple threads within the same process asserting locks, and > expecting them to be enforced as if they weren't in the same > contention domain (it happens to clean up VFS advisory locking on > union FS stacking at the same time, but that's just a nice side > effect). > I understand the problem for NFS locking, but for threaded programs it would seem that lock shadowing would be the desired behavior The program is logically one process, and the lock ranges are shared state, not thread-specific state. > > > As a general comment on your original problem, the correct way to > handle signals (according to the pthreads model) is to disable > disnals in all threads but one, which is designated as the signal > handling thread. > > One problem here is that FreeBSD kernel threads using rfork get > different PID's, so the signal handling will never work correctly > anyway. > > For user space threads and the Linux crap, the signal handling > should be possible, if you follow the disallow-all-but-one-thread > rule. I considered that also. There are a few problems with this: 1. some signals cannot be blocked. In particular, if the non-locking thread is 'KILL'ed, then the lock will never be released in the current code. 2. the abort(3) code explicitly unblocks the calling threads signal handler so that strategy will not work for asserts, aborts. 3. signals are used for other purposes, some of which are not amenable to the 'disallow-all-but-on-thread' technique. > > > Terry Lambert > terry@lambert.org > --- > Any opinions in this posting are my own and not those of my present > or previous employers. > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-hackers" in the body of the message john -- John Bradley Plevyak, PhD, jplevyak@inktomi.com, PGP KeyID: 051130BD Inktomi Corporation, 1900 S. Norfolk Street, Suite 110, San Mateo, CA 94403 W:(415)653-2830 F:(415)653-2801 P:(888)491-1332/5103192436.4911332@pagenet.net To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message