From owner-svn-src-all@FreeBSD.ORG  Fri Jan 16 15:28:41 2009
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8F0A7106567A;
	Fri, 16 Jan 2009 15:28:40 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 3CEA78FC1C;
	Fri, 16 Jan 2009 15:28:40 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (pool-98-109-39-197.nwrknj.fios.verizon.net
	[98.109.39.197])
	by cyrus.watson.org (Postfix) with ESMTPSA id 9F68146BB3;
	Fri, 16 Jan 2009 10:28:39 -0500 (EST)
Received: from localhost (john@localhost [127.0.0.1]) (authenticated bits=0)
	by server.baldwin.cx (8.14.3/8.14.3) with ESMTP id n0GFS4KM011368;
	Fri, 16 Jan 2009 10:28:33 -0500 (EST) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: Max Laier <max@love2party.net>
Date: Fri, 16 Jan 2009 10:13:33 -0500
User-Agent: KMail/1.9.7
References: <200812161703.mBGH3M7m042955@svn.freebsd.org>
	<200901151659.35735.jhb@freebsd.org>
	<200901160044.18044.max@love2party.net>
In-Reply-To: <200901160044.18044.max@love2party.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200901161013.33898.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Fri, 16 Jan 2009 10:28:33 -0500 (EST)
X-Virus-Scanned: ClamAV 0.94.2/8871/Thu Jan 15 23:16:59 2009 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: src-committers@freebsd.org, Kip Macy <kmacy@freebsd.org>,
	svn-src-all@freebsd.org, Attilio Rao <attilio@freebsd.org>,
	Robert Watson <rwatson@freebsd.org>, svn-src-head@freebsd.org
Subject: Re: svn commit: r186187 - head/sys/net
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Jan 2009 15:28:43 -0000

On Thursday 15 January 2009 6:44:17 pm Max Laier wrote:
> Thank you for getting back to this ...
> 
> On Thursday 15 January 2009 22:59:35 John Baldwin wrote:
> > So the usual model would be to do something like this:
> >
> > 	LIST(foo, ) foo_list;
> >
> > foo_add(foo *p)
> > {
> >
> > 	/* Construct foo, but don't lock it. */
> > 	WLOCK(foo_list);
> > 	LIST_INSERT(foo_list, foo);
> > 	WUNLOCK(foo_list);
> > }
> >
> > foo *
> > foo_find(int key)
> > {
> > 	foo *p;
> >
> > 	RLOCK(foo_list);
> > 	LIST_FOREACH(p, foo_list) {
> > 		if (p->key == key) {
> > 			LOCK(p);
> > 			RUNLOCK(foo_list);
> 
> Is this allowed now?  I was under the impression that it was bad for some 
> reason to interchange locks/unlocks like this.  Something about WITNESS 
> getting confused and/or real lock order problems?  I'm more than happy if 
> that's not the case.

No, lock ordering only matters for obtaining a lock where you can block.  It 
has always been safe to unlock in any order (well, before critical_enter/exit 
when the interrupt state was stored in spin mutexes (i.e. original BSD/OS 
code) you could not interlock spin mutex unlocks, but that was fixed long 
before 5.0).  pfind() uses this model, for example.

> > 			return (p);
> > 		}
> > 	}
> > 	RUNLOCK(foo_list);
> > 	return (NULL);
> > }
> >
> > something_else()
> > {
> > 	foo *p;
> >
> > 	RLOCK(foo_list);
> > 	LIST_FOREACH(p, foo_list) {
> > 		LOCK(p);
> > 		bar(p);
> > 		UNLOCK(p);
> > 	}
> > 	RUNLOCK(foo_list);
> > }
> >
> > From your description it would appear that you are doing something_else()
> > but without actually locking foo_list.  Unless you can demonstrate a clear
> > win in benchmarks from not having the lock there, I would suggest just
> > going with the simpler and more common approach.  Depending on the mtx of
> > the individual head doesn't do anything to solve races with removing the
> > head from the list.
> 
> The thing is that the list of pfil_heads is just a "configuration time" 
> construct.  We really don't want to look at it in the hot path.  Hence we 
> allow the caller of pfil_head_register() (foo_add) to hold on to its 
reference 
> of the pfil_head and work with it without requiring any interaction with the 
> lookup list.  This, however, is not the problem at all.  It's the 
> responsibility of the caller to ensure that it won't fiddle with the cached 
> reference after calling pfil_head_unregister().

I've read some more code now, I had thought pfil_head_register() == 
pfil_add_hook() for some reason.  However, the locking is still not needed 
assuming you follow some simple rules.

1) Everyone uses pfil_head_get() to find a pfil head.  In that case, the 
memory barries in the LIST_UNLOCK() are sufficient so long as the pfil head 
is fully constructed when the LIST_UNLOCK() is done.

2) The caller of pfil_head_register() may use it as well w/o using get.  This 
is allowed because 'curthread' always has an up-to-date view of anything it 
has written.  Put another way, the current CPU's cache is never stale with 
respect to any writes it has performed.

The only thing I see missing in the pfil stuff is that there is no refcount or 
some such to drain hooks from a pfil_head during unregister.  But it also 
seems that nothing actually calls pfil_head_unregister() currently, so that 
isn't but so worrisome.

Does that answer your question?


> > So, reading a bit further, I think the real fix is that the pfil API is a
> > bit busted.  What you really want to do is have pfil_get_head() operate
> > like foo_find() and lock the head before dropping the list lock.  That is
> > the only race I see in the current code.  However, because you malloc()
> > after calling pfil_get_head() that is problematic.  That is, the current
> > consumer code looks like this:
> >
> > 	ph = pfil_get_head(x, y);
> >
> > 	pfil_add_hook(..., ph);		/* calls malloc() */
> >
> > I think you should change the API to be this instead:
> >
> > 	pfil_add_hook(..., x, y);
> >
> > and have pfil_get_head() do the lookup internally:
> >
> > pfil_add_hook()
> > {
> >
> > 	malloc(...);
> >
> > 	ph = pfil_get_head(x, y);	/* locks the 'ph' like foo_find() */
> > }
> 
> This is a good idea/catch, but still not my initial problem.
> 
> My real problem is what foo_remove() should look like in the scenario above.
> 
> I assume that it should look something like this:
> 
> foo_remove(int key) {
> 
>   WLOCK(foo_list);
>   LIST_FOREACH(p, foo_list)
>     if (p->key == key) {
>        LIST_REMOVE(p, entries);
> 
>        LOCK(p);         /* <--- HERE */
> 
>        WUNLOCK(foo_list);
>        /* free resources inside p */
>        /* uninit lock */
>        free(p);
>        return;
>     }
>   WUNLOCK(foo_list);

So part of the key with this is the issue I mentioned above with 
pfil_head_unregister().  You have to solve the issue of outstanding 
references to 'p'.  This may involve a refcount.  Sometimes you do need the 
lock.  When removing processes from zombproc in wait() we use the proc lock 
to block on p_lock until it goes to zero since other things that hold 
references to a process (ptrace, procfs, etc.) use PLOCK/PRELE.  They are 
also required to do something like this:

	p = pfind(pid);		/* returns locked */
	if (p == NULL)
		return (ESRCH);
	if (p->p_state == PRS_ZOMBIE) {
		PROC_UNLOCK(p);
		return (ESRCH);
	}
	_PHOLD(p);
	PROC_UNLOCK(p);

	/* do stuff */

	PRELE(p);

However, whether or not you need the lock depends on your strategy for 
managing outside references to a device.  Some items just use a simpler 
refcount model where foo_find() bumps the item's refcount, and removing the 
item from the list just drops a refcount.  The item is only free'd when the 
last refcount goes away.  If you did this for pfil_heads then each hook would 
hold a reference on the item as well.  However, the current API does not lend 
itself well to this since you have to do another lookup to remove a pfil 
hook.  If you returned a cookie (like bus_setup_intr()) then you could do 
something like this:

	pfil_register_head()
	{
		head = malloc(...);
		refcount_init(head->ref, 1);
		LIST_LOCK();
		if (already_in_list) {
			free(head);
			head = existing;
		} else {
			insert(head);
		}
		pfil_ref_head(head);
		LIST_UNLOCK();
		return (head);
	}

	pfil_get_hook()
	{
		LIST_LOCK();
		LIST_FOREACH(...)
			if (found) {
				pfil_ref_head(p);
				LIST_UNLOCK();
				return (p);
			}
		LIST_UNLOCK();
		return (NULL);
	}

	/* caller must hold reference to hook */
	pfil_add_hook(..., pfh, void **cookiep)
	{
		hook = malloc(...);

		PFH_LOCK(pfh);
		if (already_in_list or other error) {
			PFH_UNLOCK(pfh);
			free(hook);
			return (error);
		}
		hook->head = pfh;
		pfh_ref_head(pfh);	/* hook has a ref */
		/* insert in list */
		PFH_UNLOCK(pfh);
		*cookiep = hook;
	}

	pfil_remove_hook(void *cookie)
	{
		hook = cookie;
		pfh = hook->pfh;

		PFH_LOCK(pfh);
		LIST_REMOVE(...);
		PFH_UNLOCK(pfh);
		free(hook);
		pfh_drop_head(pfh);
	}

	pfil_unregister_hook()
	{
		LIST_LOCK();
		LIST_FOREACH() {
			if (found) {
				LIST_REMOVE(pfh);
				pfh_drop_head(pfh);
			}
		}
	}

	pfil_ref_head(pfh)
	{
		refcount_acquire(&pfh->ref);
	}

	pfil_ref_drop(pfh)
	{
		if (refcount_release(&pfh->ref)) {
			/* tear down state, destroy lock, etc. */
			free(pfh);
		}
	}

Then your code flow might look something like this (this doesn't change the 
pfil_add_hook API as I mentioned earlier btw):

head owner:

	pfh = pfil_register_head(x, y);

head owner wants to remove hook:

	pfil_unregister_head(pfh);
	/*
	 * if there are active hooks, pfh still exists, but pfil_get_head()
	 * won't find it anymore, so no new hooks, just in a zombie state
	 * until all the old hooks go away.
	 */

client module load:

	void *input_hook, *output_hook;

	pfh = pfil_get_head(x, y);	/* holds a reference on pfh now */
	pfil_add_hook(my_input_hook, IN, ..., pfh, &input_hook);
	pfil_add_hook(my_output_hook, OUT, ..., pfh, &output_hook);
	pfh_drop_head(pfh);		/* release our reference */

client module unload:

	/*
	 * note, pfil_get_head(x, y) may not work at this point if the head
	 * has been unregistered.  In that case, if these are the last two
	 * hooks, then the second remove_hook will actually free the head
	 * internally when it calls pfil_head_drop()
	 */
	pfil_remove_hook(input_hook);
	pfil_remove_hook(output_hook);

In all of this you don't need to lock the pfil head in pfil_register_head() or 
pfil_unregister_head().
		
> Is there a reason why 
> we don't want to lock the element's lock while we initialize?  While it is 
> safe if we always go through foo_find() before altering the element and thus 
> synchronize on the list lock, it seems bogus to rely on this (even though it 
> does make sense to go through foo_find() every time as I wrote earlier).

Actually, for a true refcounting scheme to work correctly you _have_ to rely 
on this.  The idea being that you can only gain a new reference while you 
hold an existing one.  Specifically, holding the list lock guarantees that 
you can use the list's reference while you acquire your own private reference 
(e.g. for a new hook) without worrying about the 'guaranteeing' reference 
(the list's) being removed out from under you.

file descriptors use this model for 'struct file', for example, where each 
file descriptor table holds a reference on the file and individual system 
calls hold a reference for the duration of their system call.  For them the 
normal sequence is something like:

	error = fget(fdp, fd, &fp);
	if (error)
		return (error);
	/* 'fp' now holds a private reference */
	/* use the file. */
	fdrop(fp, td);

-- 
John Baldwin