From owner-freebsd-current  Mon Feb 11 20:41:28 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mail12.speakeasy.net (mail12.speakeasy.net [216.254.0.212])
	by hub.freebsd.org (Postfix) with ESMTP id 4C5D037B6B8
	for <current@freebsd.org>; Mon, 11 Feb 2002 20:13:04 -0800 (PST)
Received: (qmail 5479 invoked from network); 12 Feb 2002 04:12:58 -0000
Received: from unknown (HELO laptop.baldwin.cx) ([65.91.152.149]) (envelope-sender <jhb@FreeBSD.org>)
          by mail12.speakeasy.net (qmail-ldap-1.03) with SMTP
          for <julian@elischer.org>; 12 Feb 2002 04:12:58 -0000
Message-ID: <XFMail.020211231248.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <Pine.BSF.4.21.0202111759140.18316-100000@InterJet.elischer.org>
Date: Mon, 11 Feb 2002 23:12:48 -0500 (EST)
From: John Baldwin <jhb@FreeBSD.org>
To: Julian Elischer <julian@elischer.org>
Subject: Re: ucred holding patch, BDE version
Cc: Alfred Perlstein <bright@mu.org>, bde@freebsd.org,
	current@freebsd.org
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG


On 12-Feb-02 Julian Elischer wrote:
> 
> 
> On Mon, 11 Feb 2002, John Baldwin wrote:
> 
>> 
>> On 12-Feb-02 Julian Elischer wrote:
>> >  The proclock is needed to get the reference,
>> > guarding against other threads, and giant is needed fo rnot to free it
>> > because if it reaches a refcount of 0 it needs to call free(). (which john
>> > assures me needs Giant at this time).
>> > We could avoid the proclock with judicious use of an atomic refcount
>> > incrementing method.
>> 
>> _No_!  The proc lock is protecting p_ucred, it can't go away!  What _can_ go
>> away is the per-ucred mutex to protect the refcount if we ever revive the
>> refcount API.
> 
> hmm ok Alfred, here's the way I see it..

This is jhb, not alfred.

> You are never permitted to "CHANGE" a cred.
> You ALWAYS allocate another and switch them.
> When you switch them you decrement the refcount of the old one.
> If someone else takes a reference on it at the same moment that you 
> drop it, then the order is important down to the bus-cycle
> as to whether it gets freed or not. We already know that dereferencing it
> from the proc structure is not important, because a "stale" ucred pointer
> is only preventable from the userland. 

NO, no!  You only know it is stale in that case cause you are comparing a
non-stale known-good pointer to p_ucred.  If p_ucred is stale, then it is equal
to td_ucred which still points to a valid ucred so it is ok in that particular
case.  If p_ucred is stale but points to a new ucred, getting the proc lock
around crhold() ensures that you will gain a reference to the really correct
value that is guaranteed to point to a correct value.   Don't think in terms of
bus cycles.  On non-i386 architectures, a write can sit in a store buffer
waiting until it is pushed out by a memory barrier or for the CPU to get around
to posting it.  You can't assume that a write will be visible to other CPU's
"soon".

> All that is important  is that the pointer is a REAL pointer to a cred
> and that it is not allowed to go to 0 references in its way to 1
> reference.

But if it's a stale pointer, it's not a pointer to a ucred.  That memory could
have been free'd and now be a mbuf header.  Now you go try to increment the
refcount of a mbuf header.  Bad juju here. :)  Trust me, my jhb_proc kernels
were locking up cause I missed one instance of setting p_args to NULL during
exec so taht during wait we would trash random data memory when we tried to
indirect the stale p_args pointer to dec the refcount.

The problem is that you actually have 2 locks involved for 2 different things
here during your crhold()'s:

 - proc lock which protects p_ucred in two ways:
   - 1) it ensures that the ucred has at least one reference that doesn't go
        away so it's ok for us to deref the poitner and increment the count
   - 2) it ensures that the value of the pointer we read is up to date
 - ucred mutex which protects the ucred refcount

Only the second mutex can be replaced by pure atomic operations in theory.  The
first one cannot.  In your comparison test, you aren't dereferencing p_ucred,
so if it is stale, that is ok due to other issues.  However, when you do
crhold(p->p_ucred), you dereference the pointer, so it better darn well be
pointing at a ucred and not a mbuf.

> An atomic reference count increment that checks
> against 0 would be part of it but might not be enough.
> I think we also need to have a lock on something because
> it might get freed and used for something else that happens to place a "1"
> in that location inbetween the time that p->p_ucred is read
> and the refcount is decremented.

That is what the proc lock is doing. :-P
 
> As an asside:
> I think that in NT they may have refcounts in the same location in all
> structures as I think they derived all their structures from a bas class
> that has ref counts. In that case it WOULD have a "1" in that location if
> it were re-used. (It's been a long time since I saw NT code so I may be
> wrong)

That would involve great evil.  Let's not do that.  How do you know the
refcount hasn't been adjusted on some other structure anyways that you have a
stale pointer to?  Maybe that current structure is just now being freed?

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message