From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 23:39:28 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4AC24410 for ; Thu, 2 Jan 2014 23:39:28 +0000 (UTC) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 111821F39 for ; Thu, 2 Jan 2014 23:39:27 +0000 (UTC) Received: from [192.168.1.73] (254C510A.nat.pool.telekom.hu [37.76.81.10]) (authenticated bits=0) by vps1.elischer.org (8.14.7/8.14.7) with ESMTP id s02NdLM2019146 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 2 Jan 2014 15:39:25 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <52C5F8A3.9000902@freebsd.org> Date: Fri, 03 Jan 2014 00:39:15 +0100 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Alfred Perlstein , "Gumpula, Suresh" , "freebsd-hackers@freebsd.org" Subject: Re: Reference count race window References: <52C5ED3E.4020805@mu.org> In-Reply-To: <52C5ED3E.4020805@mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 23:39:28 -0000 On 1/2/14, 11:50 PM, Alfred Perlstein wrote: > > On 1/2/14, 2:17 PM, Gumpula, Suresh wrote: >> Hi, >> I am Suresh from NetAPP and I have questions/queries related to >> the reference count usage in the BSD kernel. We are seeing some >> corruptions/use after free >> issues and while debugging we found that the corruption pattern >> is a ucred/crgroups structure and started looking at ucred >> reference count implemenation. >> >> This is my understanding of ref count race window, please correct >> me if I am wrong. >> >> >> It seems there is a timing window exposed by the FreeBSD reference >> count usage/implementation. Let's start with the definitions of the >> acquire and release routines >> in freebsd/sys/sys/refcount.h >> >> static __inline void >> refcount_acquire(volatile u_int *count) >> { >> >> atomic_add_acq_int(count, 1); >> } >> >> static __inline int >> refcount_release(volatile u_int *count) >> { >> u_int old; >> >> /* XXX: Should this have a rel membar? */ >> old = atomic_fetchadd_int(count, -1); >> KASSERT(old > 0, ("negative refcount %p", count)); >> return (old == 1); >> } >> >> As implemented, a call to refcount_acquire atomically increments >> the reference count while refcount_release decrements >> the reference count and returns true if this release dropped the >> reference count to zero. >> >> Consider the following sequence of events in the absence of other >> external synchronization: >> >> * Object foo has a refcount of 1 >> * Thread a on processor m calls refcount_release on foo. >> * Very soon after (in CPU terms) thread b on processor n calls >> refcount_acquire on foo. >> * atomic_fetchadd_int operating in thread a stalls the >> atomic_add_acq_int in thread b, >> decrementing foo's refcount to zero and setting old to 1. >> refcount_release returns true. >> * atomic_add_acq_int in thread b increments the reference count to 1! >> * thread a, seeing refcount_release return success, frees foo. >> * thread b, believing it has a reference count on foo, continues to >> use it. >> >> * The major hole here is that refcount_acquire is a void function. >> If it also returned status, >> calling software could determine that it had a valid reference >> and take appropriate action if it failed to acquire. >> >> >> One such implementation might look like: >> static __inline int >> refcount_acquire(volatile u_int *count) >> { >> u_int old; >> >> old = atomic_fetchadd_int(count, 1); >> return (old != 0); >> } >> >> This change would require modification of all calls to >> refcount_acquire and determining appropriate action in the case of >> a non-success return. >> >> >> Without changing the return-value semantics of refcount_acquire, we >> have introduced a panic if we detected a race as below. >> static __inline void >> refcount_acquire(volatile u_int *count) >> { >> u_int old; >> >> old = atomic_fetchadd_int(count, 1); >> if (old == 0) { >> panic("refcount_acquire race condition detected!\n"); >> } so what is the stacktrace of the panic? >> } >> >> After this change , we have seen this panic in one of our systems. >> Could someone look at my understanding and give me some ways to >> narrow down this problem. >> As I mentioned earlier, one option is to change refcount_acquire to >> be non void and change all the callers, but it seems there are many >> paths to be changed on failure case. >> >> >> Thank you >> Suresh >> >> _________________________ > Hey Suresh, > > In theory this shouldn't happen due to pointer/thread ownership of > the resource. > My memory is that the refcount infrastructure makes some assumptions about how it is called. and the cred code makes some assumptions about what is going on too. I do agree that there is a race as outlined by you, but I believe that it was suposed to be impossible to reach that due to the fact that creds were only actually released in special cases. In those cases we can guarantee that no-one else should be able to have a pointer to that cred as the pointer is supposed to be found after the locking of the appropriate proc/thread structure. Is it possible that you have changed the possible places that creds are released? it is possible that we ourselves have broken this. I have not looked at it for some years. Maybe it is time to change the way that the refcount interface is used here so that we do know if we succeeded in getting an only reference.. it would probably require recoding because there is always a legitimate place to get a reference count of 1 (the initial setup) and initial and subsequent acquisition of reference counts is often achieved with the same code. > This means that usually a cred is copied via refcount to another > object and by the time the refcount hits 1 then only that one object > should be pointing to it. > > That means that if someone is raising the refcount at the same time > then they are looking into an object that is in the process of being > destroyed! > > Going back to your example: > > * Object foo has a refcount of 1 > * Thread a on processor m calls refcount_release on foo. > * Very soon after (in CPU terms) thread b on processor n calls > refcount_acquire on foo. > > ^--- this should not be happening as "foo" should no longer be > accessible to other subsystems. > imagine this would be like some other CPU calling rfork() on a > process that is in the middle of exit(). This should *not* happen. to expand.. there are locks that are supposed to stop this from happening. Exit should not be able to proceed until it is sure that the proc structure is only accessed by itself, and the cred pointer should never be cached without a reference addition. Meaning that the count can only be 1 in this case when the lock has been held. If this has been changed then yes there is a bug.. you may try check the lock status of various locks when removing the last reference. > > * atomic_fetchadd_int operating in thread a stalls the > atomic_add_acq_int in thread b, > decrementing foo's refcount to zero and setting old to 1. > refcount_release returns true. > * atomic_add_acq_int in thread b increments the reference count to 1! > * thread a, seeing refcount_release return success, frees foo. > * thread b, believing it has a reference count on foo, continues to > use it. > > > While it's possible that there *may* be a bug here, I think it would > make sense for you to add more instrumentation to your code. > > Are you testing with INVARIANTS enabled? otherwise refcount_release > should be panic'ing due to the KASSERT! > > static __inline int > refcount_release(volatile u_int *count) > { > u_int old; > > /* XXX: Should this have a rel membar? */ > old = atomic_fetchadd_int(count, -1); > KASSERT(old > 0, ("negative refcount %p", count)); > return (old == 1); > } > > Perhaps you should either enable INVARIANTS... or you can turn that > one single KASSERT into an unconditional test like so: > > > static __inline int > refcount_release(volatile u_int *count) > { > u_int old; > > /* XXX: Should this have a rel membar? */ > old = atomic_fetchadd_int(count, -1); > if (old < 0) panic("negative refcount %p", count); > return (old == 1); > } > > That ought to help you catch the bug. > > -Alfred > > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" > >