From owner-freebsd-arch@FreeBSD.ORG Thu May 21 09:37:48 2009 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B7401065670; Thu, 21 May 2009 09:37:48 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail11.syd.optusnet.com.au (mail11.syd.optusnet.com.au [211.29.132.192]) by mx1.freebsd.org (Postfix) with ESMTP id 9AC968FC13; Thu, 21 May 2009 09:37:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-107-117-19.carlnfd1.nsw.optusnet.com.au (c122-107-117-19.carlnfd1.nsw.optusnet.com.au [122.107.117.19]) by mail11.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n4L9bFZ9013958 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 21 May 2009 19:37:26 +1000 Date: Thu, 21 May 2009 19:37:09 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: John Baldwin In-Reply-To: <200905201524.49090.jhb@freebsd.org> Message-ID: <20090521180328.W21310@delplex.bde.org> References: <20090514131613.T1224@besplex.bde.org> <200905201524.49090.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Dag-Erling =?iso-8859-1?q?Sm=F8rgrav?= , arch@FreeBSD.org Subject: Re: lockless file descriptor lookup X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 May 2009 09:37:48 -0000 On Wed, 20 May 2009, John Baldwin wrote: > On Wednesday 20 May 2009 2:59:52 pm Jeff Roberson wrote: >> On Thu, 14 May 2009, Bruce Evans wrote: >>> Anyway, you probably need atomics that have suitable memory barriers. >>> Memory barriers must affect the compiler and make it perform refreshes >>> for them to work, so you shouldn't need any volatile casts. E.g., all >>> atomic store operations (including cmpset) have release semantics even >>> if they aren't spelled with "_rel" or implemented using inline asm. >>> On amd64 and i386, they happen to be implemented using inline asm with >>> "memory" clobbers. The "memory" clobbers force refreshes of all >>> non-local variables. Actually, it is the "acquire" operations that happen to be implemented with "memory" clobbers on amd64 and i386. "release" semantics are (completely?) automatic on amd64 and i386 so no "memory" clobbers are used for them (except IIRC in old versions). >> So I think I need an _acq memory barrier on the atomic cmpset of the >> refcount to prevent speculative loading of the fd_ofiles array pointer by >> the processor and the volatile in the second dereference as I have it >> now to prevent caching of the pointer by the compiler. What do you think? I thought that it was a _rel barrier that was needed due to my misreading of the "memory" clobbers corrected above. Perhaps both _acq and _rel are needed in cases like yours where a single cmpset corresponds to a (lock, unlock) pair. On amd64 and i386, plain atomic_cmpset already has both (_acq via the explicit "memory" clobber, and _rel implicitly), but the man page doesn't say that this is generic. It only says that all stores have _rel semantics, and it uses an explicit _aqu suffixes in examples of how to use cmpset to implement locking (the examples are rotted copies of locking in sys/mutex.h). Since a successful plain cmpset does a store, this implicitly says that plain cmpset's have _rel semantics and cmpset_acq has both _acq and _rel semantics. Mutex locking has always been careful to use an explicit _acq suffix, but most code in /sys isn't. In a /sys tree deated ~March 30, there are 280 lines matching atomic_cmpset but only 72 lines matching atomic_cmpset_acq and 47 lines matching atomic_cmpset_rel. Excluding the implementation (atomic.h), there are 153 lines matching atomic_cmpset, 35 matching atomic_cmpset_acq and 12 matching atomic_cmpset_rel; this gives 106 lines that are probably missing an _acq or a _rel suffix. No one replied to my previous mails about this. I would require explicit suffix by not supporting plain cmpset, or not support the _rel suffix for stores since because stores are always _rel, it is hard to tell if an atomic store without the suffix really wants non-_rel or is sloppy. Despite the proliferation of interfaces, there is no _acq_rel suffix to indicate that cmpset_acq is also _rel. >> The references prior to the atomic increment have no real ordering >> requirements. Only the ones afterwards need to be strict so that we can >> verify the results. Most references are in a loop, so "before" and "after" are sort of the saeme: % for (;;) { % fp = fdp->fd_ofiles[fd]; % if (fp == NULL) % break; % count = fp->f_count; % if (count == 0) % continue; % if (atomic_cmpset_int(&fp->f_count, count, count + 1) != 1) % continue; I think we do depend on both _acq and _rel semantics here -- the missing _acq to volatilize everything, and the implicit _rel just (?) to force the memory copy of f_count to actually be incremented, as is required for an atomic store to actually work. % if (fp == ((struct file *volatile*)fdp->fd_ofiles)[fd]) % break; The RHS here could be used again at the top of the loop. The load for the RHS is ordered after the cmpset, and so is the one at the top of the loop, except for the first iteration. I think this is unimportant. % fdrop(fp, curthread); % } > > I think having the _acq is correct and that the "memory" clobber it contains > will force the compiler to reload fd_ofiles without needing the volatile cast > (and thus that you can remove the volatile cast altogether and just add the > _acq barrier). I agree. Please look at whether some of the ~106 other plain cmpset's need and _acq prefix or should have a _rel prefix for clarity. You should be able to do this much faster than me, having written some of them :-). E.g., the one in sio.c is for implementing a lock so it shuld use _acq (though it might work without _acq since the lock is only used once), but the ones in sx.h and kern_sx.c might be correct since they are mostly for "trylock"-type operations. Bruce