From: Robert Watson <rwatson@FreeBSD.org>
Date: Tue, 3 Jul 2007 10:33:19 +0100 (BST)
To: Suleiman Souhlal
Cc: Attilio Rao, Jeff Roberson, current@freebsd.org
Subject: Re: New SCHED_SMP diff.
Message-ID: <20070703102859.K29272@fledge.watson.org>

On Mon, 2 Jul 2007, Suleiman Souhlal wrote:

>> I don't think here you need an atomic instruction; a memory barrier through
>> sfence is good enough in order to make thread migration consistent.
>
> SFENCE is not needed.  Stores are already strongly ordered wrt other stores
> on x86 (unless you use write-combining memory or non-temporal stores).
> The main advantage of using an atomic operation when unlocking is that it
> should make the store visible to other CPUs faster (so they don't spin as
> long), although I think you'll have a hard time noticing a difference in a
> macrobenchmark.

FYI, in my experience the difference between using an atomic operation and a
non-atomic operation to force out the release of a lock *is* measurable in a
macrobenchmark.  I measured a slowdown of several percent in buildworld time
when I made the mutex release use a non-atomic operation, even though the
cycle count for the release operation itself went down, which I put down to
additional waiting time across several CPUs.  Our kernel was quite different
then -- much less fine-grained locking, with Giant still over VFS, for
example -- and that was on what is now several-year-old SMP hardware.

It would be very interesting to re-run micro- and macrobenchmarks to
reevaluate some of the decisions we made a few years ago on that generation
of MP hardware and with the then-incomplete locking work, now that MP
hardware has changed a lot and our locking is much more mature.

Robert N M Watson
Computer Laboratory
University of Cambridge