From: Jeff Roberson <jroberson@chesapeake.net>
Date: Tue, 5 Jun 2007 11:51:18 -0700 (PDT)
To: John Baldwin
Cc: marcel@freebsd.org, kmacy@freebsd.org, benno@freebsd.org,
    marius@freebsd.org, arch@freebsd.org, jake@freebsd.org,
    freebsd-arch@freebsd.org, tmm@freebsd.org, cognet@freebsd.org,
    grehan@freebsd.org
Subject: Re: New cpu_switch() and cpu_throw().
In-Reply-To: <200706051012.18864.jhb@freebsd.org>

On Tue, 5 Jun 2007, John Baldwin wrote:

> On Tuesday 05 June 2007 01:32:46 am Jeff Roberson wrote:
>> For every architecture we need to support new features in cpu_switch()
>> and cpu_throw() before they can support per-cpu schedlock.  I'll
>> describe those below.  I'm soliciting help or advice in implementing
>> these on platforms other than x86 and amd64, especially on ia64, where
>> things are implemented in C!
>>
>> I checked in the new version of cpu_switch() for amd64 today after
>> threadlock went in.  Basically, we have to release a thread's lock when
>> it's switched out and acquire a lock when it's switched in.
>>
>> The release must happen after we're totally done with the stack and
>> vmspace of the thread to be switched out.  On amd64 this means after we
>> clear the active bits for tlb shootdown.  The release actually makes
>> use of a new 'mtx' argument to cpu_switch() and sets the td_lock
>> pointer to this argument rather than unlocking a real lock.  td_lock
>> has previously been set to the blocked lock, which is always blocked.
>> Threads spinning in thread_lock() will notice the td_lock pointer
>> change and acquire the new lock.  So this is simple, just a non-atomic
>> store with a pointer passed as an argument.  On amd64:
>>
>> 	movq	%rdx, TD_LOCK(%rdi)	/* Release the old thread */
>>
>> The acquire part is slightly more complicated and involves a little
>> loop.  We don't actually have to spin trying to lock the thread.  We
>> just spin until it's no longer set to the blocked lock.  The switching
>> thread already owns the per-cpu scheduler lock for the current cpu.
>> If we're switching into a thread that is set to the blocked_lock,
>> another cpu is about to set it to our current cpu's lock via the mtx
>> argument mentioned above.  On amd64 we have:
>>
>> 	/* Wait for the new thread to become unblocked */
>> 	movq	$blocked_lock, %rdx
>> 1:
>> 	movq	TD_LOCK(%rsi),%rcx
>> 	cmpq	%rcx, %rdx
>> 	je	1b
>
> If this is to handle a thread migrating from one CPU to the next (and
> there's no interlock to control migration, otherwise you wouldn't have
> to spin here) then you will need memory barriers on the first write
> (i.e. the first write above should be an atomic_store_rel()) and the
> equivalent of an _acq barrier here.

So, thanks for pointing this out.  Attilio also mentions that on x86 and
amd64 we need a pause in the wait loop.  As we discussed, we can just use
sfence rather than atomics on amd64; however, x86 will need atomics since
you can't rely on the presence of *fence.  Other architectures will have
to ensure memory ordering as appropriate.

Jeff
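
To make the ordering concrete, here is a rough, self-contained C11 sketch
of the td_lock hand-off discussed in this thread.  It is illustrative
only, not the real cpu_switch() code (which lives in per-architecture
assembly), and the helper names release_old_thread() and
wait_for_new_thread() are made up for the example:

/*
 * Illustrative sketch only -- NOT the actual FreeBSD cpu_switch() code.
 * The types and helpers are hypothetical stand-ins; it just shows the
 * release/acquire ordering and the spin-wait pause discussed above.
 */
#include <stdatomic.h>

struct mtx;                             /* opaque scheduler lock */

struct thread {
	_Atomic(struct mtx *) td_lock;  /* lock protecting this thread */
};

extern struct mtx blocked_lock;         /* lock that is always blocked */

/*
 * Release side: called once the old thread's stack and vmspace are no
 * longer in use (e.g. after the TLB-shootdown active bits are cleared).
 * A release store keeps all earlier stores visible before other CPUs can
 * observe the new td_lock value -- John's atomic_store_rel() point.
 */
static void
release_old_thread(struct thread *old, struct mtx *mtx)
{
	atomic_store_explicit(&old->td_lock, mtx, memory_order_release);
}

/*
 * Acquire side: the incoming thread may still point at blocked_lock while
 * another CPU finishes handing it over.  Spin with a pause hint (per
 * Attilio's suggestion) until the pointer changes; the acquire load
 * orders the releasing CPU's prior stores before our later accesses.
 */
static void
wait_for_new_thread(struct thread *new)
{
	while (atomic_load_explicit(&new->td_lock, memory_order_acquire) ==
	    &blocked_lock)
		__asm __volatile("pause");  /* spin-wait hint on x86/amd64 */
}

The C11 release/acquire pair is just a portable way of stating the
ordering John asks for; as the thread notes, amd64 can get the same
effect with sfence, i386 falls back to atomic operations, and other
architectures have to supply whatever barriers they define.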