From owner-svn-src-all@FreeBSD.ORG Sat Jan 30 14:40:39 2010 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AF63D106566C; Sat, 30 Jan 2010 14:40:39 +0000 (UTC) (envelope-from marius@alchemy.franken.de) Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214]) by mx1.freebsd.org (Postfix) with ESMTP id 1B60D8FC08; Sat, 30 Jan 2010 14:40:38 +0000 (UTC) Received: from alchemy.franken.de (localhost [127.0.0.1]) by alchemy.franken.de (8.14.3/8.14.3/ALCHEMY.FRANKEN.DE) with ESMTP id o0UEeaiI098983; Sat, 30 Jan 2010 15:40:37 +0100 (CET) (envelope-from marius@alchemy.franken.de) Received: (from marius@localhost) by alchemy.franken.de (8.14.3/8.14.3/Submit) id o0UEeauP098982; Sat, 30 Jan 2010 15:40:36 +0100 (CET) (envelope-from marius) Date: Sat, 30 Jan 2010 15:40:36 +0100 From: Marius Strobl To: Attilio Rao Message-ID: <20100130144036.GA77522@alchemy.franken.de> References: <201001231554.o0NFsMbx049837@svn.freebsd.org> <3bbf2fe11001252310r408a6be4j9bc42618394b3e3d@mail.gmail.com> <20100127215904.GF40779@alchemy.franken.de> <3bbf2fe11001280216p705ed94ev61abc4be654f8cc1@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <3bbf2fe11001280216p705ed94ev61abc4be654f8cc1@mail.gmail.com> User-Agent: Mutt/1.4.2.3i Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, Rob Farmer , src-committers@freebsd.org Subject: Re: svn commit: r202889 - head/sys/kern X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Jan 2010 14:40:39 -0000 On Thu, Jan 28, 2010 at 11:16:55AM +0100, Attilio Rao wrote: > 2010/1/27 Marius Strobl : > > On Tue, Jan 26, 2010 at 08:10:25AM +0100, Attilio Rao wrote: > >> 2010/1/26 Rob Farmer : > >> > On Sat, Jan 23, 2010 at 7:54 AM, Attilio Rao wrote: > >> >> Author: attilio > >> >> Date: Sat Jan 23 15:54:21 2010 > >> >> New Revision: 202889 > >> >> URL: http://svn.freebsd.org/changeset/base/202889 > >> >> > >> >> Log: > >> >>  - Fix a race in sched_switch() of sched_4bsd. > >> >>    In the case of the thread being on a sleepqueue or a turnstile, the > >> >>    sched_lock was acquired (without the aid of the td_lock interface) and > >> >>    the td_lock was dropped. This was going to break locking rules on other > >> >>    threads willing to access to the thread (via the td_lock interface) and > >> >>    modify his flags (allowed as long as the container lock was different > >> >>    by the one used in sched_switch). > >> >>    In order to prevent this situation, while sched_lock is acquired there > >> >>    the td_lock gets blocked. [0] > >> >>  - Merge the ULE's internal function thread_block_switch() into the global > >> >>    thread_lock_block() and make the former semantic as the default for > >> >>    thread_lock_block(). This means that thread_lock_block() will not > >> >>    disable interrupts when called (and consequently thread_unlock_block() > >> >>    will not re-enabled them when called). This should be done manually > >> >>    when necessary. > >> >>    Note, however, that ULE's thread_unblock_switch() is not reaped > >> >>    because it does reflect a difference in semantic due in ULE (the > >> >>    td_lock may not be necessarilly still blocked_lock when calling this). > >> >>    While asymmetric, it does describe a remarkable difference in semantic > >> >>    that is good to keep in mind. > >> >> > >> >>  [0] Reported by:      Kohji Okuno > >> >>                         > >> >>  Tested by:            Giovanni Trematerra > >> >>                         > >> >>  MFC:                  2 weeks > >> >> > >> >> Modified: > >> >>  head/sys/kern/kern_mutex.c > >> >>  head/sys/kern/sched_4bsd.c > >> >>  head/sys/kern/sched_ule.c > >> > > >> > Hi, > >> > > >> > This commit seems to be causing me a kernel panic on sparc64 - details > >> > are in PR 143215. Could you take a look before MFCing this? > >> > >> I think that the bug may be in cpu_switch() where the mutex parameter > >> for sched_4bsd is not handled correctly. > >> Does sparc64 support ULE? I don't think it does and I think that it > >> simply ignores the third argument of cpu_switch() which is vital now > >> for for sched_4bsd too (what needs to happen is to take the passed > >> mutex and to set the TD_LOCK of old thread to be the third argument). > >> Unluckilly, I can't do that in sparc64 asm right now, but it should > >> not be too difficult to cope with it. > >> > > > > The following patch adds handling of the mutex parameter to the > > sparc64 cpu_switch(): > > http://people.freebsd.org/~marius/sparc64_cpu_switch_mtx.diff > > This patch works fine with r202888. With r202889 it allows the > > machine to boot again, however putting some load on the machine > > causes it to issue a reset without a chance to debug. I've also > > tried with some variations like duplicating the old cpu_switch() > > for cpu_throw() so the altered cpu_switch() doesn't need to > > distinguish between the to cases and only assigning old->td_lock > > right before return but nothing made a difference. Given that > > this leaves little room for a bug in the cpu_switch() changes I > > suspect r202889 also breaks additional assumptions. For example > > the sparc64 pmap code used sched_lock, does that need to change > > to be td_lock now maybe? Is there anything else that comes to > > your mind in this regard? > > Sorry for being lame with sparc64 assembly (so that I can't make much > more productive help here), but the required patch, sched_4bsd only, > should simply save the extra-argument of cpu_switch() (and cpu_throw() > is not involved, so I'm not sure what is changing there) and move in > TD_LOCK(%oldthreadreg) when it is safe to do (just after the oldthread > switched out completely). It doesn't even require a memory barrier. > This patch seems a bit too big and I wonder what else it does (or I'm > entirely wrong and that's just what I asked here), maybe adding the > ULE support as well? Actually it just adds old->td_lock = mtx in a non-atomic fashion as soon as we're done with the old thread. It's "big" as I had to reshuffle the register usage in order to preserve %i0 (old) and %i2 (mtx) and in order to distinguish between cpu_switch() and cpu_throw() (no mtx and old maybe be NULL in that case). As it turns out it also works just fine, the problems I were seeing were due to another change in that tree. Sorry for the noise. My understanding is that for ULE, mtx should be assigned to old->td_lock atomically, is that correct? > > Said that, all the code, including MD parts should always use > td_lock() and not doing explicit acquisitions/drops of sched_lock, if > they want to support ULE (but probabilly even if they do not want), > unless there is a big compelling reason (that I expect to be justified > in comments at least). I think the idea behind using sched_lock in the sparc64 code is to piggyback on it for protecting the pmap and take advantage of the fact that it's held across cpu_switch() anyway. If that's correct it should be possible to replace it with a separate spinlock dedicated to protecting the pmap or given that due to the macro madness involved in mtx_{,un}lock_spin() it's hard to properly call these from asm by calling spinlock_{enter,exit}() directly. Marius