From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct 17 11:21:02 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id F1BF092A;
 Wed, 17 Oct 2012 11:21:01 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 122A08FC0C;
 Wed, 17 Oct 2012 11:21:00 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA04290;
 Wed, 17 Oct 2012 14:20:58 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1TORgI-000Pc7-Do; Wed, 17 Oct 2012 14:20:58 +0300
Message-ID: <507E9498.10905@FreeBSD.org>
Date: Wed, 17 Oct 2012 14:20:56 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:16.0) Gecko/20121013 Thunderbird/16.0.1
MIME-Version: 1.0
To: freebsd-hackers <freebsd-hackers@FreeBSD.org>
Subject: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr?
X-Enigmail-Version: 1.4.5
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
Cc: Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Oct 2012 11:21:02 -0000


_mtx_lock_spin has the following check in its retry loop:
if (i < 60000000 || kdb_active || panicstr != NULL)
        DELAY(1);
else
        _mtx_lock_spin_failed(m);

Which means that in the (!kdb_active && panicstr == NULL) case we will make at
most 60000000 iterations and then call _mtx_lock_spin_failed (which proceeds to
panic).  When either kdb_active or panicstr is set, then we are going to loop
forever.

I've done some digging through the lengthy history and many evolutions of the
code (unfortunately I haven't kept records during the research), and my
conclusion is that the kdb_active and panicstr checks were added at the quite
early era of FreeBSD SMP, where we didn't have a mechanism to stop/block other
CPUs when kdb or panic were entered.  We didn't even prevent parallel execution
of panic.
So the above code was a sort of defense where we hoped that "other" CPUs would
eventually stumble upon some held spinlock and would be captured there.  Maybe
there was a specific set of spinlocks, which were supposed to help.

Nowadays, we do try to stop other CPUs during panic and kdb activation and there
are good chances that they are indeed stopped.  In this circumstances, should
the main CPU be so unlucky as to run into the held spinlock, the above check
would do more harm than good - the main CPU would just spin there forever,
because a lock owner is also spinning in the stop loop and so can't release the
lock.
Actually, this is only true for the kdb case.  In the panic case we make a check
earlier and simply ignore/skip/bust all the locks.  That makes the panicstr
check in the code in question both harmless and useless.

So I'd like to propose to remove those checks altogether.  Or perhaps to
"reverse" them and immediately induce a (possibly secondary) panic if we ever
get to that wait loop and kdb_active || panicstr != NULL.

What do you think?
-- 
Andriy Gapon