From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 17 11:21:02 2012 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F1BF092A; Wed, 17 Oct 2012 11:21:01 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 122A08FC0C; Wed, 17 Oct 2012 11:21:00 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA04290; Wed, 17 Oct 2012 14:20:58 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TORgI-000Pc7-Do; Wed, 17 Oct 2012 14:20:58 +0300 Message-ID: <507E9498.10905@FreeBSD.org> Date: Wed, 17 Oct 2012 14:20:56 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121013 Thunderbird/16.0.1 MIME-Version: 1.0 To: freebsd-hackers Subject: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr? X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: Bruce Evans X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Oct 2012 11:21:02 -0000 _mtx_lock_spin has the following check in its retry loop: if (i < 60000000 || kdb_active || panicstr != NULL) DELAY(1); else _mtx_lock_spin_failed(m); Which means that in the (!kdb_active && panicstr == NULL) case we will make at most 60000000 iterations and then call _mtx_lock_spin_failed (which proceeds to panic). When either kdb_active or panicstr is set, then we are going to loop forever. I've done some digging through the lengthy history and many evolutions of the code (unfortunately I haven't kept records during the research), and my conclusion is that the kdb_active and panicstr checks were added at the quite early era of FreeBSD SMP, where we didn't have a mechanism to stop/block other CPUs when kdb or panic were entered. We didn't even prevent parallel execution of panic. So the above code was a sort of defense where we hoped that "other" CPUs would eventually stumble upon some held spinlock and would be captured there. Maybe there was a specific set of spinlocks, which were supposed to help. Nowadays, we do try to stop other CPUs during panic and kdb activation and there are good chances that they are indeed stopped. In this circumstances, should the main CPU be so unlucky as to run into the held spinlock, the above check would do more harm than good - the main CPU would just spin there forever, because a lock owner is also spinning in the stop loop and so can't release the lock. Actually, this is only true for the kdb case. In the panic case we make a check earlier and simply ignore/skip/bust all the locks. That makes the panicstr check in the code in question both harmless and useless. So I'd like to propose to remove those checks altogether. Or perhaps to "reverse" them and immediately induce a (possibly secondary) panic if we ever get to that wait loop and kdb_active || panicstr != NULL. What do you think? -- Andriy Gapon