From owner-freebsd-current@FreeBSD.ORG Fri Oct 28 17:33:29 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EC6A2106566B; Fri, 28 Oct 2011 17:33:28 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-pz0-f44.google.com (mail-pz0-f44.google.com [209.85.210.44]) by mx1.freebsd.org (Postfix) with ESMTP id BF0898FC14; Fri, 28 Oct 2011 17:33:28 +0000 (UTC) Received: by pzk4 with SMTP id 4so21743978pzk.3 for ; Fri, 28 Oct 2011 10:33:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=u8MjrBv1XounJcu57HY42CCdGyXVh12eu3KkcN4+pKI=; b=BRuqTcDnGpL+QmfWTFr6U2OnDmE2yuCgDL/jchWg6sBbe+LdRd+P3EOmafTVA49dJl 94Y1z1r/4nNRU2MPu6fl4e9qyD5dvq2e7TEtlOld3PvBppZv4L1vco43aNTj5otQsWEV iFVZ9Tu+FmU9UFofbHmiJfAzK7GPgbAggwjUg= MIME-Version: 1.0 Received: by 10.68.38.41 with SMTP id d9mr5068922pbk.103.1319821633394; Fri, 28 Oct 2011 10:07:13 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.68.57.40 with HTTP; Fri, 28 Oct 2011 10:07:13 -0700 (PDT) In-Reply-To: References: Date: Fri, 28 Oct 2011 10:07:13 -0700 X-Google-Sender-Auth: 3BnhH3vVtWvMmTqw3tU9PEYMNO8 Message-ID: From: mdf@FreeBSD.org To: Ryan Stone Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD Current , mlaier@freebsd.org Subject: Re: smp_rendezvous runs with interrupts and preemption enabled on unicore systems X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Oct 2011 17:33:29 -0000 On Fri, Oct 28, 2011 at 8:37 AM, Ryan Stone wrote: > I'm seeing issues on a unicore systems running a derivative of FreeBSD > 8.2-RELEASE if something calls mem_range_attr_set. =A0It turns out that > the root cause is a bug in smp_rendezvous_cpus. =A0The first part of > smp_rendezvous_cpus attempts to short-circuit the non-SMP case(note > that smp_started is never set to 1 on a unicore system): > > =A0 =A0 =A0 =A0if (!smp_started) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (setup_func !=3D NULL) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0setup_func(arg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (action_func !=3D NULL) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0action_func(arg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (teardown_func !=3D NULL) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0teardown_func(arg); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return; > =A0 =A0 =A0 =A0} > > The problem is that this runs with interrupts enabled, outside of a > critical section. =A0My system runs with device_polling enabled with hz > set to 2500, so its quite easy to wedge the system by having a thread > run mem_range_attr_set. =A0That has to do a smp_rendezvous, and if a > timer interrupt happens to go off half-way through the action_func and > preempt this thread, the system ends up deadlocked(although once it's > wedged, typing at the serial console stands a good chance of unwedging > the system. =A0Go figure). > > I know that smp_rendezvous was reworked substantially on HEAD, but by > inspection it looks like the bug is still present, as the > short-circuit behaviour is still there. > > I am not entirely sure of the best way to fix this. =A0Is it as simple > as doing a spinlock_enter before setup_func and a spinlock_exit after > teardown_func? =A0It seems to boot fine, but I'm not at all confident > that I understand the nuances of smp_rendezvous to be sure that there > aren't side effects that I don't know about. Looks like Max didn't have time to upstream this fix: struct mtx smp_ipi_mtx; +MTX_SYSINIT(smp_ipi_mtx, &smp_ipi_mtx, "smp rendezvous", MTX_SPIN); ... static void mp_start(void *dummy) { - mtx_init(&smp_ipi_mtx, "smp rendezvous", NULL, MTX_SPIN); ... if (!smp_started) { + mtx_lock_spin(&smp_ipi_mtx); if (setup_func !=3D NULL) setup_func(arg); if (action_func !=3D NULL) action_func(arg); if (teardown_func !=3D NULL) teardown_func(arg); + mtx_unlock_spin(&smp_ipi_mtx); return; } Cheers, matthew