From: Ryan Stone
To: FreeBSD Current
Date: Fri, 28 Oct 2011 11:37:14 -0400
Subject: smp_rendezvous runs with interrupts and preemption enabled on unicore systems

I'm seeing issues on unicore systems running a derivative of FreeBSD 8.2-RELEASE whenever something calls mem_range_attr_set. It turns out that the root cause is a bug in smp_rendezvous_cpus.

The first part of smp_rendezvous_cpus attempts to short-circuit the non-SMP case (note that smp_started is never set to 1 on a unicore system):

	if (!smp_started) {
		if (setup_func != NULL)
			setup_func(arg);
		if (action_func != NULL)
			action_func(arg);
		if (teardown_func != NULL)
			teardown_func(arg);
		return;
	}

The problem is that this runs with interrupts enabled, outside of a critical section. My system runs with device_polling enabled and hz set to 2500, so it's quite easy to wedge the system by having a thread run mem_range_attr_set. That has to do an smp_rendezvous, and if a timer interrupt happens to go off halfway through the action_func and preempts this thread, the system ends up deadlocked (although once it's wedged, typing at the serial console stands a good chance of unwedging the system. Go figure).

I know that smp_rendezvous was reworked substantially on HEAD, but by inspection it looks like the bug is still present, as the short-circuit behaviour is still there.

I am not entirely sure of the best way to fix this. Is it as simple as doing a spinlock_enter before setup_func and a spinlock_exit after teardown_func? It seems to boot fine, but I'm not at all confident that I understand the nuances of smp_rendezvous well enough to be sure that there aren't side effects that I don't know about.
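
To make the proposed change concrete, here is a rough userland sketch of the short-circuit path with the callbacks bracketed by spinlock_enter and spinlock_exit. It is not the kernel code and not a tested patch. The spinlock_enter and spinlock_exit bodies are simplified stand-ins that only model "interrupts off plus nesting count", and rendezvous_unicore, spin_depth, intr_enabled, saved_intr and record_intr_state are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userland model of the kernel state. In the kernel,
 * spinlock_enter() disables interrupts and enters a critical section;
 * here that is reduced to a flag and a nesting counter.
 */
static int spin_depth;        /* nesting count of spinlock_enter() */
static int intr_enabled = 1;  /* 1 while "interrupts" are enabled */
static int saved_intr;        /* state to restore at the outermost exit */

static void
spinlock_enter(void)
{
	if (spin_depth++ == 0) {
		saved_intr = intr_enabled;
		intr_enabled = 0;
	}
}

static void
spinlock_exit(void)
{
	assert(spin_depth > 0);
	if (--spin_depth == 0)
		intr_enabled = saved_intr;
}

/*
 * Sketch of the non-SMP short-circuit with the proposed bracketing.
 * The three callbacks now run with "interrupts" disabled, so a timer
 * interrupt cannot preempt the thread partway through action_func.
 */
static void
rendezvous_unicore(void (*setup_func)(void *),
    void (*action_func)(void *),
    void (*teardown_func)(void *),
    void *arg)
{
	spinlock_enter();
	if (setup_func != NULL)
		setup_func(arg);
	if (action_func != NULL)
		action_func(arg);
	if (teardown_func != NULL)
		teardown_func(arg);
	spinlock_exit();
}

/* Example callback: records whether interrupts were enabled when it ran. */
static void
record_intr_state(void *arg)
{
	*(int *)arg = intr_enabled;
}
```

Passing record_intr_state as the action_func lets a caller check that the callback observed interrupts as disabled and that the previous state was restored on return. Whether the real callbacks tolerate running this way in every caller is exactly the open question above.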