From owner-freebsd-hackers@freebsd.org  Mon Mar 13 18:39:01 2017
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 13FFFD0A73D
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon, 13 Mar 2017 18:39:01 +0000 (UTC)
 (envelope-from markjdb@gmail.com)
Received: from mail-qt0-x241.google.com (mail-qt0-x241.google.com
 [IPv6:2607:f8b0:400d:c0d::241])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id BEE991A10
 for <freebsd-hackers@freebsd.org>; Mon, 13 Mar 2017 18:39:00 +0000 (UTC)
 (envelope-from markjdb@gmail.com)
Received: by mail-qt0-x241.google.com with SMTP id n37so7330616qtb.3
 for <freebsd-hackers@freebsd.org>; Mon, 13 Mar 2017 11:39:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=sender:date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:in-reply-to:user-agent;
 bh=Z2F4iHs+NKqetwYM8fPv68aL/VGDeBWs+Y6A6EFIPx4=;
 b=t9xkxcv92S1f3kqsivE7nQT2waiHRnKd0WfCDxAO0F8C77y9STH2qzsrFETt0posIl
 YNsHyz4vLJpPOLEqas+U8asvKUp4lL+PDBuErJ9lS1DuLoXw1+6C0+mLl10sPrQ2mFIg
 xKDvRiIfg3ZmwifrWePjLbb2uKfPZCigYhSnimAdoK+6jGUEHQJgsWdd+Qt55cnBe8od
 YIF8yZyDOlD8k+Sck7X9pu+APWGIJCB3Bvw9vEEUlbwWP4zOjL52fhZrfsXs7r5zrme/
 pmTQk1HzIL2QiYKVliPYiKMcejO+cnEtXNojQSeNlG9/O73YuhdG8YZKU544ig7UV8sk
 Bd0Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:date:from:to:cc:subject:message-id
 :references:mime-version:content-disposition:in-reply-to:user-agent;
 bh=Z2F4iHs+NKqetwYM8fPv68aL/VGDeBWs+Y6A6EFIPx4=;
 b=MdlG/jGzcmsRbcdQcxC0kTGRzI5wetQDp0vfh2HNppkIx733B3FA/hcZxUBSDiBDeX
 3ylzr4Gd4cgEInsguAAdadGmmDq7YY+ihWW5qPC6LmvBvDMWC2XKjSY9ZGfF6obaJGe0
 y6ISgCwQ5GnE4lPt1+VFqAUB4kScztDYi0oC71KPgvup4hDAQM3VL+dy7MVpbo57l4jq
 3s5Ghwi1bIAtDmLnU7uglG75+zAiipxGrio8eAmrGWCDp5q2bI37MtcS4QRH0caWxkgi
 nQ811eWEaIEEWovzhCqelY99yVVdW4TKpC0A1HkY6znx0JuKtJPm2+x/UPMjgT3rdWQx
 OP5g==
X-Gm-Message-State: AMke39ksHaTIiMZJUpRfFqVV3lhUy/2n1Nx199gddNBvYYnYG4jflxiaKGKMeXGTSmDVJw==
X-Received: by 10.237.55.99 with SMTP id i90mr32836822qtb.262.1489430339808;
 Mon, 13 Mar 2017 11:38:59 -0700 (PDT)
Received: from wkstn-mjohnston.west.isilon.com
 (c-76-104-201-218.hsd1.wa.comcast.net. [76.104.201.218])
 by smtp.gmail.com with ESMTPSA id s69sm12716372qks.50.2017.03.13.11.38.58
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 13 Mar 2017 11:38:59 -0700 (PDT)
Sender: Mark Johnston <markjdb@gmail.com>
Date: Mon, 13 Mar 2017 11:38:13 -0700
From: Mark Johnston <markj@FreeBSD.org>
To: Peter Holm <peter@holm.cc>
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: draining high-frequency callouts
Message-ID: <20170313183813.GB57357@wkstn-mjohnston.west.isilon.com>
References: <20170110205711.GA86449@wkstn-mjohnston.west.isilon.com>
 <20170313082120.GA44651@x2.osted.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170313082120.GA44651@x2.osted.lan>
User-Agent: Mutt/1.8.0 (2017-02-23)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Mar 2017 18:39:01 -0000

On Mon, Mar 13, 2017 at 09:21:20AM +0100, Peter Holm wrote:
> On Tue, Jan 10, 2017 at 12:57:12PM -0800, Mark Johnston wrote:
> > I'm occasionally seeing an assertion failure in softclock_call_cc() when
> > running DTrace tests on a system with hz=10000. The assertion
> > (c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
> > concurrently draining the callout, which runs at a high frequency. At
> > the time of the panic, that thread is spinning on the per-CPU callout
> > lock after having been awoken from "codrain", and CALLOUT_PENDING is
> > set on the callout. The callout is direct, i.e., it is executed in hard
> > interrupt context.
> > 
> > I think this is what's happening:
> > - callout_drain() is called while the callout is executing but after the
> >   callout has rescheduled itself, and goes to sleep after having cleared
> >   CALLOUT_ACTIVE.
> > - softclock_call_cc() wakes up the callout_drain() caller, but the
> >   callout fires again before the caller is scheduled.
> > - the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
> >   cleared and panics.
> > 
> > Is there anything that prevents this scenario? Is it really correct to
> > leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
> > dropped in order to acquire a sleepqueue lock?
> > 
> 
> Is this the same problem?
> 
> panic: softclock_call_cc: act 0xfffff8000de64800 0

It's hard to say for sure. The minimal patch below fixed the problem for
me - could you give it a try? I also did not see any problems while
testing on Hans' branch.

diff --git a/sys/kern/kern_timeout.c b/sys/kern/kern_timeout.c
index 5b70cf2033f5..a9c50fd98fbe 100644
--- a/sys/kern/kern_timeout.c
+++ b/sys/kern/kern_timeout.c
@@ -1256,7 +1256,8 @@ again:
 		 * Succeed we to stop it or not, we must clear the
 		 * active flag - this is what API users expect.
 		 */
-		c->c_flags &= ~CALLOUT_ACTIVE;
+		if ((flags & CS_DRAIN) == 0)
+			c->c_flags &= ~CALLOUT_ACTIVE;
 
 		if ((flags & CS_DRAIN) != 0) {
 			/*
@@ -1315,6 +1316,7 @@ again:
 				PICKUP_GIANT();
 				CC_LOCK(cc);
 			}
+			c->c_flags &= ~CALLOUT_ACTIVE;
 		} else if (use_lock &&
 			   !cc_exec_cancel(cc, direct) && (drain == NULL)) {