From owner-freebsd-current@freebsd.org Mon Aug 31 16:47:50 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E18E9C72CE for ; Mon, 31 Aug 2015 16:47:50 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7E8B31EF6; Mon, 31 Aug 2015 16:47:49 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA01118; Mon, 31 Aug 2015 19:47:41 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZWSFB-0003xq-HC; Mon, 31 Aug 2015 19:47:41 +0300 Subject: Re: panic: Stray timeout To: Konstantin Belousov References: <55E2DA91.6080206@FreeBSD.org> <20150830161603.GI2072@kib.kiev.ua> <55E354DA.9030302@FreeBSD.org> Cc: FreeBSD Current , rrs@FreeBSD.org From: Andriy Gapon X-Enigmail-Draft-Status: N1110 Message-ID: <55E484F3.8090303@FreeBSD.org> Date: Mon, 31 Aug 2015 19:46:43 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <55E354DA.9030302@FreeBSD.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 16:47:50 -0000 On 30/08/2015 22:09, Andriy Gapon wrote: > On 30/08/2015 19:16, Konstantin Belousov wrote: >> This is strange, I do not think that could be a right explanation of this >> issue. The taskqueue callout is initialized with the mutex, which means >> that the callout_stop() caller >> - must own the mutex; >> - is synchronous with the callout. >> In other words, callout cannot be running when taskqueue_cancel_timeout() >> calls callout_stop(), it only can be dequeued but the callout function >> is not yet called. If callout_stop() is performed meantime, between >> dropping callout_cpu lock, and locking the mutex, it must be not run. > > Thank you for the explanation. I am not familiar with the code and I > misinterpreted the manual page and thought that callout_stop() might be unable > to stop the callout even if it was initialized with the mutex. I see my mistake > now. I had to look at the code, of course. Could the following logic actually be buggy? int callout_reset_sbt_on(struct callout *c, sbintime_t sbt, sbintime_t precision, void (*ftn)(void *), void *arg, int cpu, int flags) { ... if (cc_exec_curr(cc, direct) == c) { /* * We're being asked to reschedule a callout which is * currently in progress. If there is a lock then we * can cancel the callout if it has not really started. */ if (c->c_lock != NULL && cc_exec_cancel(cc, direct)) cancelled = cc_exec_cancel(cc, direct) = true; I have a suspicion that the condition should check for !cc_exec_cancel: - it seems that cc_exec_cancel is set to true when the callback execution starts - logically, it wouldn't make sense to check if X is true and then set it to true The code is quite hard to understand in a short time, so I could be misinterpreting what's going on, but I'd like to check my suspicion. If this is a bug indeed, then it might make the following scenario possible: - softclock() is being executed in thread T1 and the control flow is in softclock_call_cc() after CC_UNLOCK() and before lc_lock(c_lock) - thread T2 is executing callout_reset_sbt_on on the same callout as T1 - so, I think, T2 will see cc_exec_cancel == false and thus, because of the suspected bug, cc_exec_cancel will remain false - T2 will proceed to callout_cc_add() and will happily (re-)schedule the callout - after T2 releases the callout lock, T1 will continue in softclock_call_cc() and because cc_exec_cancel is false the callout's callback function will get called In effect, there will be an extra call of the callout. If that's so that could explain the triggered assertion in taskqueue_timeout_func(). P.S. I am going to add an assertion in callout_cc_add that CALLOUT_ACTIVE is not set and I will see if I can get anything interesting from it. P.P.S. On a further look, it seems that there is a bug indeed and it's probably just a typo that got introduced in r278469: - if (c->c_lock != NULL && !cc->cc_exec_entity[direct].cc_cancel) - cancelled = cc->cc_exec_entity[direct].cc_cancel = true; - if (cc->cc_exec_entity[direct].cc_waiting) { + if (c->c_lock != NULL && cc_exec_cancel(cc, direct)) + cancelled = cc_exec_cancel(cc, direct) = true; + if (cc_exec_waiting(cc, direct)) { -- Andriy Gapon