From owner-freebsd-current@freebsd.org Thu Sep 3 20:39:37 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 35CAC9CA322 for ; Thu, 3 Sep 2015 20:39:37 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 52E7DC80; Thu, 3 Sep 2015 20:39:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA04869; Thu, 03 Sep 2015 23:39:33 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZXbID-0008eO-Bx; Thu, 03 Sep 2015 23:39:33 +0300 Subject: Re: panic: Stray timeout To: Hans Petter Selasky , Konstantin Belousov References: <55E2DA91.6080206@FreeBSD.org> <20150830161603.GI2072@kib.kiev.ua> <55E354DA.9030302@FreeBSD.org> <55E484F3.8090303@FreeBSD.org> <55E4A692.9010602@selasky.org> Cc: FreeBSD Current , rrs@FreeBSD.org From: Andriy Gapon Message-ID: <55E8AFB4.1080101@FreeBSD.org> Date: Thu, 3 Sep 2015 23:38:12 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <55E4A692.9010602@selasky.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Sep 2015 20:39:37 -0000 On 31/08/2015 22:10, Hans Petter Selasky wrote: > Hi Andriy, > > On 08/31/15 18:46, Andriy Gapon wrote: >> On 30/08/2015 22:09, Andriy Gapon wrote: >>> On 30/08/2015 19:16, Konstantin Belousov wrote: >>>> This is strange, I do not think that could be a right explanation of this >>>> issue. The taskqueue callout is initialized with the mutex, which means >>>> that the callout_stop() caller >>>> - must own the mutex; >>>> - is synchronous with the callout. >>>> In other words, callout cannot be running when taskqueue_cancel_timeout() >>>> calls callout_stop(), it only can be dequeued but the callout function >>>> is not yet called. If callout_stop() is performed meantime, between >>>> dropping callout_cpu lock, and locking the mutex, it must be not run. >>> >>> Thank you for the explanation. I am not familiar with the code and I >>> misinterpreted the manual page and thought that callout_stop() might be unable >>> to stop the callout even if it was initialized with the mutex. I see my mistake >>> now. >> >> I had to look at the code, of course. >> Could the following logic actually be buggy? >> >> int >> callout_reset_sbt_on(struct callout *c, sbintime_t sbt, sbintime_t precision, >> void (*ftn)(void *), void *arg, int cpu, int flags) >> { >> ... >> if (cc_exec_curr(cc, direct) == c) { >> /* >> * We're being asked to reschedule a callout which is >> * currently in progress. If there is a lock then we >> * can cancel the callout if it has not really started. >> */ >> if (c->c_lock != NULL && cc_exec_cancel(cc, direct)) >> cancelled = cc_exec_cancel(cc, direct) = true; >> >> I have a suspicion that the condition should check for !cc_exec_cancel: >> - it seems that cc_exec_cancel is set to true when the callback execution starts >> - logically, it wouldn't make sense to check if X is true and then set it to true >> >> The code is quite hard to understand in a short time, so I could be >> misinterpreting what's going on, but I'd like to check my suspicion. > > While touching this topic, is the following code more readable or not (see the > callout_restart_async function): > > https://svnweb.freebsd.org/base/projects/hps_head/sys/kern/kern_timeout.c?revision=287248&view=markup That code is a bit easier to read indeed. But it's quite similar - unsurprisingly, I guess. >> If this is a bug indeed, then it might make the following scenario possible: >> >> - softclock() is being executed in thread T1 and the control flow is in >> softclock_call_cc() after CC_UNLOCK() and before lc_lock(c_lock) >> - thread T2 is executing callout_reset_sbt_on on the same callout as T1 >> - so, I think, T2 will see cc_exec_cancel == false and thus, because of the >> suspected bug, cc_exec_cancel will remain false >> - T2 will proceed to callout_cc_add() and will happily (re-)schedule the callout >> - after T2 releases the callout lock, T1 will continue in softclock_call_cc() >> and because cc_exec_cancel is false the callout's callback function will get >> called > > Correct. > >> >> In effect, there will be an extra call of the callout. >> If that's so that could explain the triggered assertion in >> taskqueue_timeout_func(). >> >> P.S. I am going to add an assertion in callout_cc_add that CALLOUT_ACTIVE is not >> set and I will see if I can get anything interesting from it. >> >> P.P.S. On a further look, it seems that there is a bug indeed and it's probably >> just a typo that got introduced in r278469: >> - if (c->c_lock != NULL && !cc->cc_exec_entity[direct].cc_cancel) >> - cancelled = cc->cc_exec_entity[direct].cc_cancel = true; >> - if (cc->cc_exec_entity[direct].cc_waiting) { >> + if (c->c_lock != NULL && cc_exec_cancel(cc, direct)) >> + cancelled = cc_exec_cancel(cc, direct) = true; >> + if (cc_exec_waiting(cc, direct)) { >> > > Looks like a bug to me trying to make the code more readable removed a not in > there! Yes, indeed. Thank you for reviewing! -- Andriy Gapon