From owner-freebsd-net@freebsd.org Sat Dec 12 10:12:31 2015
Subject: Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]
To: Randall Stewart
Cc: "Alexander V. Chernikov", Adrian Chadd, freebsd-net, Gleb Smirnoff
From: Hans Petter Selasky
Message-ID: <566BF36D.5060702@selasky.org>
Date: Sat, 12 Dec 2015 11:14:05 +0100
In-Reply-To: <91332B46-8CD3-45C0-80D0-AAD29ADD2DE0@netflix.com>
List-Id: Networking and TCP/IP with FreeBSD

On 12/12/15 00:26, Randall Stewart wrote:
> Hans:
>
> After talking with Gleb he tells me part of your test is to kldunload a module.
>
> Now I think that is the source of the problem.
>
> Probably the cleanup code failed to stop the timer and did the remove.. thus
> when the timer expires it blows up.
>
> This is not a callout issue.. I think you need to start looking at the cleanup if you
> want to pursue this.

Randall:

Our driver uses a pause of hz ticks to ensure resources are no longer in
use, which on a fast machine might give exactly hz ticks between ifattach
and ifdetach. Is this a problem? What about tunX and tapX devices?

I think the right way to make these races go away is to use Gleb's initial
approach. Then there is no need to check for LLE_LINKED, because the
callback is protected by a mutex and will be stopped atomically. In
addition, use callout_async_drain() when freeing lle's.

As you wrote in your previous e-mail, the value of callout_pending() can
change during the execution of the arptimer function, and even after the
last unlock in arptimer.

--HPS
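
The mutex-protected callout idea above can be sketched in userland C. This
is only a model built on pthreads, not kernel code: every sim_* name is
hypothetical, the mutex stands in for the lock that would be passed to
callout_init_mtx(), and callout_async_drain() is not modeled. It assumes the
timer takes the entry lock before running the callback and re-checks the
pending flag under that lock, so a stop issued with the lock held cannot
race with the callback.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a callout tied to a mutex. */
struct sim_callout {
    pthread_mutex_t *mtx;      /* lock held around the callback */
    bool pending;              /* armed, not yet run or stopped */
    void (*func)(void *);
    void *arg;
};

/* Hypothetical stand-in for a link-layer entry with its timer. */
struct sim_lle {
    pthread_mutex_t lock;
    struct sim_callout timer;
    bool linked;
    int timer_runs;
};

static void sim_callout_init_mtx(struct sim_callout *c, pthread_mutex_t *m)
{
    c->mtx = m;
    c->pending = false;
    c->func = NULL;
    c->arg = NULL;
}

/* Arm the callout. The caller must hold c->mtx. */
static void sim_callout_reset(struct sim_callout *c, void (*f)(void *), void *a)
{
    c->func = f;
    c->arg = a;
    c->pending = true;
}

/* Stop the callout. The caller must hold c->mtx. Returns true if a
 * pending invocation was cancelled. */
static bool sim_callout_stop(struct sim_callout *c)
{
    bool was_pending = c->pending;
    c->pending = false;
    return was_pending;
}

/* Timer expiry: take the lock first, then re-check the pending flag. */
static void *sim_fire(void *p)
{
    struct sim_callout *c = p;
    pthread_mutex_lock(c->mtx);
    if (c->pending) {
        c->pending = false;
        c->func(c->arg);       /* runs with the lock held */
    }
    pthread_mutex_unlock(c->mtx);
    return NULL;
}

/* Callback: the entry lock is held, so no LLE_LINKED-style check is needed. */
static void sim_arptimer(void *arg)
{
    struct sim_lle *lle = arg;
    lle->timer_runs++;
}

/* Returns 0 if the stop cancelled the timer and the callback never ran. */
int sim_demo(void)
{
    struct sim_lle lle = { .linked = true };
    pthread_t t;
    bool cancelled;

    pthread_mutex_init(&lle.lock, NULL);
    sim_callout_init_mtx(&lle.timer, &lle.lock);

    pthread_mutex_lock(&lle.lock);
    sim_callout_reset(&lle.timer, sim_arptimer, &lle);
    /* The timer "expires" while the removal path holds the lock. */
    if (pthread_create(&t, NULL, sim_fire, &lle.timer) != 0) {
        pthread_mutex_unlock(&lle.lock);
        pthread_mutex_destroy(&lle.lock);
        return 1;
    }
    cancelled = sim_callout_stop(&lle.timer);  /* atomic w.r.t. callback */
    lle.linked = false;                        /* unlink the entry */
    pthread_mutex_unlock(&lle.lock);

    pthread_join(t, NULL);
    pthread_mutex_destroy(&lle.lock);
    return (cancelled && lle.timer_runs == 0) ? 0 : 1;
}
```

Calling sim_demo() from a small main() and building with -pthread is enough
to try it. Because the removal path holds the lock from arming until after
the stop, the expiry thread can only proceed once the pending flag is clear,
so in this model the callback never sees an unlinked entry.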