From owner-freebsd-net@FreeBSD.ORG Thu Feb 5 17:11:48 2015 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1D5D2C44 for ; Thu, 5 Feb 2015 17:11:48 +0000 (UTC) Received: from phabric-backend.isc.freebsd.org (phabric-backend.isc.freebsd.org [IPv6:2001:4f8:3:ffe0:406a:0:50:2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ED285A7F for ; Thu, 5 Feb 2015 17:11:47 +0000 (UTC) Received: from phabric-backend.isc.freebsd.org (phabric-backend.isc.freebsd.org [127.0.1.5]) by phabric-backend.isc.freebsd.org (8.14.9/8.14.9) with ESMTP id t15HBlaU067009 for ; Thu, 5 Feb 2015 17:11:47 GMT (envelope-from root@phabric-backend.isc.freebsd.org) Received: (from root@localhost) by phabric-backend.isc.freebsd.org (8.14.9/8.14.9/Submit) id t15HBlvL067006; Thu, 5 Feb 2015 17:11:47 GMT (envelope-from root) Date: Thu, 5 Feb 2015 17:11:47 +0000 To: freebsd-net@freebsd.org From: "rrs (Randall Stewart)" Subject: [Differential] [Commented On] D1777: Associated fix for arp/nd6 timer usage. Message-ID: <21a6661d2c797185ab16919d528b2b03@localhost.localdomain> X-Priority: 3 Thread-Topic: D1777: Associated fix for arp/nd6 timer usage. X-Herald-Rules: none X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-To: X-Phabricator-Cc: X-Phabricator-Cc: X-Phabricator-Cc: X-Phabricator-Cc: X-Phabricator-Cc: X-Phabricator-Cc: In-Reply-To: References: Thread-Index: N2Y2Y2VmY2ZjNTc1MTM4NTA3YmIzZDk3NmE4IFTTpFM= X-Phabricator-Sent-This-Message: Yes X-Mail-Transport-Agent: MetaMTA X-Auto-Response-Suppress: All X-Phabricator-Mail-Tags: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Feb 2015 17:11:48 -0000 rrs added a comment. Jhb/Others So lets go through your scenario with code in arp: a) softclock dequeues callout to run -- Which calls softclock_call_cc We make it to line:676 and see that "yes" the user (arp) init'd with a rw_mtx and run the next line 677 (to get the lock). b) other thread grabs lle_lock -- Our other thread is a call in to flush the table in net/if_llatble.c the lucky winner is line 181. c) softclock blocks on lle_wlock above -- from the call to line 677 right. d) other thread tears down structure, unlocks lock, zeros memory, 0xdeadc0de, etc. -- Now here is where its interesting, the other thread does if (callout_stop(&lle->la_timer)) LLE_REMREF(lle); lleentry_free llentry_free is going to do: - remove the entry from the lists - and in the end call LLE_FREE_LOCKED() - LLE_FREE_LOCKED() is a macro that checks if (lle->lle_refcnt == 1) call free function else { LLE_REMREF(lle) LLE_WUNLOCK() } Since we are a "stoppable callout" and the callout has a lock, it will fall through (even though its not safe) and return 1, yes the callout was stopped. So we hit the call free function which in this case in_lltable_free which does: LLE_WUNLOCK(lle) LLE_LOK_DESTROY(lle) free(lle, M_LLTABLE) e) softclock wakes up in mutex code and panics becuase the mutex is destroyed and it either triggers an assertion, follows a bad pointer trying to propagate priority or see if the "owner" is running, etc. -- And we wake up and boom. However, with the change from callout_init_rw(c,..) -> callout_init(c, 1) things change. Instead you get a 0 return from callout_stop, since the callout can *not* be stopped at this point. We don't do that first reference lower so we hit the else case in llentry_free which just reduces the count to 1 and unlocks and returns. Now our callout proceeds, getting the lock and it will then go through and check only the pending flag (the active has been removed by the stop but we don't care). There we now do the free and all is well. That is how this fix avoids the issue. Would it be better to have callout_async_drain()? Yes probably so, but then this code would have to be restructured a lot more than this small change. I hope that explains how it works here.. unless of course I am missing something??? R REVISION DETAIL https://reviews.freebsd.org/D1777 To: rrs, imp, sbruno, gnn, rwatson, lstewart, kostikbel, adrian, bz, jhb Cc: bz, emaste, hiren, julian, hselasky, freebsd-net