From: Adrian Chadd
To: Alexander Kabaev
Cc: "freebsd-arch@freebsd.org"
Date: Wed, 12 Nov 2014 19:39:50 -0800
Subject: Re: Questions about locking; turnstiles and sleeping threads
In-Reply-To: <20141112212613.21037929@kan>
References: <20141112212613.21037929@kan>
List-Id: Discussion related to FreeBSD architecture

On 12 November 2014 18:26, Alexander Kabaev wrote:
> On Wed, 12 Nov 2014 18:13:55 -0800
> Adrian Chadd wrote:
>
>> Hi,
>>
>> I have a bit of an odd case here.
>>
>> I'm getting panics in the net80211/ath code, "sleeping thread (X) owns
>> non-sleepable lock."
>>
>> show alllocks just showed one lock held - the net80211 comlock. It's a
>> recursive mutex, that's supposed to be sleepable.
>>
>> The two threads in question look like this:
>>
>> thread X: net80211_newstate_cb (grabs IEEE80211_LOCK())
>> ath_newstate
>> callout_drain - which grabs the ATH_LOCK as part of the callout
>> drain side of things
>> that enters sleepq_wait() and goes to sleep, waiting for
>> whatever's running the callout to
>> finish
>>
>> thread Y:
>> rx_path in if_ath_rx_edma
>> ath_rx_pkt -> sta_input -> ath_recv_mgmt -> sta_recv_mgmt (grabs
>> IEEE80211_LOCK()) -> panics
>>
>> Thread Y doesn't hold any other locks. It's just trying to grab the
>> IEEE80211_LOCK that is being held by thread X. But thread X is asleep
>> waiting for whatever callout to finish so it can continue. The code in
>> propagate_priority() sees that thread X is sleeping and panics.
>>
>> So, what's really going on? I don't mind (well, "don't mind") having
>> to take another deep dive through all of this to sort it out so it
>> doesn't tickle the callout / turnstile code in this particular
>> fashion, but I'd first like to ensure that it's not some corner case
>> that isn't handled by the check in propagate_priority().
>>
>> Thanks,
>>
>>
>> -adrian
>
> Hi,
>
> mutexes are blocking and not sleepable primitives, so doing any
> unbounded sleep with mutex locked, such as one you are attempting by
> calling callout_drain is illegal. In other words, you are getting an
> expected assert and the code in question is wrong.

Hi,

Right. That isn't mentioned in the manpage. The manpage says:

     The function callout_drain() is identical to callout_stop() except that
     it will wait for the callout to be completed if it is already in
     progress.  This function MUST NOT be called while holding any locks on
     which the callout might block, or deadlock will result.  Note that if the
     callout subsystem has already begun processing this callout, then the
     callout function may be invoked during the execution of callout_drain().
     However, the callout subsystem does guarantee that the callout will be
     fully stopped before callout_drain() returns.

The callout isn't going to block here, but another thread may block.
This is good to know. I'll see if I can come up with an addition to
the manpage about this.

I'm going to have to do another pass over all of the wifi drivers and
stack to see where this is happening. Ugh. :(

Thanks!


-adrian
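
For reference, the ordering Alexander describes (never wait for a handler
to finish while a mutex is still held) can be sketched in user space. This
is only an analogy under stated assumptions, not kernel code: POSIX threads
stand in for the kernel primitives, pthread_join() plays the role of
callout_drain(), and the names state_lock, timer_fn and teardown_timer are
hypothetical. User space has no turnstile check, so the wrong ordering would
hang or stall other threads rather than panic.

```c
#include <pthread.h>

/*
 * Hypothetical stand-ins (not FreeBSD APIs):
 *   state_lock      ~ a driver/stack mutex such as the comlock
 *   timer_fn        ~ the callout handler
 *   pthread_join()  ~ callout_drain(): waits for the handler to finish
 */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int ticks;

/* The "callout": briefly takes the lock to update shared state. */
static void *timer_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&state_lock);
    ticks++;
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/*
 * Tear down the timer. Returns the number of times the handler ran,
 * or -1 if the thread could not be created.
 */
int teardown_timer(void)
{
    pthread_t timer;
    int seen;

    pthread_mutex_lock(&state_lock);
    ticks = 0;
    if (pthread_create(&timer, NULL, timer_fn, NULL) != 0) {
        pthread_mutex_unlock(&state_lock);
        return -1;
    }
    /* ... state changes that need the lock would go here ... */

    /* Drop the lock BEFORE waiting. Joining with it held would block
     * the handler (and any other thread wanting the lock) forever. */
    pthread_mutex_unlock(&state_lock);

    /* Unbounded wait, done with no mutex held. */
    pthread_join(timer, NULL);

    /* Safe to retake the lock once the handler is known to be done. */
    pthread_mutex_lock(&state_lock);
    seen = ticks;
    pthread_mutex_unlock(&state_lock);
    return seen;
}
```

Calling teardown_timer() from a small main() (linked with -pthread) should
return once the handler has run. In the driver case the equivalent change
would be to release the mutex before the drain and retake it afterwards,
which also means re-checking any state that may have changed in between.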