From owner-freebsd-hackers@FreeBSD.ORG Wed Dec 14 20:10:51 2011 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 324381065670 for ; Wed, 14 Dec 2011 20:10:51 +0000 (UTC) (envelope-from lacombar@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 94F148FC0C for ; Wed, 14 Dec 2011 20:10:50 +0000 (UTC) Received: by eekc50 with SMTP id c50so1490047eek.13 for ; Wed, 14 Dec 2011 12:10:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=6QHo8gTKUS+LkYQOTgueyCMfc4pi1G/zk23hBB8s4Go=; b=fjBY2cUa6PJpZy3JP+6OIsj5U2t5WvLkK9OL0irvVO2tujRmEJfFgOcjM877O6Vjda /FVdlSeksbGxhnWuPz3CLbLLq9OePyF+GARJ2AO7S8Ru1HGxZ0DiNxTtIRb+m0qUEaj8 07OBprWbEKZxoyBjJNuqJLeWNua3ScDVQOJts= MIME-Version: 1.0 Received: by 10.180.18.233 with SMTP id z9mr416357wid.0.1323893449567; Wed, 14 Dec 2011 12:10:49 -0800 (PST) Received: by 10.180.99.226 with HTTP; Wed, 14 Dec 2011 12:10:49 -0800 (PST) In-Reply-To: References: <201112130935.33975.jhb@freebsd.org> Date: Wed, 14 Dec 2011 15:10:49 -0500 Message-ID: From: Arnaud Lacombe To: Monthadar Al Jaberi Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org Subject: Re: loop inside uma_zfree critical section X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Dec 2011 20:10:51 -0000 Hi, On Wed, Dec 14, 2011 at 2:47 PM, Monthadar Al Jaberi wrote: > On Tue, Dec 13, 2011 at 4:50 PM, Monthadar Al Jaberi > wrote: >> On Tue, Dec 13, 2011 at 3:35 PM, John Baldwin wrote: >>> On Tuesday, December 13, 2011 7:46:34 am Monthadar Al Jaberi wrote: >>>> Hi, >>>> >>>> I am not sure why I am having this problem, but looking >>>> at the code I dont understand uma_core.c really good. >>>> So I hope someone can shed a light on this: >>>> >>>> I am running on an arm board and and running a kernel module >>>> that behaves like a wlan interface. so I tx and rx packets. >>>> >>>> For now tx is only only sending beacon like frames. >>>> This is done through using ieee80211_beacon_alloc(). >>>> >>>> Then in a callout task to generate periodic beacons: >>>> >>>> =A0 =A0 m_dup(avp->beacon, M_DONTWAIT); >>>> =A0 =A0 mtx_lock(...); >>>> =A0 =A0 STAILQ_INSERT_TAIL(...); >>>> =A0 =A0 taskqueue_enqueue(...); >>>> =A0 =A0 mtx_unlock(...); >>>> =A0 =A0 callout_schedule(...); >>>> >>>> On the RX side, the interrupt handler will read out buffer >>>> then place it on a queue to be handled by wlan-glue code. >>>> For now wlan-glue code just frees the mbuf it instead of >>>> calling net80211 ieee80211_input() functions: >>>> >>>> =A0 =A0 m_copyback(...); >>>> =A0 =A0 /* Allocate new mbuf for next RX. */ >>>> =A0 =A0 MGETHDR(..., M_DONTWAIT, MT_DATA); >>>> =A0 =A0 bzero((mtod(sc->Rx_m, void *)), MHLEN); >>>> =A0 =A0 sc->Rx_m->m_len =3D 0; /* NB: m_gethdr does not set */ >>>> =A0 =A0 sc->Rx_m->m_data +=3D 20; /* make headroom */ >>>> >>>> Then I use a lockmgr inside my kernel module that should >>>> make sure that we either are on TX or RX path. >>> >>> Uh, you can't use a lockmgr lock in interrupt handlers or in >>> if_transmit/if_start routines. =A0You should most likely just be using = a plain >>> mutex instead. =A0Also, new code shouldn't use lockmgr in general. =A0I= f you >>> need a sleepable lock, use sx instead. =A0It has a more straightforward= API. >> >> Ok, I will change the interrupt handler to do something like this: >> >> =A0 =A0disaple_interrupt(); >> =A0 =A0taskqueue_enqueue(...); /* on new rx task queue */ >> >> Then on the new rx proc: >> >> =A0 =A0sx_slock(...); >> =A0 =A0m_copyback(...); >> =A0 =A0enable_interrupt(); >> =A0 =A0/* Allocate new mbuf for next RX. */ >> =A0 =A0MGETHDR(..., M_DONTWAIT, MT_DATA); >> =A0 =A0bzero((mtod(sc->Rx_m, void *)), MHLEN); >> =A0 =A0sc->Rx_m->m_len =3D 0; /* NB: m_gethdr does not set */ >> =A0 =A0sc->Rx_m->m_data +=3D 20; /* make headroom */ >> =A0 =A0sx_sunlock(...); >> >> I lock TX/RX paths to make sure my code is threading safe. >> Also because while programming my deivce (SPI communicatioin) >> there will be a tsleep() waiting for the DMA interrupt and >> thus we could be prempted by e.g. a beacon_callout etc... >> > > I did implement your suggestions, using sx and modified interrupt handler > as specified above. But still same problem as before. > >>> >>>> The problem seems to be at [2], somehow after swapping >>>> buckets in uma_zfree m_dup returns a pointer to >>>> an mbuf that is still being used by us, [1] and [3] >>>> have same address. >>>> Then we call m_freem twice on same mbuf, [4] and [5]. >>>> And a loop occurs inside uma_free. >>>> I am using mbufs in a wrong way? Shouldnt mbufs be thread safe? >>>> Problem seems to occur while swapping buckets. >>> >>> Hmm, the uma uses its own locking, so it should be safe, yes. =A0Howeve= r, you >>> are correct about [1] and [3]. =A0The thing is, after [1] the mbuf shou= ldn't >>> be in any buckets at all (it only gets put back into the bucket during = a >>> free). =A0Are you sure the mbuf wasn't double free'd previously? > > I rechecked and it is almost certain that I dont double free the mbuf > before [1]. > And its not like it crashed in the beginning, it does run for a while > and then it crashes. So our code works for like a hundred beacons sent/re= ceived > between two arm boards. Its feels like something is preempted, which expl= ains > why the mbuf is still in the bucket (wrongly)? > are you running an INVARIANTS/DIAGNOSTICS/WITNESS/LOCK_DEBUG/... enabled kernel ? are you running on an SMP platform where there might be cache-coherency iss= ue ? Thanks, - Arnaud