From owner-freebsd-hackers@FreeBSD.ORG Fri Dec 16 09:45:59 2011
Date: Fri, 16 Dec 2011 10:45:58 +0100
From: Monthadar Al Jaberi
To: Arnaud Lacombe
Cc: freebsd-hackers@freebsd.org
Subject: Re: loop inside uma_zfree critical section
References: <201112130935.33975.jhb@freebsd.org>
List-Id: Technical Discussions relating to FreeBSD

On Wed, Dec 14, 2011 at 9:10 PM, Arnaud Lacombe wrote:
> Hi,
>
> On Wed, Dec 14, 2011 at 2:47 PM, Monthadar Al Jaberi
> wrote:
>> On Tue, Dec 13, 2011 at 4:50 PM, Monthadar Al Jaberi
>> wrote:
>>> On Tue, Dec 13, 2011 at 3:35 PM, John Baldwin wrote:
>>>> On Tuesday, December 13, 2011 7:46:34 am Monthadar Al Jaberi wrote:
>>>>> Hi,
>>>>>
>>>>> I am not sure why I am having this problem, and from looking
>>>>> at the code I don't understand uma_core.c very well.
>>>>> So I hope someone can shed some light on this:
>>>>>
>>>>> I am running on an ARM board, with a kernel module
>>>>> that behaves like a wlan interface, so I TX and RX packets.
>>>>>
>>>>> For now TX is only sending beacon-like frames.
>>>>> This is done using ieee80211_beacon_alloc().
>>>>>
>>>>> Then, in a callout task to generate periodic beacons:
>>>>>
>>>>>     m_dup(avp->beacon, M_DONTWAIT);
>>>>>     mtx_lock(...);
>>>>>     STAILQ_INSERT_TAIL(...);
>>>>>     taskqueue_enqueue(...);
>>>>>     mtx_unlock(...);
>>>>>     callout_schedule(...);
>>>>>
>>>>> On the RX side, the interrupt handler reads out the buffer
>>>>> and then places it on a queue to be handled by the wlan-glue code.
>>>>> For now the wlan-glue code just frees the mbuf instead of
>>>>> calling the net80211 ieee80211_input() functions:
>>>>>
>>>>>     m_copyback(...);
>>>>>     /* Allocate new mbuf for next RX. */
>>>>>     MGETHDR(..., M_DONTWAIT, MT_DATA);
>>>>>     bzero((mtod(sc->Rx_m, void *)), MHLEN);
>>>>>     sc->Rx_m->m_len = 0; /* NB: m_gethdr does not set */
>>>>>     sc->Rx_m->m_data += 20; /* make headroom */
>>>>>
>>>>> Then I use a lockmgr inside my kernel module that should
>>>>> make sure that we are on either the TX or the RX path.
>>>>
>>>> Uh, you can't use a lockmgr lock in interrupt handlers or in
>>>> if_transmit/if_start routines.  You should most likely just be using a plain
>>>> mutex instead.  Also, new code shouldn't use lockmgr in general.  If you
>>>> need a sleepable lock, use sx instead.  It has a more straightforward API.
>>>
>>> Ok, I will change the interrupt handler to do something like this:
>>>
>>>    disable_interrupt();
>>>    taskqueue_enqueue(...); /* on new rx task queue */
>>>
>>> Then on the new rx proc:
>>>
>>>    sx_slock(...);
>>>    m_copyback(...);
>>>    enable_interrupt();
>>>    /* Allocate new mbuf for next RX. */
>>>    MGETHDR(..., M_DONTWAIT, MT_DATA);
>>>    bzero((mtod(sc->Rx_m, void *)), MHLEN);
>>>    sc->Rx_m->m_len = 0; /* NB: m_gethdr does not set */
>>>    sc->Rx_m->m_data += 20; /* make headroom */
>>>    sx_sunlock(...);
>>>
>>> I lock the TX/RX paths to make sure my code is thread-safe.
>>> Also because while programming my device (SPI communication)
>>> there will be a tsleep() waiting for the DMA interrupt and
>>> thus we could be preempted by e.g. a beacon_callout etc...
>>>
>>
>> I did implement your suggestions, using sx and the modified interrupt handler
>> as specified above. But still the same problem as before.
>>
>>>>
>>>>> The problem seems to be at [2], somehow after swapping
>>>>> buckets in uma_zfree m_dup returns a pointer to
>>>>> an mbuf that is still being used by us, [1] and [3]
>>>>> have the same address.
>>>>> Then we call m_freem twice on the same mbuf, [4] and [5].
>>>>> And a loop occurs inside uma_free.
>>>>> Am I using mbufs in a wrong way? Shouldn't mbufs be thread-safe?
>>>>> The problem seems to occur while swapping buckets.
>>>>
>>>> Hmm, the uma uses its own locking, so it should be safe, yes.  However, you
>>>> are correct about [1] and [3].  The thing is, after [1] the mbuf shouldn't
>>>> be in any buckets at all (it only gets put back into the bucket during a
>>>> free).  Are you sure the mbuf wasn't double free'd previously?
>>
>> I rechecked and it is almost certain that I don't double free the mbuf
>> before [1].
>> And it's not like it crashed in the beginning, it does run for a while
>> and then it crashes.
>> So our code works for like a hundred beacons sent/received
>> between two arm boards. It feels like something is preempted, which explains
>> why the mbuf is still in the bucket (wrongly)?
>>
> are you running an INVARIANTS/DIAGNOSTICS/WITNESS/LOCK_DEBUG/...
> enabled kernel ?
>
> are you running on an SMP platform where there might be cache-coherency issue ?

Sorry for the late answer. I added DIAGNOSTIC/WITNESS and didn't see anything
strange except for a couple of LORs...
If I add INVARIANTS I can't log in at all... it gets to the login prompt
but I can't type anything...

I am running one ARM CPU, so it's not an SMP platform.

This is a snippet of my kernel config:

makeoptions     DEBUG=-g                #Build kernel with gdb(1) debug symbols
options         DDB
options         KDB
options         SCHED_4BSD              #4BSD scheduler
options         INET                    #InterNETworking
options         INET6                   #IPv6 communications protocols
options         FFS                     #Berkeley Fast Filesystem
options         DIAGNOSTIC
options         UFS_ACL                 #Support for access control lists
options         UFS_DIRHASH             #Improve performance on big directories
options         KTRACE                  #ktrace(1) support
options         SYSVSHM                 #SYSV-style shared memory
options         SYSVMSG                 #SYSV-style message queues
options         SYSVSEM                 #SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING #Posix P1003_1B real-time extensions
options         MUTEX_NOINLINE
options         RWLOCK_NOINLINE
options         SX_NOINLINE
options         NO_FFS_SNAPSHOT
options         NO_SWAPPING
options         DEADLKRES
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS

>
> Thanks,
>  - Arnaud

Thanks,
-- 
Monthadar Al Jaberi
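
For reference, the double-free hypothesis raised earlier in the thread can
be illustrated outside the kernel. What follows is only a minimal userland
sketch, not UMA code: struct bucket, bucket_free(), bucket_alloc() and the
two demo functions are hypothetical names invented for this illustration,
and the bucket is assumed to be a plain LIFO array of cached item pointers.
It shows that freeing one buffer twice makes the next two allocations
return the same address (the symptom described for [1] and [3]), and that a
debug-style duplicate scan at free time would flag the second free.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_SIZE 8

/* Toy stand-in for an allocator bucket: a LIFO stack of free items. */
struct bucket {
	void	*items[BUCKET_SIZE];
	int	 count;
};

/*
 * Return an item to the bucket.  With 'check' non-zero, scan for the
 * item first and refuse it if it is already cached (a hypothetical
 * debug check; real kernels implement their own variants).
 * Returns 0 on success, -1 on duplicate free, -2 if the bucket is full.
 */
static int
bucket_free(struct bucket *b, void *item, int check)
{
	int i;

	assert(b != NULL && item != NULL);
	if (check) {
		for (i = 0; i < b->count; i++)
			if (b->items[i] == item)
				return (-1);
	}
	if (b->count == BUCKET_SIZE)
		return (-2);
	b->items[b->count++] = item;
	return (0);
}

/* Take the most recently freed item, or NULL if the bucket is empty. */
static void *
bucket_alloc(struct bucket *b)
{
	assert(b != NULL);
	return (b->count > 0 ? b->items[--b->count] : NULL);
}

/* Returns 1 if an unchecked double free makes two allocations alias. */
static int
demo_double_free(void)
{
	static char buf[64];
	struct bucket b = { { NULL }, 0 };
	void *first, *second;

	bucket_free(&b, buf, 0);
	bucket_free(&b, buf, 0);	/* the bug: same item freed twice */
	first = bucket_alloc(&b);
	second = bucket_alloc(&b);	/* handed out while 'first' is live */
	return (first == buf && second == buf);
}

/* Returns 1 if the checked variant rejects the second free. */
static int
demo_checked(void)
{
	static char buf[64];
	struct bucket b = { { NULL }, 0 };
	int rc1, rc2;

	rc1 = bucket_free(&b, buf, 1);
	rc2 = bucket_free(&b, buf, 1);
	return (rc1 == 0 && rc2 == -1);
}
```

In this toy model the aliasing only shows up at the second allocation, well
after the faulty free, which is consistent with a module that runs for a
while before it crashes. A check at free time points at the offending call
instead, at the cost of a scan on every free.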