Date:      Mon, 28 Feb 2005 22:00:25 -0800 (PST)
From:      Doug White <dwhite@gumbysoft.com>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        net@FreeBSD.org
Subject:   Re: Race condition in mb_free_ext()?
Message-ID:  <20050228214850.X62607@carver.gumbysoft.com>
In-Reply-To: <20050301000436.GA33346@xor.obsecurity.org>
References:  <20050301000436.GA33346@xor.obsecurity.org>

Since we're moving the conversation on this over here....

On Mon, 28 Feb 2005, Kris Kennaway wrote:

> I'm seeing an easily-provoked livelock on quad-CPU sparc64 machines
> running RELENG_5.  It's hard to get a good trace because the processes
> running on other CPUs cannot be traced from DDB, but I've been lucky a
> few times:
>
> db> show alllocks
> Process 15 (swi1: net) thread 0xfffff8001fb07480 (100008)
> exclusive sleep mutex so_snd r = 0 (0xfffff800178432a8) locked @ netinet/tcp_input.c:2189
> exclusive sleep mutex inp (tcpinp) r = 0 (0xfffff800155c3b08) locked @ netinet/tcp_input.c:744
> exclusive sleep mutex tcp r = 0 (0xc0bdf788) locked @ netinet/tcp_input.c:617
> db> wh 15
> Tracing pid 15 tid 100008 td 0xfffff8001fb07480
> sab_intr() at sab_intr+0x40
> psycho_intr_stub() at psycho_intr_stub+0x8
> intr_fast() at intr_fast+0x88
> -- interrupt level=0xd pil=0 %o7=0xc01a0040 --
> mb_free_ext() at mb_free_ext+0x28
> sbdrop_locked() at sbdrop_locked+0x19c
> tcp_input() at tcp_input+0x2aa0
> ip_input() at ip_input+0x964
> netisr_processqueue() at netisr_processqueue+0x7c
> swi_net() at swi_net+0x120
> ithread_loop() at ithread_loop+0x24c
> fork_exit() at fork_exit+0xd4
> fork_trampoline() at fork_trampoline+0x8
> db>
>
> That code is here in mb_free_ext():
>
>         /*
>          * This is tricky.  We need to make sure to decrement the
>          * refcount in a safe way but to also clean up if we're the
>          * last reference.  This method seems to do it without race.
>          */
>         while (dofree == 0) {
>                 cnt = *(m->m_ext.ref_cnt);
>                 if (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1)) {
>                         if (cnt == 1)
>                                 dofree = 1;
>                         break;
>                 }
>         }
>
> mb_free_ext+0x24:       casa            0x4 , %g2, %g1
> mb_free_ext+0x28:       subcc           %g1, %g2, %g0
>
> which is inside the atomic_cmpset_int (i.e. it's probably spinning in
> the loop).
>
> Can anyone see if there's a problem with this code, or perhaps the
> sparc64 implementation of atomic_cmpset_int()?

I considered earlier today changing cnt to be a volatile u_int * to
match how ipi_wait() polls the cpumask that other CPUs are updating.
(alc@ had flagged this earlier on in our debugging.)  The volatile
qualifier forces some extra load instructions to ensure both the pointer
and the value it points to stay fresh.  rwatson convinced me to hold
off, though, but never got back to me :-)  He also suggested putting a
loop counter in and perhaps a KASSERT to make sure ref_cnt isn't some
absurdly large value (like 0xdeadc0de).  Something like the sketch below
is what I had in mind.
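
This is completely untested and would slot in where the current loop
sits in mb_free_ext().  The cast is needed because m_ext declares
ref_cnt as a plain u_int *, and the 0x10000000 bound and the 1000000
loop limit are arbitrary "absurdly large" markers, not real limits:

        volatile u_int *refp;
        u_int cnt;
        int loops = 0;

        refp = (volatile u_int *)m->m_ext.ref_cnt;
        while (dofree == 0) {
                cnt = *refp;            /* volatile forces a fresh load */
                KASSERT(cnt < 0x10000000,
                    ("mb_free_ext: absurd ref_cnt %u", cnt));
                KASSERT(++loops < 1000000,
                    ("mb_free_ext: stuck spinning on ref_cnt"));
                if (atomic_cmpset_int(refp, cnt, cnt - 1)) {
                        if (cnt == 1)
                                dofree = 1;
                        break;
                }
        }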

Forgive me for being naive, but is there a reason you don't just do an
atomic subtraction on the refcount?  I can see why the loop repeats --
if two things are warring over the refcount, each keeps retrying until
one wins -- but a plain subtraction would seem more intuitive.  See the
sketch below for what I'm picturing.
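
What I'm picturing is a single decrement that hands back the old value,
so the retry loop disappears entirely.  As far as I know we don't have
such a primitive in the tree today -- atomic_subtract_int() throws the
old value away -- so take atomic_fetchadd_int() here as hypothetical:

        /*
         * Hypothetical: atomic_fetchadd_int() atomically adds its
         * second argument to the counter and returns the value the
         * counter held beforehand.  If the old value was 1, we just
         * dropped the last reference and get to clean up.
         */
        if (atomic_fetchadd_int(m->m_ext.ref_cnt, -1) == 1)
                dofree = 1;

The cmpset loop only buys you something if you need to look at the old
value before deciding whether to decrement at all; here we always
decrement, so a fetch-and-add ought to be equivalent.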

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org


