FreeBSD Mail Archives

Date:      Tue, 1 Mar 2005 18:04:27 -0500
From:      Bosko Milekic <bosko.milekic@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: Race condition in mb_free_ext()?
Message-ID:  <bbebbd3d0503011504560a94b4@mail.gmail.com>
In-Reply-To: <200503011340.18162.jhb@FreeBSD.org>
References:  <20050301000436.GA33346@xor.obsecurity.org> <200503011340.18162.jhb@FreeBSD.org>

On Tue, 1 Mar 2005 13:40:18 -0500, John Baldwin <jhb@freebsd.org> wrote:
> On Monday 28 February 2005 07:04 pm, Kris Kennaway wrote:
> > I'm seeing an easily-provoked livelock on quad-CPU sparc64 machines
> > running RELENG_5.  It's hard to get a good trace because the processes
> > running on other CPUs cannot be traced from DDB, but I've been lucky a
> > few times:
> >
> > db> show alllocks
> > Process 15 (swi1: net) thread 0xfffff8001fb07480 (100008)
> > exclusive sleep mutex so_snd r = 0 (0xfffff800178432a8) locked @
> > netinet/tcp_input.c:2189 exclusive sleep mutex inp (tcpinp) r = 0
> > (0xfffff800155c3b08) locked @ netinet/tcp_input.c:744 exclusive sleep mutex
> > tcp r = 0 (0xc0bdf788) locked @ netinet/tcp_input.c:617 db> wh 15
> > Tracing pid 15 tid 100008 td 0xfffff8001fb07480
> > sab_intr() at sab_intr+0x40
> > psycho_intr_stub() at psycho_intr_stub+0x8
> > intr_fast() at intr_fast+0x88
> > -- interrupt level=0xd pil=0 %o7=0xc01a0040 --
> > mb_free_ext() at mb_free_ext+0x28
> > sbdrop_locked() at sbdrop_locked+0x19c
> > tcp_input() at tcp_input+0x2aa0
> > ip_input() at ip_input+0x964
> > netisr_processqueue() at netisr_processqueue+0x7c
> > swi_net() at swi_net+0x120
> > ithread_loop() at ithread_loop+0x24c
> > fork_exit() at fork_exit+0xd4
> > fork_trampoline() at fork_trampoline+0x8
> > db>
> >
> > That code is here in mb_free_ext():
> >
> >         /*
> >          * This is tricky.  We need to make sure to decrement the
> >          * refcount in a safe way but to also clean up if we're the
> >          * last reference.  This method seems to do it without race.
> >          */
> >         while (dofree == 0) {
> >                 cnt = *(m->m_ext.ref_cnt);
> >                 if (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1)) {
> >                         if (cnt == 1)
> >                                 dofree = 1;
> >                         break;
> >                 }
> >         }
> 
> Well, this is obtuse at least.  A simpler version would be:
> 
>         do {
>                 cnt = *m->m_ext.ref_cnt;
>         } while (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1) == 0);
>         dofree = (cnt == 1);
> 
> --
> John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

Your suggestion will always enter the loop and do the atomic
regardless of what dofree is set to above that code (not shown in
Kris' paste):

[...]
        /* Account for lazy ref count assign. */
        if (m->m_ext.ref_cnt == NULL)
                dofree = 1;
        else
                dofree = 0;

        /*
         * This is tricky.  We need to make sure to decrement the
         * refcount in a safe way but to also clean up if we're the
         * last reference.  This method seems to do it without race.
         */
[...]

The segment could still be reworked, but anyway:

This does not appear to explain the livelock.  What's m->m_ext.ref_cnt
point to? And what is the value at the location pointed to by
m->m_ext.ref_cnt? Regardless, though, the livelock itself, assuming it
is due to a long time being spent spinning in the above loop, should
not be caused by underruns or overruns of the reference count (those
may only cause leaking of the cluster).

Furthermore, the above code has been around in that form for some time
now and in fact the loop was probably entered *more* often in the past
(before the 'dofree' variable was introduced there).  Since when are
you able to cause the livelock to happen, and are you sure it is the
mb_free_ext() that is looping indefinitely?

I do not know sparc64 well, but what are the semantics of
atomic_cmpset_int()? I see that it is defined to use the 'casa'
instruction; does atomic_cmpset_int() behave the same way as it does
on i386?

-Bosko 

-- 
Bosko Milekic - If I were a number, I'd be irrational.
Contact Info: http://bmilekic.unixdaemons.com/contact.txt

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bbebbd3d0503011504560a94b4>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation