Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 1 Mar 2005 18:10:55 -0500
From:      Bosko Milekic <bosko.milekic@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: Race condition in mb_free_ext()?
Message-ID:  <bbebbd3d05030115104ac8f86e@mail.gmail.com>
In-Reply-To: <bbebbd3d0503011504560a94b4@mail.gmail.com>
References:  <20050301000436.GA33346@xor.obsecurity.org> <200503011340.18162.jhb@FreeBSD.org> <bbebbd3d0503011504560a94b4@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
P.S.: Not sure if this is related:
http://www.mail-archive.com/bk-commits-24@vger.kernel.org/msg00136.html


On Tue, 1 Mar 2005 18:04:27 -0500, Bosko Milekic
<bosko.milekic@gmail.com> wrote:
> On Tue, 1 Mar 2005 13:40:18 -0500, John Baldwin <jhb@freebsd.org> wrote:
> > On Monday 28 February 2005 07:04 pm, Kris Kennaway wrote:
> > > I'm seeing an easily-provoked livelock on quad-CPU sparc64 machines
> > > running RELENG_5.  It's hard to get a good trace because the processes
> > > running on other CPUs cannot be traced from DDB, but I've been lucky a
> > > few times:
> > >
> > > db> show alllocks
> > > Process 15 (swi1: net) thread 0xfffff8001fb07480 (100008)
> > > exclusive sleep mutex so_snd r = 0 (0xfffff800178432a8) locked @
> > > netinet/tcp_input.c:2189 exclusive sleep mutex inp (tcpinp) r = 0
> > > (0xfffff800155c3b08) locked @ netinet/tcp_input.c:744 exclusive sleep mutex
> > > tcp r = 0 (0xc0bdf788) locked @ netinet/tcp_input.c:617 db> wh 15
> > > Tracing pid 15 tid 100008 td 0xfffff8001fb07480
> > > sab_intr() at sab_intr+0x40
> > > psycho_intr_stub() at psycho_intr_stub+0x8
> > > intr_fast() at intr_fast+0x88
> > > -- interrupt level=0xd pil=0 %o7=0xc01a0040 --
> > > mb_free_ext() at mb_free_ext+0x28
> > > sbdrop_locked() at sbdrop_locked+0x19c
> > > tcp_input() at tcp_input+0x2aa0
> > > ip_input() at ip_input+0x964
> > > netisr_processqueue() at netisr_processqueue+0x7c
> > > swi_net() at swi_net+0x120
> > > ithread_loop() at ithread_loop+0x24c
> > > fork_exit() at fork_exit+0xd4
> > > fork_trampoline() at fork_trampoline+0x8
> > > db>
> > >
> > > That code is here in mb_free_ext():
> > >
> > >         /*
> > >          * This is tricky.  We need to make sure to decrement the
> > >          * refcount in a safe way but to also clean up if we're the
> > >          * last reference.  This method seems to do it without race.
> > >          */
> > >         while (dofree == 0) {
> > >                 cnt = *(m->m_ext.ref_cnt);
> > >                 if (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1)) {
> > >                         if (cnt == 1)
> > >                                 dofree = 1;
> > >                         break;
> > >                 }
> > >         }
> >
> > Well, this is obtuse at least.  A simpler version would be:
> >
> >         do {
> >                 cnt = *m->m_ext.ref_cnt;
> >         } while (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1) == 0);
> >         dofree = (cnt == 1);
> >
> > --
> > John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> > "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
> 
> Your suggestion will always enter the loop and do the atomic
> regardless of what dofree is set to above that code (not shown in
> Kris' paste):
> 
> [...]
>         /* Account for lazy ref count assign. */
>         if (m->m_ext.ref_cnt == NULL)
>                 dofree = 1;
>         else
>                 dofree = 0;
> 
>         /*
>          * This is tricky.  We need to make sure to decrement the
>          * refcount in a safe way but to also clean up if we're the
>          * last reference.  This method seems to do it without race.
>          */
> [...]
> 
> The segment could still be reworked, but anyway:
> 
> This does not appear to explain the livelock.  What's m->m_ext.ref_cnt
> point to? And what is the value at the location pointed to by
> m->m_ext.ref_cnt? Regardless, though, the livelock itself, assuming it
> is due to a long time being spent spinning in the above loop, should
> not be caused by underruns or overruns of the reference count (those
> may only cause leaking of the cluster).
> 
> Furthermore, the above code has been around in that form for some time
> now and in fact the loop was probably entered *more* often in the past
> (before the 'dofree' variable was introduced there).  Since when are
> you able to cause the livelock to happen, and are you sure it is the
> mb_free_ext() that is looping indefinitely?
> 
> I do not know sparc64 well, but what are the semantics of
> atomic_cmpset_int()? I see that it is defined to use the 'casa'
> instruction; does atomic_cmpset_int() behave the same way as it does
> on i386?
> 
> -Bosko
> 
> --
> Bosko Milekic - If I were a number, I'd be irrational.
> Contact Info: http://bmilekic.unixdaemons.com/contact.txt
> 


-- 
Bosko Milekic - If I were a number, I'd be irrational.
Contact Info: http://bmilekic.unixdaemons.com/contact.txt



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bbebbd3d05030115104ac8f86e>