From owner-freebsd-sparc64@FreeBSD.ORG  Tue Mar  1 23:04:30 2005
Return-Path: <owner-freebsd-sparc64@FreeBSD.ORG>
Delivered-To: freebsd-sparc64@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1900616A4CF
	for <sparc64@freebsd.org>; Tue,  1 Mar 2005 23:04:30 +0000 (GMT)
Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.192])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D0BAE43D5E
	for <sparc64@freebsd.org>; Tue,  1 Mar 2005 23:04:28 +0000 (GMT)
	(envelope-from bosko.milekic@gmail.com)
Received: by wproxy.gmail.com with SMTP id 70so2439457wra
        for <sparc64@freebsd.org>; Tue, 01 Mar 2005 15:04:27 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
	s=beta; d=gmail.com;
	h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references;
	b=dZeLE31ZQy9p686MrwZ5ROAAUahNPUv/f4QmtSJXUWHOIIaIC1gMoAaWpzGqa484CPCW3SijDH7yQ+Pl4YtPHQDcNa/UpBxAkQS1dcyKEYCtCu78IyBEC1U+XE7qorrte//dal8Zc9K2ek9zNmLncT/Kw9klvao4r351Hc/NB4Y=
Received: by 10.54.18.62 with SMTP id 62mr85204wrr;
        Tue, 01 Mar 2005 15:04:27 -0800 (PST)
Received: by 10.54.24.41 with HTTP; Tue, 1 Mar 2005 15:04:27 -0800 (PST)
Message-ID: <bbebbd3d0503011504560a94b4@mail.gmail.com>
Date: Tue, 1 Mar 2005 18:04:27 -0500
From: Bosko Milekic <bosko.milekic@gmail.com>
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <200503011340.18162.jhb@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
References: <20050301000436.GA33346@xor.obsecurity.org>
	 <200503011340.18162.jhb@FreeBSD.org>
cc: Kris Kennaway <kris@obsecurity.org>
cc: net@freebsd.org
cc: rwatson@freebsd.org
cc: bmilekic@freebsd.org
cc: sparc64@freebsd.org
cc: freebsd-sparc64@freebsd.org
Subject: Re: Race condition in mb_free_ext()?
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Bosko Milekic <bosko.milekic@gmail.com>
List-Id: Porting FreeBSD to the Sparc <freebsd-sparc64.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-sparc64>,
	<mailto:freebsd-sparc64-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-sparc64>
List-Post: <mailto:freebsd-sparc64@freebsd.org>
List-Help: <mailto:freebsd-sparc64-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-sparc64>,
	<mailto:freebsd-sparc64-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Mar 2005 23:04:30 -0000

On Tue, 1 Mar 2005 13:40:18 -0500, John Baldwin <jhb@freebsd.org> wrote:
> On Monday 28 February 2005 07:04 pm, Kris Kennaway wrote:
> > I'm seeing an easily-provoked livelock on quad-CPU sparc64 machines
> > running RELENG_5.  It's hard to get a good trace because the processes
> > running on other CPUs cannot be traced from DDB, but I've been lucky a
> > few times:
> >
> > db> show alllocks
> > Process 15 (swi1: net) thread 0xfffff8001fb07480 (100008)
> > exclusive sleep mutex so_snd r = 0 (0xfffff800178432a8) locked @ netinet/tcp_input.c:2189
> > exclusive sleep mutex inp (tcpinp) r = 0 (0xfffff800155c3b08) locked @ netinet/tcp_input.c:744
> > exclusive sleep mutex tcp r = 0 (0xc0bdf788) locked @ netinet/tcp_input.c:617
> > db> wh 15
> > Tracing pid 15 tid 100008 td 0xfffff8001fb07480
> > sab_intr() at sab_intr+0x40
> > psycho_intr_stub() at psycho_intr_stub+0x8
> > intr_fast() at intr_fast+0x88
> > -- interrupt level=0xd pil=0 %o7=0xc01a0040 --
> > mb_free_ext() at mb_free_ext+0x28
> > sbdrop_locked() at sbdrop_locked+0x19c
> > tcp_input() at tcp_input+0x2aa0
> > ip_input() at ip_input+0x964
> > netisr_processqueue() at netisr_processqueue+0x7c
> > swi_net() at swi_net+0x120
> > ithread_loop() at ithread_loop+0x24c
> > fork_exit() at fork_exit+0xd4
> > fork_trampoline() at fork_trampoline+0x8
> > db>
> >
> > That code is here in mb_free_ext():
> >
> >         /*
> >          * This is tricky.  We need to make sure to decrement the
> >          * refcount in a safe way but to also clean up if we're the
> >          * last reference.  This method seems to do it without race.
> >          */
> >         while (dofree == 0) {
> >                 cnt = *(m->m_ext.ref_cnt);
> >                 if (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1)) {
> >                         if (cnt == 1)
> >                                 dofree = 1;
> >                         break;
> >                 }
> >         }
> 
> Well, this is obtuse at least.  A simpler version would be:
> 
>         do {
>                 cnt = *m->m_ext.ref_cnt;
>         } while (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1) == 0);
>         dofree = (cnt == 1);
> 
> --
> John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

Your suggestion always enters the loop and performs the atomic operation,
regardless of what dofree was set to earlier in the function (code not
shown in Kris' paste):

[...]
        /* Account for lazy ref count assign. */
        if (m->m_ext.ref_cnt == NULL)
                dofree = 1;
        else
                dofree = 0;

        /*
         * This is tricky.  We need to make sure to decrement the
         * refcount in a safe way but to also clean up if we're the
         * last reference.  This method seems to do it without race.
         */
[...]
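Keeping the lazy-refcount check while adopting John's simpler do/while
shape might look roughly like this. It is a hypothetical userspace sketch
in which C11 atomics stand in for atomic_cmpset_int(); struct ext_model
and ext_release() are made-up names, not kernel code.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: the refcount is a C11 atomic_uint and
 * ref_cnt == NULL stands for the "lazy ref count assign" case.
 */
struct ext_model {
	atomic_uint *ref_cnt;
};

/* Returns 1 if the caller dropped the last reference and must free. */
static int
ext_release(struct ext_model *ext)
{
	unsigned int cnt;

	/* Account for lazy ref count assign: no counter means sole owner. */
	if (ext->ref_cnt == NULL)
		return (1);

	/*
	 * Decrement atomically.  The weak compare-exchange retries when
	 * another thread changed the count (or on a spurious failure);
	 * on failure it reloads the current value into cnt.
	 */
	cnt = atomic_load(ext->ref_cnt);
	while (!atomic_compare_exchange_weak(ext->ref_cnt, &cnt, cnt - 1))
		;

	/* cnt is the value seen just before our decrement. */
	return (cnt == 1);
}
```

Returning early for the NULL case means the atomic is only performed when
there is actually a shared counter to decrement.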

The segment could still be reworked, but anyway:

This does not appear to explain the livelock.  What does m->m_ext.ref_cnt
point to, and what is the value stored at that location?  Either way, if
the livelock really is a thread spinning in the loop above, it should not
be caused by an underrun or overrun of the reference count (those can
only cause the cluster to leak).  The loop repeats only when
atomic_cmpset_int() fails, i.e. when the count changes between the read
and the compare-and-set; the value itself, however wrong, cannot keep it
spinning.
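A hypothetical userspace model of the loop from Kris' paste makes this
concrete.  C11 atomic_compare_exchange_strong() stands in for
atomic_cmpset_int(), and decrement_ref() with its tries counter is an
illustrative name only.  Given an already-underflowed count (0) or an
inflated one, the loop still exits on its first attempt; the count just
ends up wrong, so the cluster is never freed.

```c
#include <stdatomic.h>

/*
 * Hypothetical model of the mb_free_ext() decrement loop.  Returns the
 * dofree decision and reports in *tries how many compare-and-set
 * attempts were needed.
 */
static int
decrement_ref(atomic_uint *ref_cnt, unsigned int *tries)
{
	unsigned int cnt;
	int dofree = 0;

	*tries = 0;
	for (;;) {
		cnt = atomic_load(ref_cnt);
		unsigned int expected = cnt;

		(*tries)++;
		/* Fails only if *ref_cnt changed since it was read. */
		if (atomic_compare_exchange_strong(ref_cnt, &expected,
		    cnt - 1)) {
			if (cnt == 1)
				dofree = 1;
			break;
		}
	}
	return (dofree);
}
```

With no other thread touching the counter, every starting value (0, 5 or
1) finishes in exactly one attempt; only 1 yields dofree == 1.  Unsigned
wrap-around from 0 is well defined, so the underflow case simply leaves
the counter at UINT_MAX.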

Furthermore, the above code has been in that form for some time now, and
the loop was probably entered *more* often in the past (before the
'dofree' variable was introduced there).  Since when have you been able
to trigger the livelock, and are you sure it is mb_free_ext() that is
looping indefinitely?

I do not know sparc64 well, but what are the semantics of
atomic_cmpset_int() there?  I see that it is implemented with the 'casa'
instruction; does it behave the same way as it does on i386?
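For reference, atomic(9) documents atomic_cmpset_int() as: compare the
value at the destination with the expected value and, if equal, store the
new value; return nonzero on success and zero otherwise.  Assuming a
casa-style primitive that always hands back the value it found in memory,
one way to meet that contract is sketched below.  model_casa() and
model_cmpset_int() are hypothetical names emulated with C11 atomics, not
the actual sparc64 <machine/atomic.h> code.

```c
#include <stdatomic.h>

/*
 * Hypothetical model of a casa-style primitive: atomically, if *p equals
 * expect, store src.  Either way, return the value that was in memory.
 */
static unsigned int
model_casa(atomic_uint *p, unsigned int expect, unsigned int src)
{
	unsigned int seen = expect;

	/* On failure, C11 writes the value found in memory into 'seen'. */
	(void)atomic_compare_exchange_strong(p, &seen, src);
	return (seen);
}

/* The atomic(9) contract: nonzero iff the store happened. */
static int
model_cmpset_int(atomic_uint *p, unsigned int expect, unsigned int src)
{
	return (model_casa(p, expect, src) == expect);
}
```

If the real implementation deviated from this (for example, by reporting
failure even when the swap succeeded), the mb_free_ext() loop could
retry far more often than expected, which is why the semantics matter
here.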

-Bosko 

-- 
Bosko Milekic - If I were a number, I'd be irrational.
Contact Info: http://bmilekic.unixdaemons.com/contact.txt