Date:      Mon, 21 Sep 2009 15:52:55 +0100
From:      Andrew Brampton <brampton+freebsd-net@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Is this a race in mbuf's refcounting?
Message-ID:  <d41814900909210752t23309836y4b8a447e811db6d2@mail.gmail.com>
In-Reply-To: <20090921235604.U12163@delplex.bde.org>
References:  <d41814900909210543p46894d83u6d814353ea1ee130@mail.gmail.com> <20090921235604.U12163@delplex.bde.org>

2009/9/21 Bruce Evans <brde@optusnet.com.au>:
> On Mon, 21 Sep 2009, Andrew Brampton wrote:
>
>> I've been reading the FreeBSD source code to understand how mbufs are
>> reference counted. However, I'm wondering whether a few bits of code
>> would fail under exactly the right timing. Take, for example, this
>> code in uipc_mbuf.c:
>>
>> 286 static void
>> 287 mb_dupcl(struct mbuf *n, struct mbuf *m)
>> 288 {
>> ...
>> 293         if (*(m->m_ext.ref_cnt) == 1)
>> 294                 *(m->m_ext.ref_cnt) += 1;
>> 295         else
>> 296                 atomic_add_int(m->m_ext.ref_cnt, 1);
>> ...
>> 305 }
>>
>> Now, the way I understand this code is: if ref_cnt is 1, then it is
>> not shared, so ref_cnt is incremented non-atomically. However, if
>> ref_cnt is anything else, then it is shared, so the value is updated
>> atomically. This seems valid, but what happens if two threads
>> call mb_dupcl at the same time with a non-shared m? Could they both
>> evaluate the if on line 293 at the same time, and then both
>> non-atomically increment ref_cnt?
>>
>> If this could happen then we have a lost update and our reference
>> counting is broken. I've also noticed that in other places similar
>> optimisations are made to avoid the atomic operation.
>>
>> So is this a problem?
>
> I don't see how it can work.
>
> Also, if the count was 1, then it should become 2, but there is nothing to
> flush the store to memory.  This seems to mainly enlarge the race window
> for the previous problem.
>
> Bruce
>

Sorry, are you agreeing or disagreeing with my original post? If you
are disagreeing, I would appreciate it if you could explain where I
have gone wrong.

I see the following happening:
Thread 1: Reads *(m->m_ext.ref_cnt) and determines it is 1, and enters
the true branch of the if
Thread 1: Then reads *(m->m_ext.ref_cnt) again (since it is volatile)
Thread 2: Interrupts and reads *(m->m_ext.ref_cnt) and determines it
is 1, and enters the true branch of the if
Thread 2: Then reads *(m->m_ext.ref_cnt), adds one to it and stores
the result (i.e. 2)
Thread 1: Resumes with the value it had (i.e. 1), adds one to it, and
stores the result (i.e. 2)

With this sequence we have lost an update: two references were added,
so the value of *(m->m_ext.ref_cnt) should be 3, but it is 2. If the
if were removed and atomic_add_int were always used, the result would
be 3.
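
The interleaving above can be replayed deterministically in ordinary
userland C. This is only a minimal sketch, not the kernel code: the
function names are hypothetical, a plain local variable stands in for
*(m->m_ext.ref_cnt), and C11 atomic_fetch_add stands in for the
kernel's atomic_add_int. Both "threads" are replayed step by step in a
single thread so that the outcome does not depend on timing.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-ins for illustration only; this is not the
 * kernel's mbuf code.
 */

/* Replay the racy interleaving: both threads take the non-atomic branch. */
static unsigned int
replay_racy_interleaving(void)
{
	unsigned int ref_cnt = 1;	/* starts unshared */
	unsigned int t1_seen, t2_seen;

	t1_seen = ref_cnt;	/* Thread 1 passes the == 1 test, then reads 1 */
	t2_seen = ref_cnt;	/* Thread 2 interrupts, passes the test, reads 1 */
	ref_cnt = t2_seen + 1;	/* Thread 2 stores 2 */
	ref_cnt = t1_seen + 1;	/* Thread 1 resumes with its stale 1, stores 2 */

	return (ref_cnt);	/* one increment has been lost */
}

/* The same two increments, each done as one atomic read-modify-write. */
static unsigned int
replay_atomic_increments(void)
{
	atomic_uint ref_cnt;

	atomic_init(&ref_cnt, 1);
	atomic_fetch_add(&ref_cnt, 1);	/* first increment: 1 -> 2 */
	atomic_fetch_add(&ref_cnt, 1);	/* second increment: 2 -> 3 */

	return (atomic_load(&ref_cnt));
}
```

The first function returns 2 and the second returns 3. Because each
atomic increment is a single indivisible read-modify-write, neither
thread can be left holding a stale value, whichever one runs first.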

If you find a flaw in my logic please point it out.

thanks
Andrew


