Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 18 Dec 2018 15:18:44 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 234135] a race between ifioctl and clone_destroy in lagg and similar network drivers
Message-ID:  <bug-234135-227@https.bugs.freebsd.org/bugzilla/>

next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D234135

            Bug ID: 234135
           Summary: a race between ifioctl and clone_destroy in lagg and
                    similar network drivers
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: avg@FreeBSD.org

We have got a page fault crash with the following stack trace:

_sx_slock_hard
lagg_media_status
ifmedia_ioctl
lagg_ioctl
ifioctl
kern_ioctl
sys_ioctl

It appears that the crash happened because of a destroyed sx lock.

My theory is that the following race happened.
lagg_clone_destroy() and lagg_ioctl() were called concurrently.
lagg_clone_destroy() won a race to lock sc_sx while lagg_media_status() got
blocked on it.  I think that after some adaptive spinning the thread was pl=
aced
on a sleep queue.  Then, lagg_clone_destroy() unlocked the lock and proceed=
ed
to
destroy it.  After the lagg_media_status() thread was waken up it found the
lock
in the destroyed state and crashed on it in a typical fashion (trying to
dereference a NULL pointer as a struct thread pointer).

Here is a more detailed, step by step description of the above.

Thread T1 calls ifioctl with, for instance, SIOCGIFMEDIA parameter.
T1 calls ifunit_ref(), finds the interface with !(if_flags & IFF_DYING).
T1 increments the interface's reference count and proceeds to call ifhwioct=
l()
and then the interface's (driver's) if_ioctl method.

Enter thread T2.
T2 calls ifioctl(SIOCIFDESTROY) on the same interface.
T2 invokes if_clone_destroy() that looks up the interface by name and
increments
its reference count.
Then, T2 calls if_clone_destroyif() that calls ifc_simple_destroy().
The latter calls ifcs_destroy method on the interface.

>From here on we consider driver-specific code that, obviously, varies from
driver to driver.  But after having reviewed a handful of drivers that use
if_clone_simple I see that all of them have the same pattern.

So, T2 calls ifcs_destroy.
A driver's ifcs_destroy would handle its internal state.
Then, it would typically call if_free() on the interface.
Since the interface at this point has multiple outstanding references,
including
one taken by T2 itself, it is not actually freed.  It's just marked as
IFF_DYING. Also, its reference count is decremented by one, so that it can =
be
actually freed after T2 and T1 release their references.

Afterwards, ifcs_destroy would typically free if_softc.

At this point the driver's if_ioctl method is being executed by T1.
The method can access if_softc that has been freed by now.
So, that's the race.

Any internal locking using a lock in the driver's softc instance does not h=
elp,
because the lock would be destroyed and freed together with the if_softc in
ifcs_destroy.
So, if_ioctl attempting to get that lock is the same kind of the problem.

--=20
You are receiving this mail because:
You are the assignee for the bug.=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-234135-227>