Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 24 Jul 2011 09:43:59 +0100
From:      "Robert N. M. Watson" <rwatson@freebsd.org>
To:        Ryan Stone <rysto32@gmail.com>
Cc:        gnn@freebsd.org, bz@freebsd.org, net@freebsd.org
Subject:   Re: m_pkthdr.rcvif dangling pointer problem
Message-ID:  <E05FE767-1923-4D47-9759-FA040E403618@freebsd.org>
In-Reply-To: <CAFMmRNwBbxR-F7PjkwF8E4GjwFQy_0USKW-3u-ZRNxPMJSOQcA@mail.gmail.com>
References:  <20110714154457.GI70776@FreeBSD.org> <CAFMmRNwBbxR-F7PjkwF8E4GjwFQy_0USKW-3u-ZRNxPMJSOQcA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

On 24 Jul 2011, at 04:51, Ryan Stone wrote:

> I ran headlong into this issue today when trying to test some network
> stack changes.  It's depressingly easy to crash HEAD by periodically
> destroying vlan interfaces while you are sending traffic over those
> interfaces -- I get a panic within minutes.
>=20
>> http://people.freebsd.org/~glebius/patches/ifnet.no_free
>=20
> This patch makes my test system survive longer but does not resolve =
the issue.

Unfortunately, I'm a bit preoccupied currently, so haven't had a chance =
to follow up as yet, but just to follow up on the general issue here: =
this problem existed pre-SMP as well, and could be easily triggered by =
DUMMYNET and removable interfaces as well (as additional queuing delays =
just make the problem worse). In general: we need a solution that =
penalises interface removal, not common-case packet processing. As many =
packets have their source ifnet looked up in common-case processing =
(worth checking this assumption) because it's cheap, any solution that =
causes an interface lookup on every input packet (with synchronisation) =
is also an issue.

Instead, I think we should go for a more radical notion, which is a bit =
harder to implement in our stack: the network stack needs a race-free =
way to "drain" all mbufs referring to a particular ifnet, which does not =
cause existing processing to become more expensive. This is easy in some =
subsystems, but more complex in others -- and the composition of =
subsystems makes it all much harder since we need to know that (to be =
100% correct) packets aren't getting passed between subsystems (and =
hence belong to neither) in a way that races with a sweep through the =
subsystems. It may be possible to get this 99.9% right simply by =
providing a series of callbacks into subsystems that cause queues to be =
walked and drained of packets matching the doomed ifnet. It may also be =
quite cheap to have subsystems that "hold" packets outside of explicit =
queues for some period (i.e., in a thread-local pointer out of the =
stack) add explicit invalidation tests  (i.e., for IFF_DYING) before =
handing off to prevent those packets from traversing into other =
subsystems -- which can be done synchronisation-free, but still wouldn't =
100% prevent the race

Just to give an example: netisr should offer a method for =
netisr_drain_ifnet(struct ifnet *) that causes netisr to walk all of its =
queues to find matching packets and free them. Due to direct dispatch =
and thread-local queues during processing, netisr should also check =
IFF_DYING before handing off.

If we do that, I wonder how robust the system then becomes...? This may =
not be too hard to test. But I'd rather we penalise ifnet removal than, =
say, the IP input path when it needs to check a source interface =
property.

Robert=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E05FE767-1923-4D47-9759-FA040E403618>