Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 15 Mar 2017 10:26:39 +0900
From:      "Kristof Provost" <kp@FreeBSD.org>
To:        "Gleb Smirnoff" <glebius@FreeBSD.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r315136 - head/sys/netpfil/pf
Message-ID:  <7B1C8879-E636-4315-99A2-A258AB9AE500@FreeBSD.org>
In-Reply-To: <20170314215706.GB1072@FreeBSD.org>
References:  <201703120542.v2C5gvM4075391@repo.freebsd.org> <20170314215706.GB1072@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 15 Mar 2017, at 6:57, Gleb Smirnoff wrote:
> On Sun, Mar 12, 2017 at 05:42:57AM +0000, Kristof Provost wrote:
> K> Log:
> K>   pf: Fix incorrect rw_sleep() in pf_unload()
> K>
> K>   When we unload we don't hold the pf_rules_lock, so we cannot call 
> rw_sleep()
> K>   with it, because it would release a lock we do not hold. There's 
> no need for the
> K>   lock either, so we can just tsleep().
> K>
> K>   While here also make the same change in pf_purge_thread(), 
> because it explicitly
> K>   takes the lock before rw_sleep() and then immediately releases it 
> afterwards.
>
> The correct change would to be grab lock in pf_unload(), exactly as 
> pf_purge_thread()
> does. With your change you introduces a possible infinite sleep due to 
> race, since
> there is no timeout and no lock.
>
I must be missing something, because I don’t see the race, and don’t 
see how we
could end up with an infinite sleep.

Even if pf_purge_thread() somehow misses that pf_end_threads is non-zero 
(say
it was not sleeping but executing the last vnet cleanup after the
pf_end_threads check) while pf_unload() calls the 
wakeup_one(pf_purge_thread)
it would still terminate the next time the tsleep() timed out.

The only way around that would be to hold the PF_RULES_LOCK() during all 
of
pf_purge_thread() that’s not actually sleeping. That’s non-trivial 
because
pf_purge_unlinked_rules() also takes the PF_RULES_LOCK and there are 
lock
ordering constraints with some of the other locks taken in the other 
cleanup
code.

Given that unloading pf is a non-supported development use case (see
MOD_QUIESCE in pf_modevent()), I think I’d rather accept the 
occasional 10
second delay in unloading.

There is another issue with unloading (vnet_pf_uninit() gets called 
after
pf_unload(), which means we try to take a destroyed lock), for which 
I’ll have
a patch soon.

Regards,
Kristof



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?7B1C8879-E636-4315-99A2-A258AB9AE500>