From owner-svn-src-head@freebsd.org Wed Mar 15 01:26:49 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4476BD0A709; Wed, 15 Mar 2017 01:26:49 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from venus.codepro.be (venus.codepro.be [IPv6:2a01:4f8:162:1127::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.codepro.be", Issuer "Gandi Standard SSL CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 10BD01878; Wed, 15 Mar 2017 01:26:49 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from [192.168.8.218] (203.141.139.231.static.zoot.jp [203.141.139.231]) (Authenticated sender: kp) by venus.codepro.be (Postfix) with ESMTPSA id 2ECF61E97C; Wed, 15 Mar 2017 02:26:42 +0100 (CET) From: "Kristof Provost" To: "Gleb Smirnoff" Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r315136 - head/sys/netpfil/pf Date: Wed, 15 Mar 2017 10:26:39 +0900 Message-ID: <7B1C8879-E636-4315-99A2-A258AB9AE500@FreeBSD.org> In-Reply-To: <20170314215706.GB1072@FreeBSD.org> References: <201703120542.v2C5gvM4075391@repo.freebsd.org> <20170314215706.GB1072@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Mailer: MailMate (2.0BETAr6080) X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Mar 2017 01:26:49 -0000 On 15 Mar 2017, at 6:57, Gleb Smirnoff wrote: > On Sun, Mar 12, 2017 at 05:42:57AM +0000, Kristof Provost wrote: > K> Log: > K> pf: Fix incorrect rw_sleep() in pf_unload() > K> > K> When we unload we don't hold the pf_rules_lock, so we cannot call > rw_sleep() > K> with it, because it would release a lock we do not hold. There's > no need for the > K> lock either, so we can just tsleep(). > K> > K> While here also make the same change in pf_purge_thread(), > because it explicitly > K> takes the lock before rw_sleep() and then immediately releases it > afterwards. > > The correct change would to be grab lock in pf_unload(), exactly as > pf_purge_thread() > does. With your change you introduces a possible infinite sleep due to > race, since > there is no timeout and no lock. > I must be missing something, because I don’t see the race, and don’t see how we could end up with an infinite sleep. Even if pf_purge_thread() somehow misses that pf_end_threads is non-zero (say it was not sleeping but executing the last vnet cleanup after the pf_end_threads check) while pf_unload() calls the wakeup_one(pf_purge_thread) it would still terminate the next time the tsleep() timed out. The only way around that would be to hold the PF_RULES_LOCK() during all of pf_purge_thread() that’s not actually sleeping. That’s non-trivial because pf_purge_unlinked_rules() also takes the PF_RULES_LOCK and there are lock ordering constraints with some of the other locks taken in the other cleanup code. Given that unloading pf is a non-supported development use case (see MOD_QUIESCE in pf_modevent()), I think I’d rather accept the occasional 10 second delay in unloading. There is another issue with unloading (vnet_pf_uninit() gets called after pf_unload(), which means we try to take a destroyed lock), for which I’ll have a patch soon. Regards, Kristof