From owner-freebsd-pf@freebsd.org Wed Nov 14 07:06:07 2018 Return-Path: Delivered-To: freebsd-pf@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9249A110C7D9 for ; Wed, 14 Nov 2018 07:06:07 +0000 (UTC) (envelope-from kib@freebsd.org) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9CE116A90F; Wed, 14 Nov 2018 07:06:06 +0000 (UTC) (envelope-from kib@freebsd.org) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id wAE75tgQ021167 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 14 Nov 2018 09:05:58 +0200 (EET) (envelope-from kib@freebsd.org) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua wAE75tgQ021167 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id wAE75tMx021166; Wed, 14 Nov 2018 09:05:55 +0200 (EET) (envelope-from kib@freebsd.org) X-Authentication-Warning: tom.home: kostik set sender to kib@freebsd.org using -f Date: Wed, 14 Nov 2018 09:05:55 +0200 From: Konstantin Belousov To: Kristof Provost Cc: Andreas Longwitz , Gleb Smirnoff , freebsd-pf@freebsd.org Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections Message-ID: <20181114070555.GK2378@kib.kiev.ua> References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de> <5BEB3B9A.9080402@incore.de> <9004F62C-D1DC-4CFA-93A1-67E981274831@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <9004F62C-D1DC-4CFA-93A1-67E981274831@FreeBSD.org> User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-Rspamd-Queue-Id: 9CE116A90F X-Spamd-Result: default: False [-105.33 / 200.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; ALLOW_DOMAIN_WHITELIST(-100.00)[freebsd.org]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; TO_DN_SOME(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain]; HAS_XAW(0.00)[]; DMARC_NA(0.00)[freebsd.org]; R_SPF_SOFTFAIL(0.00)[~all]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: mx66.freebsd.org]; NEURAL_HAM_SHORT(-1.00)[-1.000,0]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; RCVD_TLS_LAST(0.00)[]; ASN(0.00)[asn:6939, ipnet:2001:470::/32, country:US]; IP_SCORE(-2.22)[ip: (-2.88), ipnet: 2001:470::/32(-4.54), asn: 6939(-3.59), country: US(-0.10)] X-Rspamd-Server: mx1.freebsd.org X-BeenThere: freebsd-pf@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Technical discussion and general questions about packet filter \(pf\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2018 07:06:07 -0000 On Tue, Nov 13, 2018 at 11:17:47PM +0100, Kristof Provost wrote: > On 13 Nov 2018, at 22:01, Andreas Longwitz wrote: > >> > >> Are there any hints why the counter pf_default_rule->states_cur > >> could get a negative value ? > >> > >> I’m afraid I have no idea right now. > >> > > > > OK, in the meantime I did some more research and I am now quite sure > > the > > problem with the bogus pf_default_rule->states_cur counter is not a > > problem in pf. I am convinced it is a problem in counter(9) on i386 > > server. The critical code is the machine instruction cmpxchg8b used in > > /sys/i386/include/counter.h. > > > I’m always happy to hear problems aren’t my fault :) > > >> From intel instruction set reference manual: > > Zhis instruction can be used with a LOCK prefix allow the instruction > > to > > be executed atomically. > > > > We have two other sources in kernel using cmpxchg8b: > > /sys/i386/include/atomic.h and > > /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S > > > > Both make use of the LOCK feature, in atomic.h a detailed explanation > > is > > given. Because counter.h lacks the LOCK prefix I propose the following > > patch to get around the leak: > > > > --- counter.h.orig 2015-07-03 16:45:36.000000000 +0200 > > +++ counter.h 2018-11-13 16:07:20.329053000 +0100 > > @@ -60,6 +60,7 @@ > > "movl %%edx,%%ecx\n\t" > > "addl (%%edi),%%ebx\n\t" > > "adcl 4(%%edi),%%ecx\n\t" > > + "lock \n\t" > > "cmpxchg8b %%fs:(%%esi)\n\t" > > "jnz 1b" > > : > > @@ -76,6 +77,7 @@ > > __asm __volatile( > > "movl %%eax,%%ebx\n\t" > > "movl %%edx,%%ecx\n\t" > > + "lock \n\t" > > "cmpxchg8b (%2)" > > : "=a" (res_lo), "=d"(res_high) > > : "SD" (p) > > @@ -121,6 +123,7 @@ > > "xorl %%ebx,%%ebx\n\t" > > "xorl %%ecx,%%ecx\n\t" > > "1:\n\t" > > + "lock \n\t" > > "cmpxchg8b (%0)\n\t" > > "jnz 1b" > > : > > > That looks very plausible. I’m somewhat out of my depth here, so I’d > like the authors of the counter code to take a look at it. No, it does not look correct. The only atomicity guarantee that is required from the counter.h inc and zero methods are atomicity WRT context switches. The instructions are always executed on the CPU which owns the PCPU element in the counter array, and since the update is executed as single instruction, it does not require more expensive cache line lock AKA LOCK prefix. This is the main feature of the counters on x86. It might read bogus value when fetching the counter but counter.h KPI only guarantee is that the readouts are mostly correct. If you have systematically wrong value always read, there is probably something different going on.