From owner-freebsd-pf@freebsd.org Sun Nov 11 21:01:07 2018
Message-Id: <201811112101.wABL11Wx038858@kenobi.freebsd.org>
From: bugzilla-noreply@FreeBSD.org
To: pf@FreeBSD.org
Subject: Problem reports for pf@FreeBSD.org that need special attention
Date: Sun, 11 Nov 2018 21:01:01 +0000
List-Id: "Technical discussion and general questions about packet filter \(pf\)"

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    203735 | Transparent interception of ipv6 with squid and p

1 problems total for which you should take action.

From owner-freebsd-pf@freebsd.org Tue Nov 13 21:01:27 2018
Message-ID: <5BEB3B9A.9080402@incore.de>
Date: Tue, 13 Nov 2018 22:01:14 +0100
From: Andreas Longwitz
To: Kristof
 Provost
CC: freebsd-pf@freebsd.org
Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de>

>
> Are there any hints why the counter pf_default_rule->states_cur
> could get a negative value ?
>
> I’m afraid I have no idea right now.
>

OK, in the meantime I did some more research and I am now quite sure the
problem with the bogus pf_default_rule->states_cur counter is not a
problem in pf. I am convinced it is a problem in counter(9) on i386
server. The critical code is the machine instruction cmpxchg8b used in
/sys/i386/include/counter.h.

From the Intel instruction set reference manual:
This instruction can be used with a LOCK prefix to allow the instruction to
be executed atomically.

We have two other sources in kernel using cmpxchg8b:
  /sys/i386/include/atomic.h and
  /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S

Both make use of the LOCK feature, in atomic.h a detailed explanation is
given. Because counter.h lacks the LOCK prefix I propose the following
patch to get around the leak:

--- counter.h.orig 2015-07-03 16:45:36.000000000 +0200
+++ counter.h 2018-11-13 16:07:20.329053000 +0100
@@ -60,6 +60,7 @@
  "movl %%edx,%%ecx\n\t"
  "addl (%%edi),%%ebx\n\t"
  "adcl 4(%%edi),%%ecx\n\t"
+ "lock \n\t"
  "cmpxchg8b %%fs:(%%esi)\n\t"
  "jnz 1b"
  :
@@ -76,6 +77,7 @@
  __asm __volatile(
  "movl %%eax,%%ebx\n\t"
  "movl %%edx,%%ecx\n\t"
+ "lock \n\t"
  "cmpxchg8b (%2)"
  : "=a" (res_lo), "=d"(res_high)
  : "SD" (p)
@@ -121,6 +123,7 @@
  "xorl %%ebx,%%ebx\n\t"
  "xorl %%ecx,%%ecx\n\t"
  "1:\n\t"
+ "lock \n\t"
  "cmpxchg8b (%0)\n\t"
  "jnz 1b"
  :

Kindly regards,
Andreas

From owner-freebsd-pf@freebsd.org Tue Nov 13 22:17:51 2018
From: "Kristof Provost"
To: "Andreas Longwitz", "Gleb Smirnoff", kib@freebsd.org
Cc: freebsd-pf@freebsd.org
Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Date: Tue, 13 Nov 2018 23:17:47 +0100
Message-ID: <9004F62C-D1DC-4CFA-93A1-67E981274831@FreeBSD.org>
In-Reply-To: <5BEB3B9A.9080402@incore.de>
References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de> <5BEB3B9A.9080402@incore.de>

On 13 Nov 2018, at 22:01, Andreas Longwitz wrote:
>>
>> Are there any hints why the counter pf_default_rule->states_cur
>> could get a negative value ?
>>
>> I’m afraid I have no idea right now.
>>
>
> OK, in the meantime I did some more research and I am now quite sure the
> problem with the bogus pf_default_rule->states_cur counter is not a
> problem in pf. I am convinced it is a problem in counter(9) on i386
> server. The critical code is the machine instruction cmpxchg8b used in
> /sys/i386/include/counter.h.
>
I’m always happy to hear problems aren’t my fault :)

> From the Intel instruction set reference manual:
> This instruction can be used with a LOCK prefix to allow the instruction to
> be executed atomically.
>
> We have two other sources in kernel using cmpxchg8b:
>   /sys/i386/include/atomic.h and
>   /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
>
> Both make use of the LOCK feature, in atomic.h a detailed explanation is
> given. Because counter.h lacks the LOCK prefix I propose the following
> patch to get around the leak:
>
> --- counter.h.orig 2015-07-03 16:45:36.000000000 +0200
> +++ counter.h 2018-11-13 16:07:20.329053000 +0100
> @@ -60,6 +60,7 @@
>   "movl %%edx,%%ecx\n\t"
>   "addl (%%edi),%%ebx\n\t"
>   "adcl 4(%%edi),%%ecx\n\t"
> + "lock \n\t"
>   "cmpxchg8b %%fs:(%%esi)\n\t"
>   "jnz 1b"
>   :
> @@ -76,6 +77,7 @@
>   __asm __volatile(
>   "movl %%eax,%%ebx\n\t"
>   "movl %%edx,%%ecx\n\t"
> + "lock \n\t"
>   "cmpxchg8b (%2)"
>   : "=a" (res_lo), "=d"(res_high)
>   : "SD" (p)
> @@ -121,6 +123,7 @@
>   "xorl %%ebx,%%ebx\n\t"
>   "xorl %%ecx,%%ecx\n\t"
>   "1:\n\t"
> + "lock \n\t"
>   "cmpxchg8b (%0)\n\t"
>   "jnz 1b"
>   :
>
That looks very plausible. I’m somewhat out of my depth here, so I’d
like the authors of the counter code to take a look at it.

Best regards,
Kristof

From owner-freebsd-pf@freebsd.org Tue Nov 13 22:25:45 2018
Date: Tue, 13 Nov 2018 14:25:33 -0800
From: Gleb Smirnoff
To: Andreas Longwitz
Cc: Kristof Provost, freebsd-pf@freebsd.org
Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Message-ID: <20181113222533.GJ9744@FreeBSD.org>
References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de> <5BEB3B9A.9080402@incore.de>
In-Reply-To: <5BEB3B9A.9080402@incore.de>

On Tue, Nov 13, 2018 at 10:01:14PM +0100, Andreas Longwitz wrote:
A> OK, in the meantime I did some more research and I am now quite sure the
A> problem with the bogus pf_default_rule->states_cur counter is not a
A> problem in pf. I am convinced it is a problem in counter(9) on i386
A> server. The critical code is the machine instruction cmpxchg8b used in
A> /sys/i386/include/counter.h.
A>
A> From the Intel instruction set reference manual:
A> This instruction can be used with a LOCK prefix to allow the instruction to
A> be executed atomically.
A>
A> We have two other sources in kernel using cmpxchg8b:
A>   /sys/i386/include/atomic.h and
A>   /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S

A single CPU instruction is atomic by definition, with regard to the CPU
that executes it. A preemption cannot happen in the middle of an
instruction. What the "lock" prefix does is memory locking, to avoid
unlocked parallel access to the same address by different CPUs.

What is special about counter(9) is that %fs:%esi always points to a
per-CPU address, because %fs is unique for every CPU and is constant,
so no other CPU may write to this address, so the lock prefix isn't needed.

Of course a true SMP i386 isn't a well tested arch, so I won't assert
that counter(9) doesn't have bugs on this arch. However, I don't see
the lock prefix as necessary here.

-- 
Gleb Smirnoff

From owner-freebsd-pf@freebsd.org Wed Nov 14 07:06:07 2018
Date: Wed, 14 Nov 2018 09:05:55 +0200
From: Konstantin Belousov
To: Kristof Provost
Cc: Andreas Longwitz, Gleb Smirnoff, freebsd-pf@freebsd.org
Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Message-ID: <20181114070555.GK2378@kib.kiev.ua>
References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de> <5BEB3B9A.9080402@incore.de> <9004F62C-D1DC-4CFA-93A1-67E981274831@FreeBSD.org>
In-Reply-To: <9004F62C-D1DC-4CFA-93A1-67E981274831@FreeBSD.org>

On Tue, Nov 13, 2018 at 11:17:47PM +0100, Kristof Provost wrote:
> On 13 Nov 2018, at 22:01, Andreas Longwitz wrote:
> >>
> >> Are there any hints why the counter pf_default_rule->states_cur
> >> could get a negative value ?
> >>
> >> I’m afraid I have no idea right now.
> >>
> >
> > OK, in the meantime I did some more research and I am now quite sure the
> > problem with the bogus pf_default_rule->states_cur counter is not a
> > problem in pf. I am convinced it is a problem in counter(9) on i386
> > server. The critical code is the machine instruction cmpxchg8b used in
> > /sys/i386/include/counter.h.
> >
> I’m always happy to hear problems aren’t my fault :)
>
> > From the Intel instruction set reference manual:
> > This instruction can be used with a LOCK prefix to allow the instruction to
> > be executed atomically.
> >
> > We have two other sources in kernel using cmpxchg8b:
> >   /sys/i386/include/atomic.h and
> >   /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
> >
> > Both make use of the LOCK feature, in atomic.h a detailed explanation is
> > given. Because counter.h lacks the LOCK prefix I propose the following
> > patch to get around the leak:
> >
> > --- counter.h.orig 2015-07-03 16:45:36.000000000 +0200
> > +++ counter.h 2018-11-13 16:07:20.329053000 +0100
> > @@ -60,6 +60,7 @@
> >   "movl %%edx,%%ecx\n\t"
> >   "addl (%%edi),%%ebx\n\t"
> >   "adcl 4(%%edi),%%ecx\n\t"
> > + "lock \n\t"
> >   "cmpxchg8b %%fs:(%%esi)\n\t"
> >   "jnz 1b"
> >   :
> > @@ -76,6 +77,7 @@
> >   __asm __volatile(
> >   "movl %%eax,%%ebx\n\t"
> >   "movl %%edx,%%ecx\n\t"
> > + "lock \n\t"
> >   "cmpxchg8b (%2)"
> >   : "=a" (res_lo), "=d"(res_high)
> >   : "SD" (p)
> > @@ -121,6 +123,7 @@
> >   "xorl %%ebx,%%ebx\n\t"
> >   "xorl %%ecx,%%ecx\n\t"
> >   "1:\n\t"
> > + "lock \n\t"
> >   "cmpxchg8b (%0)\n\t"
> >   "jnz 1b"
> >   :
> >
> That looks very plausible. I’m somewhat out of my depth here, so I’d
> like the authors of the counter code to take a look at it.

No, it does not look correct. The only atomicity guarantee that is
required from the counter.h inc and zero methods is atomicity WRT
context switches. The instructions are always executed on the CPU
which owns the PCPU element in the counter array, and since the update
is executed as a single instruction, it does not require the more
expensive cache line lock, AKA the LOCK prefix. This is the main feature
of the counters on x86.

It might read a bogus value when fetching the counter, but the only
guarantee of the counter.h KPI is that the readouts are mostly correct.
If you always read a systematically wrong value, there is probably
something different going on.
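
The per-CPU argument made in the replies above can be sketched in plain C. This is a minimal user-space illustration under stated assumptions, not the kernel's counter(9) code: the names pcpu_counter, pcpu_counter_add, pcpu_counter_zero and pcpu_counter_fetch and the NCPU value are hypothetical, and the owning CPU is passed in explicitly, whereas the kernel reaches its own slot through %fs.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count, for illustration only */

/* One 64-bit slot per CPU; a stand-in for a per-CPU counter array. */
struct pcpu_counter {
	uint64_t slot[NCPU];
};

/*
 * Update: only the CPU that owns slot[cpu] ever writes it, so no
 * cross-CPU locking is needed. In the kernel the update additionally
 * has to be a single instruction, so that a context switch cannot
 * split it; that part cannot be shown in portable C.
 */
static void
pcpu_counter_add(struct pcpu_counter *c, unsigned int cpu, int64_t inc)
{
	assert(cpu < NCPU);
	c->slot[cpu] += (uint64_t)inc;	/* wraps modulo 2^64 */
}

/* Zero: clear every slot. */
static void
pcpu_counter_zero(struct pcpu_counter *c)
{
	for (unsigned int i = 0; i < NCPU; i++)
		c->slot[i] = 0;
}

/*
 * Fetch: sum all slots. A single slot may hold a wrapped ("negative")
 * value when increments and decrements ran on different CPUs; only
 * the sum is meaningful.
 */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < NCPU; i++)
		sum += c->slot[i];
	return (sum);
}
```

In this sketch, an increment on CPU 0 followed by a decrement on CPU 1 leaves slot 1 wrapped around while the fetched sum is still zero. A reader running concurrently with writers may see a slightly stale sum, which matches the "mostly correct" readout described above.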