From owner-freebsd-pf@freebsd.org Sun Nov 18 13:32:55 2018
Message-ID: <5BF169FC.2080508@incore.de>
Date: Sun, 18 Nov 2018 14:32:44 +0100
From: Andreas Longwitz <longwitz@incore.de>
To: Konstantin Belousov
CC: Kristof Provost, Gleb Smirnoff, freebsd-pf@freebsd.org
Subject: Re: rdr pass for proto tcp sometimes creates states with expire
 time zero and so breaking connections
References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de>
 <5BEB3B9A.9080402@incore.de>
 <9004F62C-D1DC-4CFA-93A1-67E981274831@FreeBSD.org>
 <20181114070555.GK2378@kib.kiev.ua>
In-Reply-To: <20181114070555.GK2378@kib.kiev.ua>

Thank you all for the explanation of how counter(9) works in detail.

> A single CPU instruction is atomic by definition, with regards to the CPU.
> A preemption can not happen in a middle of instruction. What the "lock"
> prefix does is memory locking to avoid unlocked parallel access to the
> same address by different CPUs.

OK, my view of "atomic" in this context was wrong.

> No, it does not look correct. The only atomicity guarantee that is required
> from the counter.h inc and zero methods are atomicity WRT context switches.
> The instructions are always executed on the CPU which owns the PCPU element
> in the counter array, and since the update is executed as single instruction,
> it does not require more expensive cache line lock AKA LOCK prefix. This
> is the main feature of the counters on x86.
>
> It might read bogus value when fetching the counter but counter.h KPI only
> guarantee is that the readouts are mostly correct. If you have systematically
> wrong value always read, there is probably something different going on.
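To make sure I read that correctly, here is the scheme written out in C.
This is only my simplification of sys/counter.h, not the literal kernel
code; PCPU_STRIDE and the sketch_* names are mine, and the real per-CPU
address arithmetic differs per architecture (on this i386 box the four
slots are 0x400 bytes apart, as the DTrace output below shows):

/* Hypothetical stand-in for the real per-CPU address arithmetic. */
static inline uint64_t *
counter_slot(counter_u64_t c, int cpu)
{
	return ((uint64_t *)((char *)c + cpu * PCPU_STRIDE));
}

/*
 * Update: one non-LOCKed instruction on the current CPU's slot (a
 * single addq on amd64; i386 uses a non-LOCKed cmpxchg8b instead).
 * No other CPU ever writes this slot, and preemption cannot hit the
 * middle of one instruction, so no LOCK prefix is needed.
 */
static inline void
sketch_counter_add(counter_u64_t c, int64_t inc)
{
	*counter_slot(c, curcpu) += inc;
}

/*
 * Fetch: sum all slots without stopping the writers.  A readout can
 * be transiently off ("mostly correct"), but a total that stays
 * negative across many fetches means updates were lost or
 * misdirected, not that the fetch raced.
 */
static uint64_t
sketch_counter_fetch(counter_u64_t c)
{
	uint64_t sum = 0;
	int i;

	CPU_FOREACH(i)			/* from <sys/smp.h> */
		sum += *counter_slot(c, i);
	return (sum);
}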
On one of my two failing servers I have eliminated all "rdr pass" rules,
so counter(9) is not used at the moment for pf_default_rule.states_cur.
Using DTrace I can see the negative value -49 for this counter:

CPU     ID                    FUNCTION:NAME
  3      1                           :BEGIN  feature=bfebfbff, ncpus=4
    pf_default_rule.states_cur=0xc82cb3c8
      0xc82cb3c8: counter0=0x00000000007bd25b
      0xc82cb7c8: counter1=0xffffffffffd32262
      0xc82cbbc8: counter2=0xffffffffffd87de1
      0xc82cbfc8: counter3=0xffffffffffd88d31
                  counter =0xffffffffffffffcf

On my other affected server I have introduced a panic call that fires as
soon as the counter value returned by counter_u64_fetch() in
pf_state_expires() becomes negative. So I will wait for the panic and
hope for more information from the kernel dump.

There is one unusual configuration on the two servers: they run pf and
ipfw/ipdivert at the same time. The reason is that I use natd for
incoming ftp requests to my ftp server, while pf handles all other
traffic. This configuration is a little tricky but has worked correctly
for many years. The one recent exception was a problem with a buggy
remote ftp client that I had to debug; during that period I had to
restart/reload ipfw and natd a couple of times. Because pf also holds a
reference to ipdivert, perhaps there is a hidden interaction with the
expire problem in pf.

Annotation: the buggy ftp client revealed a problem in natd (PR 230755).

Regards,
Andreas
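PS: for completeness, the panic check mentioned above is essentially the
following (my simplification; the names follow sys/netpfil/pf/pf.c, but
this is a sketch of the idea, not the exact patch):

/* In pf_state_expires(), right after the states_cur fetch. */
states = counter_u64_fetch(state->rule.ptr->states_cur);
if ((int64_t)states < 0)
	panic("pf_state_expires: states_cur went negative: %jd",
	    (intmax_t)states);

The cast is needed because counter_u64_fetch() returns uint64_t, so a
counter that has wrapped below zero comes back as a huge unsigned value.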