From owner-freebsd-pf@freebsd.org Thu Jan 24 22:09:41 2019 Return-Path: Delivered-To: freebsd-pf@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4356114C0745 for ; Thu, 24 Jan 2019 22:09:41 +0000 (UTC) (envelope-from longwitz@incore.de) Received: from dss.incore.de (dss.incore.de [195.145.1.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8F21389A66; Thu, 24 Jan 2019 22:09:40 +0000 (UTC) (envelope-from longwitz@incore.de) Received: from inetmail.dmz (inetmail.dmz [10.3.0.3]) by dss.incore.de (Postfix) with ESMTP id B51CD27D1B; Thu, 24 Jan 2019 23:09:38 +0100 (CET) X-Virus-Scanned: amavisd-new at incore.de Received: from dss.incore.de ([10.3.0.3]) by inetmail.dmz (inetmail.dmz [10.3.0.3]) (amavisd-new, port 10024) with LMTP id bEk9WwUNdRcB; Thu, 24 Jan 2019 23:09:37 +0100 (CET) Received: from mail.local.incore (fwintern.dmz [10.0.0.253]) by dss.incore.de (Postfix) with ESMTP id BCEC327D0B; Thu, 24 Jan 2019 23:09:37 +0100 (CET) Received: from bsdmhs.longwitz (unknown [192.168.99.6]) by mail.local.incore (Postfix) with ESMTP id 8A6A0196; Thu, 24 Jan 2019 23:09:37 +0100 (CET) Message-ID: <5C4A37A1.80206@incore.de> Date: Thu, 24 Jan 2019 23:09:37 +0100 From: Andreas Longwitz User-Agent: Thunderbird 2.0.0.19 (X11/20090113) MIME-Version: 1.0 To: freebsd-pf@freebsd.org CC: Konstantin Belousov , Gleb Smirnoff , Kristof Provost Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de> <5BEB3B9A.9080402@incore.de> <20181113222533.GJ9744@FreeBSD.org> <5C49ECAA.7060505@incore.de> <20190124203802.GU24863@kib.kiev.ua> In-Reply-To: <20190124203802.GU24863@kib.kiev.ua> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 8F21389A66 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-7.00 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-1.00)[-0.996,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[] X-BeenThere: freebsd-pf@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Technical discussion and general questions about packet filter \(pf\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Jan 2019 22:09:41 -0000 >> I think the problem is the cmpxchg8b instruction used in >> counter_u64_fetch(), because this machine instruction always writes to >> memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX): >> >> TEMP64 <- DEST >> IF (EDX:EAX = TEMP64) >> THEN >> ZF <- 1 >> DEST <- ECX:EBX >> ELSE >> ZF <- 0 >> EDX:EAX <- TEMP64 >> DEST <- TEMP64 >> FI >> >> If one CPU increments the counter in pf_create_state() and another does >> the fetch, then both CPU's may run the xmpxschg8b at once with the >> chance that both read the same memory value in TEMP64 and the fetching >> CPU is the second CPU that writes and so the increment is lossed. Thats >> what I see without the above patch two or three times a week. > > Please try the following patch. The idea is to make the value to compare > with unlikely to be equal to the memory content, for fetch_one(). During my research I first had the same idea, but it did not work. In the actual coding eax/edx is not well defined before cmpxchg8b is executed, but it does not help for the problem to do so. > Also it fixes a silly bug in zero_one(). > > diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h > index 7fd26d2a960..aa20831ba18 100644 > --- a/sys/i386/include/counter.h > +++ b/sys/i386/include/counter.h > @@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p) > uint32_t res_lo, res_high; > > __asm __volatile( > + "movl (%0),%%eax\n\t" > + "movl 4(%0),%%edx\n\t" > + "addl $0x10000000,%%edx\n\t" /* avoid write */ > "movl %%eax,%%ebx\n\t" > "movl %%edx,%%ecx\n\t" > "cmpxchg8b (%2)" We can not avoid the write done by cmpxchg8b as can be seen from the microcode given above, we always end up with "DEST <- TEMP". From the Intel instruction reference manual: The destination operand is written back if the comparision fails. (The processor never produces a locked read without also producing a locked write). Maybe it is enough to prefix the cmpxchg8b with LOCK only in function counter_u64_read_one_8b(). > @@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p) > { > __asm __volatile( > +"\n1:\n\t" > "movl (%0),%%eax\n\t" > - "movl 4(%0),%%edx\n" > + "movl 4(%0),%%edx\n\t" > "xorl %%ebx,%%ebx\n\t" > "xorl %%ecx,%%ecx\n\t" > -"1:\n\t" > "cmpxchg8b (%0)\n\t" > "jnz 1b" > : If jnz jumps back the instruction cmpxchg8b has load registers eax/edx with (%0), therefor I do not understand the silly bug. Andreas