Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 24 Jan 2019 23:09:37 +0100
From:      Andreas Longwitz <longwitz@incore.de>
To:        freebsd-pf@freebsd.org
Cc:        Konstantin Belousov <kib@freebsd.org>,  Gleb Smirnoff <glebius@freebsd.org>, Kristof Provost <kristof@sigsegv.be>
Subject:   Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Message-ID:  <5C4A37A1.80206@incore.de>
In-Reply-To: <20190124203802.GU24863@kib.kiev.ua>
References:  <5BC51424.5000309@incore.de> <C4D1F141-2979-4103-957F-F0314637D978@sigsegv.be> <5BD45882.1000207@incore.de> <D5EEA773-1F0F-4FA0-A39A-486EE323907D@sigsegv.be> <5BEB3B9A.9080402@incore.de> <20181113222533.GJ9744@FreeBSD.org> <5C49ECAA.7060505@incore.de> <20190124203802.GU24863@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
>> I think the problem is the cmpxchg8b instruction used in
>> counter_u64_fetch(), because this machine instruction always writes to
>> memory, also when we only want to read and have (EDX:EAX) = (ECX:EBX):
>>
>>     TEMP64 <- DEST
>>     IF (EDX:EAX = TEMP64)
>>        THEN
>>           ZF <- 1
>>           DEST <- ECX:EBX
>>        ELSE
>>           ZF <- 0
>>           EDX:EAX <- TEMP64
>>           DEST <- TEMP64
>>     FI
>>
>> If one CPU increments the counter in pf_create_state() and another does
>> the fetch, then both CPU's may run the xmpxschg8b at once with the
>> chance that both read the same memory value in TEMP64 and the fetching
>> CPU is the second CPU that writes and so the increment is lossed. Thats
>> what I see without the above patch two or three times a week.
> 
> Please try the following patch.  The idea is to make the value to compare
> with unlikely to be equal to the memory content, for fetch_one().

During my research I first had the same idea, but it did not work. In
the actual coding eax/edx is not well defined before cmpxchg8b is
executed, but it does not help for the problem to do so.

> Also it fixes a silly bug in zero_one().
> 
> diff --git a/sys/i386/include/counter.h b/sys/i386/include/counter.h
> index 7fd26d2a960..aa20831ba18 100644
> --- a/sys/i386/include/counter.h
> +++ b/sys/i386/include/counter.h
> @@ -78,6 +78,9 @@ counter_u64_read_one_8b(uint64_t *p)
>  	uint32_t res_lo, res_high;
>  
>  	__asm __volatile(
> +	"movl	(%0),%%eax\n\t"
> +	"movl	4(%0),%%edx\n\t"
> +	"addl	$0x10000000,%%edx\n\t"	/* avoid write */
>  	"movl	%%eax,%%ebx\n\t"
>  	"movl	%%edx,%%ecx\n\t"
>  	"cmpxchg8b	(%2)"

We can not avoid the write done by cmpxchg8b as can be seen from the
microcode given above, we always end up with "DEST <- TEMP". From the
Intel instruction reference manual:

The destination operand is written back if the comparision fails. (The
processor never produces a locked read without also producing a locked
write).

Maybe it is enough to prefix the cmpxchg8b with LOCK only in function
counter_u64_read_one_8b().


> @@ -120,11 +123,11 @@ counter_u64_zero_one_8b(uint64_t *p)
>  {
>  	__asm __volatile(
> +"\n1:\n\t"
>  	"movl	(%0),%%eax\n\t"
> -	"movl	4(%0),%%edx\n"
> +	"movl	4(%0),%%edx\n\t"
>  	"xorl	%%ebx,%%ebx\n\t"
>  	"xorl	%%ecx,%%ecx\n\t"
> -"1:\n\t"
>  	"cmpxchg8b	(%0)\n\t"
>  	"jnz	1b"
>  	:

If jnz jumps back the instruction cmpxchg8b has load registers eax/edx
with (%0), therefor I do not understand the silly bug.


Andreas



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5C4A37A1.80206>