Date:      Sat, 27 Oct 2018 10:44:31 -0700
From:      "Kristof Provost" <kristof@sigsegv.be>
To:        "Andreas Longwitz" <longwitz@incore.de>
Cc:        freebsd-pf@freebsd.org
Subject:   Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Message-ID:  <D5EEA773-1F0F-4FA0-A39A-486EE323907D@sigsegv.be>
In-Reply-To: <5BD45882.1000207@incore.de>
References:  <5BC51424.5000309@incore.de> <C4D1F141-2979-4103-957F-F0314637D978@sigsegv.be> <5BD45882.1000207@incore.de>

On 27 Oct 2018, at 5:22, Andreas Longwitz wrote:
> Thanks very much for the answer, especially for the hint to OpenBSD.
>
>> I wonder if there’s an integer overflow in the pf_state_expires()
>> calculation.
>> The OpenBSD people have a cast to u_int64_t in their version:
>>
>>         timeout = (u_int64_t)timeout * (end - states) / (end - start);
>>
>> Perhaps this would fix your problem? (Untested, not even compiled)
>>
>>         if (end && states > start && start < end) {
>>                 if (states < end) {
>>                         timeout = (uint64_t)timeout * (end - states) /
>>                             (end - start);
>>                         return (state->expire + timeout);
>>                 }
>>                 else
>>                         return (time_uptime);
>>         }
>>         return (state->expire + timeout);
>
> I can confirm that the OpenBSD patch adding the uint64_t cast makes
> sense. If
>         timeout * (end - states)
> becomes bigger than UINT32_MAX (I am on i386), the cast prevents the
> overflow of this product, and the result of the adaptive calculation
> will always be correct.
>
> Example: start=6000, end=12000, timeout=86400 * 5 (5 days), states=100
>          result 140972, result with cast patch 856800.
>
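The arithmetic in this example can be checked with a small sketch. The
function names scale32() and scale64() are made up for illustration and
are not part of pf; the sketch only assumes that the product is computed
in 32-bit unsigned arithmetic without the cast, as it would be on i386.

```c
#include <stdint.h>

/* Scale the timeout in 32-bit arithmetic only; the product may wrap
 * modulo 2^32 before the division. */
static uint32_t
scale32(uint32_t timeout, uint32_t start, uint32_t end, uint32_t states)
{
	return (timeout * (end - states) / (end - start));
}

/* Same calculation, but widen to 64 bits before multiplying, as the
 * OpenBSD cast does. */
static uint32_t
scale64(uint32_t timeout, uint32_t start, uint32_t end, uint32_t states)
{
	return ((uint32_t)((uint64_t)timeout * (end - states) /
	    (end - start)));
}
```

With timeout=432000 (86400 * 5), start=6000, end=12000 and states=100,
scale32() yields 140972 and scale64() yields 856800, matching the
numbers quoted above. The product 432000 * 11900 = 5140800000 does not
fit in 32 bits, which is where the two results diverge.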
> In the problem I have reported for states of "rdr pass" rules I see
> start=6000, end=12000, timeout=86400 and (obviously erroneous,
> probably negative) states=0xffffffd0.
>
I have no idea how that can happen. Just to make sure I understand: you 
know that states is negative here because of a printf() or SDT addition 
in pf_expire_states(), right?

> Further, the counter variable states_cur of pf_default_rule is
> used for all "rdr/nat/binat pass" rules together. This was a little
> bit surprising to me, but I think this is intended behaviour. Correct?
>
Yes.

> Are there any hints as to why the counter pf_default_rule->states_cur
> could get a negative value?
>
I’m afraid I have no idea right now.

Best regards,
Kristof

Date:      Sat, 27 Oct 2018 23:48:33 +0200
From:      Andreas Longwitz <longwitz@incore.de>
To:        Kristof Provost <kristof@sigsegv.be>
Cc:        freebsd-pf@freebsd.org
Subject:   Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections
Message-ID:  <5BD4DD31.409@incore.de>
In-Reply-To: <D5EEA773-1F0F-4FA0-A39A-486EE323907D@sigsegv.be>
References:  <5BC51424.5000309@incore.de> <C4D1F141-2979-4103-957F-F0314637D978@sigsegv.be> <5BD45882.1000207@incore.de> <D5EEA773-1F0F-4FA0-A39A-486EE323907D@sigsegv.be>

>> In the problem I have reported for states of "rdr pass" rules I see
>> start=6000, end=12000, timeout=86400 and (obviously erroneous, probably
>> negative) states=0xffffffd0.
>
> I have no idea how that can happen. Just to make sure I understand: you
> know that states is negative here because of a printf() or SDT addition
> in pf_expire_states(), right?

I did not change the kernel; I use DTrace on my firewall server
fwextern. In pf.conf I have changed all production "rdr pass" rules to an
rdr rule plus an extra filter rule. Now only one "rdr pass" rule is left
for testing:

  rdr pass on $if_internet proto tcp from 31.17.172.227 to $ip_internet
      port 8022 -> 10.0.0.254

Now I start the following DTrace script pfcounter.d, which will be
active when a SYN on port 8022 arrives:

#!/usr/sbin/dtrace -s

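fbt::pf_normalize_tcp:entry
/((*(args[2]->m_hdr.mh_data + 33)) & 0x02) == 0x02 &&
 htons(*(short *)(args[2]->m_hdr.mh_data + 22)) == 8022/
       /* SYN + port 8022 */
       /* The offsets presumably assume a 20-byte IPv4 header without */
       /* options: byte 33 = TCP flags, bytes 22-23 = destination port. */
{ self->flag1 = 1; }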

fbt::pf_test:return
/self->flag1/ { self->flag1 = 0; }

fbt::pfioctl:entry
/args[1] == 3221767193 && ((struct pfioc_states *)args[2])->ps_len != 0/
       /* DIOCGETSTATES  &&  len != 0 */
{ self->flag2 = 1; }

fbt::counter_u64_fetch:entry
/self->flag2/ { }
fbt::counter_u64_fetch:return
/self->flag2/
{ printf("        returncode (states_cur)=%d / 0x%x", args[1], args[1]); }

fbt::pfioctl:return
/self->flag2/
{ self->flag2 = 0; }
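
The magic number 3221767193 in the pfioctl predicate can be unpacked with
a short sketch. It assumes the ioctl command layout used by FreeBSD's
<sys/ioccom.h> (direction in the top bits, 13-bit parameter length,
group byte, command number); the struct and function names below are
hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of an ioctl command. */
struct ioctl_fields {
	int		copy_in;	/* IOC_IN bit (0x80000000) set */
	int		copy_out;	/* IOC_OUT bit (0x40000000) set */
	unsigned	len;		/* parameter length in bytes */
	char		group;		/* group character, e.g. 'D' for pf */
	unsigned	num;		/* command number within the group */
};

/* Split a 32-bit ioctl command into its fields, assuming the
 * <sys/ioccom.h> layout described above. */
static struct ioctl_fields
decode_ioctl(uint32_t cmd)
{
	struct ioctl_fields f;

	f.copy_in = (cmd & 0x80000000u) != 0;
	f.copy_out = (cmd & 0x40000000u) != 0;
	f.len = (cmd >> 16) & 0x1fffu;
	f.group = (char)((cmd >> 8) & 0xffu);
	f.num = cmd & 0xffu;
	return (f);
}
```

For 3221767193 (0xc0084419) this gives an in/out command in group 'D',
number 25, with an 8-byte argument, which is consistent with the
DIOCGETSTATES comment in the script and with struct pfioc_states being
8 bytes on i386.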


Now I run the following command on my remote test client (IP 31.17.172.227):

   ssh -p 8022 fwextern sleep 20

This creates a state for the "rdr pass" rule on fwextern with expire
time zero. I have to run "pfctl -vss" on fwextern quickly to see this
state, and the output of the DTrace script shows me the "negative" value
of the counter:

=== root@fwextern (pts/0) -> ./pfcounter.d
dtrace: script './pfcounter.d' matched 6 probes
CPU     ID                    FUNCTION:NAME
  3  17624          counter_u64_fetch:entry
  3  17625         counter_u64_fetch:return         returncode (states_cur)=4294967248 / 0xffffffd0

If I run the ssh command on the test client a second time, the counter is
one less negative than before:

=== root@fwextern (pts/0) -> ./pfcounter.d
dtrace: script './pfcounter.d' matched 6 probes
CPU     ID                    FUNCTION:NAME
  3  17624          counter_u64_fetch:entry
  3  17625         counter_u64_fetch:return         returncode (states_cur)=4294967249 / 0xffffffd1
  3  17624          counter_u64_fetch:entry
  3  17625         counter_u64_fetch:return         returncode (states_cur)=4294967249 / 0xffffffd1
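
To illustrate how such a counter value interacts with the adaptive
timeout code quoted earlier in this thread, here is a simplified model.
It is a sketch only: adaptive_expiry() and as_signed32() are made-up
names, and the parameters expire and now stand in for state->expire and
time_uptime; this is not the kernel function itself.

```c
#include <stdint.h>

/* Interpret a 32-bit pattern as a two's-complement signed value. */
static int64_t
as_signed32(uint32_t v)
{
	return ((int64_t)v - (v > INT32_MAX ? ((int64_t)1 << 32) : 0));
}

/* Simplified model of the adaptive-timeout branch from the thread,
 * including the 64-bit cast. */
static uint32_t
adaptive_expiry(uint32_t expire, uint32_t timeout, uint32_t start,
    uint32_t end, uint32_t states, uint32_t now)
{
	if (end && states > start && start < end) {
		if (states < end) {
			timeout = (uint32_t)((uint64_t)timeout *
			    (end - states) / (end - start));
			return (expire + timeout);
		}
		/* states is at or above end: the state expires right away */
		return (now);
	}
	return (expire + timeout);
}
```

In this model 0xffffffd0 reads as -48 and 0xffffffd1 as -47. Treated as
an unsigned count, 0xffffffd0 is far above end=12000, so the function
returns now instead of expire + timeout. That would be consistent with
the states showing an expire time of zero, although the sketch does not
explain how the counter became negative in the first place.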

Because of "sleep 20" the ssh command does not return and must be
killed. I have observed the problem on two of my firewall servers; the
pf rules have never been reloaded since boot. I think some unknown
event in the past must have triggered the negative counter value.

I will try to add a statement to the kernel that detects the problem
and go back to the "rdr pass" rules, so the next time the problem occurs
we have more information than now.


Kindly regards,
Andreas



