From owner-freebsd-pf@freebsd.org Tue Aug 25 15:56:25 2015 Return-Path: Delivered-To: freebsd-pf@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3A39C99AF80 for ; Tue, 25 Aug 2015 15:56:25 +0000 (UTC) (envelope-from Andrej.Kolontai@verwaltung.uni-muenchen.de) Received: from mailto2.verwaltung.uni-muenchen.de (mailto2.verwaltung.uni-muenchen.de [141.84.149.7]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "IronPort Appliance Demo Certificate", Issuer "IronPort Appliance Demo Certificate" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 614886E for ; Tue, 25 Aug 2015 15:56:23 +0000 (UTC) (envelope-from Andrej.Kolontai@verwaltung.uni-muenchen.de) X-IronPort-AV: E=McAfee;i="5700,7163,7903"; a="2993810" X-IronPort-AV: E=Sophos;i="5.15,746,1432591200"; d="scan'208";a="2993810" Received: from cashts1.zuv.uni-muenchen.de ([10.153.81.103]) by smtpout2.verwaltung.uni-muenchen.de with ESMTP/TLS/AES256-SHA; 25 Aug 2015 17:55:11 +0200 Received: from MXS2.zuv.uni-muenchen.de ([fe80::e8db:cdb2:9a:a69f]) by CASHTS1.zuv.uni-muenchen.de ([::1]) with mapi id 14.03.0248.002; Tue, 25 Aug 2015 17:55:08 +0200 From: Kolontai Andrej To: "'freebsd-pf@freebsd.org'" Subject: Machine freezes when loading pf ruleset Thread-Topic: Machine freezes when loading pf ruleset Thread-Index: AdDfSL0WCcADlcE1Qv6+xfswQgISow== Date: Tue, 25 Aug 2015 15:55:08 +0000 Message-ID: Accept-Language: de-DE, en-US Content-Language: de-DE X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.23.107.156] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-BeenThere: freebsd-pf@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Technical discussion and general questions about packet filter \(pf\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Aug 2015 15:56:25 -0000 Hello, I'm new to this list and I hope it's the right place to ask.=20 We have highly utilized installation of two FreeBSD-machines running 10.1-R= ELEASE, pf and carp. There are about 50 networks (some via vlan, some ipsec= ) connected to them, usually about 50000 pf states, about 1500 rules and tr= affic sometimes hitting 1 gbit/s or more.=20 When we load the ruleset on the active machine it sometimes freezes. It occ= urs more often as our installation grows. Sometimes it freezes completely, = which is actually good as the backup machine takes over. But at other times= it's still some kind of alive but unbearably slow (ping is about 40 s inst= ead of the usual 0.2 ms), the backup machine does not take over and people = start to get angry.=20 The crashed machine usually recovers after a couple of minutes.=20 We've been trying to resolve this for some months now and are running out o= f ideas. We do believe that something is wrong with or setup but have no cl= ue what it could be.=20 When it happens, the syslogs show tons of messages like these: Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.101.87= to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.101.87= to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.107.20= 6 to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.101.87= to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.107.20= 6 to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.101.87= to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.106.17= 1 to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.106.17= 1 to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.106.72= to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.106.58= to 141.84.149.19 Aug 24 09:08:39 applej last message repeated 2 times Aug 24 09:08:39 applej kernel: pf: BAD state: TCP in wire: 10.200.108.141:4= 9202 172.23.2.92:8192 stack: - [lo=3D3482585550 high=3D3482585552 win=3D819= 2 modulator=3D0pf_map_addr: src tracking maps ]172.23.106.72 [lo=3D0 hi gh=3D1 win=3D1 modulator=3D0 to ]141.84.149.18 2:0 Aug 24 09:08:39 applej kernel: S seq=3D3503222902 (3503222902) ack=3D0 len= =3D0 ackskew=3D0 pkts=3D1:1 dir=3Din,fwd Aug 24 09:08:39 applej kernel: pf: State failure on: 1 | 5 =20 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.106.17= 1 to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.107.20= to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps 172.23.101.87= to 141.84.149.18 Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps pf_map_addr: = src tracking maps pf_map_addr: src tracking maps 172.23.106.171172.23.106.1= 71 to 172.23.106.171141.84.149.18 to to=20 Aug 24 09:08:39 applej kernel: 141.84.149.18141.84.149.18 Aug 24 09:08:39 applej kernel:=20 Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps pf_map_addr: = src tracking maps 172.23.106.171172.23.101.87 to=20 Aug 24 09:08:40 applej kernel: to 141.84.149.18141.84.149.18 Aug 24 09:08:40 applej kernel:=20 Aug 24 09:08:40 applej kernel: pf: OK ICMP 3:10 141.84.149.249 -> 10.153.10= 1.239 state: TCP in wire: 141.84.149.249:37677 10.153.101.239 Aug 24 09:08:40 applej kernel: :80 stack: - [lo=3D4171141930 high=3D4171159= 337 win=3D106 modulator=3D0 wscale=3D7] [lo=3D2653133771 high=3D2653147339 = win=3D136 modulator=3D0 wscale=3D7] 10:10 seq=3D2653133771 Aug 24 09:08:40 applej kernel: pf: OK ICMP 3:10 141.84.149.249 -> 10.153.10= 1.239 state: TCP out wire: 10.153.101.239:80 141.84.149.249 Aug 24 09:08:40 applej kernel: :37677 stack: - [lo=3D4171141930 high=3D4171= 159337 win=3D106 modulator=3D0 wscale=3D7] [lo=3D2653133771 high=3D26531473= 39 win=3D136 modulator=3D0 wscale=3D7] 10:10 seq=3D2653133771 Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps pf_map_addr: = src tracking maps 172.23.106.184pf: wire key attach failed on vlan42: 172.2= 3.106.165pf_map_addr: src tracking maps=20 Aug 24 09:08:40 applej kernel: 112 to in141.84.149.18 wire: 172.23.106.58 Aug 24 09:08:40 applej kernel: 141.84.149.27 Aug 24 09:08:40 applej kernel: to 224.0.0.18 to 1:0141.84.149.18, existing= :=20 Aug 24 09:08:40 applej kernel: 112141.84.149.19 in Aug 24 09:08:40 applej kernel: wire:=20 Aug 24 09:08:40 applej kernel: 141.84.149.27=20 Aug 24 09:08:40 applej kernel: 224.0.0.18 stack: 141.84.149.27 224.0.0.18 1= :0 Aug 24 09:08:40 applej kernel: pf: wire key attach failed on vlan42: 112 in= wire: 141.84.149.1 224.0.0.18 1:0, existing:=20 Aug 24 09:08:40 applej kernel: 112 in wire: 141.84.149.1 224.0.0.18 stack: = 141.84.149.1 224.0.0.18 1:0 Aug 24 09:08:40 applej kernel:=20 Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps pf_map_addr: = src tracking maps pf: wire key attach failed on vlan250: 172.23.100.108pf: = OK ICMP 3:10 112 Aug 24 09:08:40 applej kernel: 172.23.106.184 in to wire: to 10.153.101.7= 3141.84.149.18 224.0.0.18141.84.149.18 1:0141.84.149.198, existing:=20 Aug 24 09:08:40 applej kernel: 112 in Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps=20 Aug 24 09:08:40 applej kernel:=20 Aug 24 09:08:40 applej kernel: wire: -> 10.153.101.73172.23.106.58 to 224= .0.0.18141.84.149.19 stack:=20 Aug 24 09:08:40 applej kernel: 10.153.101.73 Aug 24 09:08:40 applej kernel: 77.7.121.109 state: 224.0.0.18TCP 1:0 Aug 24 09:08:40 applej kernel: out wire: 141.84.149.198:80 77.7.121.109:525= 21 stack:=20 Aug 24 09:08:40 applej kernel: - [lo=3D2750816253 high=3D2750829820 win=3D2= 56 modulator=3D0 wscale=3D8] [lo=3D2902946358 high=3D2903011894 win=3D106 m= odulator=3D0 Aug 24 09:08:40 applej kernel: wscale=3D7] 7:9 seq=3D2750816252 Aug 24 09:08:40 applej kernel: pf: OK ICMP 3:10 141.84.149.198 -> 77.7.121.= 109 state: TCP in wire: 77.7.121.109:52521 Aug 24 09:08:40 applej kernel: 141.84.149.198:80 stack: - [lo=3D2750816253 = high=3D2750829820 win=3D256 modulator=3D0 Aug 24 09:08:40 applej kernel: wscale=3D8] [lo=3D2902946358 high=3D29030118= 94 win=3D106 modulator=3D0 wscale=3D7] 7:9 seq=3D2750816252=20 .... Aug 24 09:09:11 applej kernel: 4:7:80:1433 seq=3D1025128833 Aug 24 09:09:11 applej kernel: 141.84.149.19:47806 stack: 10.110.32.129-:5= 0143 [lo=3D3790813207 high=3D3790831127 win=3D14600 modulator=3D0 stack: w= scale=3D8-] [lo=3D3961092366 high=3D3962143087 win=3D4106 modulator=3D0 [lo= =3D155 1251521 high=3D1554989121 win=3D17898 modulator=3D0 wscale=3D8 wscale=3D8]]= [lo=3D1005669588 high=3D1006720647 win=3D4106 modulator=3D0 4:2 wscale=3D8= seq=3D1551251520 Aug 24 09:09:11 applej kernel: ] 4:4 seq=3D1005669587 Aug 24 09:09:11 applej kernel: pf: OK ICMP 3:1 172.23.254.250pf: OK ICMP 3:= 1 -> 10.110.32.220172.23.106.87 -> state: 10.110.32.193TCP state: inTCP = wire: out172.23.106.87 wire: :5017310.110.32.193 :1433141.84 .149.21 :844310.110.32.130 stack: :54454172.23.106.87 stack: :50173- [lo= =3D2704530546 high=3D2705581267 win=3D4106 modulator=3D0141.84.149.18 wscal= e=3D8:8443] [lo=3D521335749 high=3D521353924 win=3D256 modulator=3D0 [lo=3D= 19884644 01 high=3D1989515460 win=3D4106 modulator=3D0 wscale=3D8 wscale=3D8]] [lo= =3D3203135938 high=3D3203200962 win=3D71 modulator=3D0 4:4 wscale=3D8 seq= =3D1988464400 Aug 24 09:09:11 applej kernel: ] 7:4 seq=3D521335748 Aug 24 09:09:11 applej kernel: pf: OK ICMP 3:1 pf: OK ICMP 3:1 pf: OK ICMP = 3:1 pf: OK ICMP 3:1 pf: OK ICMP 3:1 pf: OK ICMP 3:1 172.23.254.250172.23.25= 4.25010.110.32.220 -> -> 172.23.106.8710.110.32.193 state: =20 state: TCPTCP in out wire: wire: 172.23.106.8710.110.32.193:50168:1433 14= 1.84.149.2110.110.32.129:8443:50175 stack: stack: 172.23.106.87-:50168 [lo= =3D1504649300 high=3D1505700021 win=3D4106 modulator=3D0 wscale=3D8141 .84.149.18]:8443 [lo=3D723475428 high=3D724526487 win=3D4106 modulator=3D0 = [lo=3D975354116 high=3D975374851 win=3D256 modulator=3D0 wscale=3D8 wscale= =3D8]] 4:4 [lo=3D4251253613 high=3D4251317869 win=3D81 modulator=3D0 seq=3D= 723475427 Aug 24 09:09:11 applej kernel: wscale=3D8] 7:4 seq=3D975354115 Aug 24 09:09:11 applej kernel: 141.84.44.211 -> 172.23.254.250217.86.168.23= 6 -> state: 172.23.3.29TCP state: inTCP wire: 141.84.44.211217.86.168.236= -> :5098310.180.10.209 state: 141.84.149.198TCP:80 in stack : wire: -10.180.10.209 [lo=3D3709359805 high=3D3709359806 win=3D8192 modul= ator=3D0 in:49223] [lo=3D0 high=3D8192 win=3D1 modulator=3D0141.84.149.244= ]:443 2:0 stack: seq=3D3709359804 Aug 24 09:09:11 applej kernel: - [lo=3D827930303 high=3D827944894 win=3D67 = modulator=3D0 wire: wscale=3D8172.23.3.29]:35196 [lo=3D1796383793 high=3D1= 796400177 win=3D114 modulator=3D0 wscale=3D710.153.101.108]:6556 10:10 sta= ck: s eq=3D827930303 Aug 24 09:09:11 applej kernel: - [lo=3D1991477221 high=3D1991477222 win=3D1= 4600 modulator=3D0 -> ]172.23.101.172 [lo=3D0 high=3D14600 win=3D1 modulato= r=3D0 state: ]TCP 2:0 in seq=3D1991477220 Aug 24 09:09:11 applej kernel: wire: 172.23.101.172:49613 141.84.149.21:808= 0 stack: 172.23.101.172:49613 141.84.149.19:8080 [lo=3D3559829126 high=3D35= 59829127 win=3D8192 modulator=3D0] [lo=3D0 high=3D8192 win=3D1 modulator=3D= 0] 2:0 seq=3D3559829125 Aug 24 09:09:11 applej kernel: pf: OK ICMP 3:1 pf: OK ICMP 3:1 10.110.32.22= 0141.84.44.211 -> pf_map_addr: src tracking maps -> 172.23.101.8710.110.32= .193 to 115.248.50.20141.84.149.19 state:=20 Aug 24 09:09:11 applej kernel: state: TCPTCP out in wire: wire: 10.110.32.= 193115.248.50.20:1433:50426 10.110.32.130141.84.149.229:54455:80 stack: s= tack: -- [lo=3D201188016 high=3D202238737 win=3D513 modulator=3D0 [lo =3D2382456191 high=3D2382456193 win=3D2048 modulator=3D0 wscale=3D8]] [lo= =3D0 high=3D1 win=3D1 modulator=3D0 [lo=3D3935828538 high=3D3935959789 win= =3D4106 modulator=3D0] wscale=3D8 2:0] seq=3D2382456191 Aug 24 09:09:11 applej kernel: 4:4 seq=3D3935828537 Sorry if this is too much but I really can' t tell which of these messages= could help. Perhaps there isn't any useful information at all. To me it se= ems it's just a side effect of the actual problem which I am unable to see. Maybe somebody has an idea about what's going wrong and what to investigate= . Or theories right now are: * issues with pfsync (messages complaining about states, but unlikely, it'= s normally working just fine) * some lack of ressources (yet, which?) * race condition inside the kernel (there are 24 CPU cores) We have not yet been able to reproduce it in a lab environment.=20 Right now, we work around this by putting the master into the backup state = (using demotion) before loading the ruleset and changing back to master sta= te after the reload. This is working fine. But it kills, of course, the IPS= ec-SAs. But that's something we can live with right now.=20 Viele Gr=FC=DFe=20 Andrej Kolontai Ludwig-Maximilians-Universitaet Muenchen Ref. VI.4 (IT-Sicherheit & Verzeichnisdienste)=20 Martiusstrasse 4 / 207 80802 Muenchen phone +49 (0)89 2180-3815 email mailto:andrej.kolontai@verwaltung.uni-muenchen.de web http://www.uni-muenchen.de/zuv/it/