Date:      Mon, 29 Aug 2011 10:01:42 +0200
From:      Johannes Dieterich <dieterich.joh@googlemail.com>
To:        Matthew Economou <mxeconomou@gmail.com>
Cc:        freebsd-current@freebsd.org
Subject:   Re: "panic: mutex pf task mtx owned at /usr/src/sys/contrib/pf/net/if_pfsync.c:3163"
Message-ID:  <CABquGzXWaw5A+k7w4G6oKovJm-fqxoSpELdQ-O7r_SHZnN7Wkw@mail.gmail.com>
In-Reply-To: <CAC1zzctC=m1wpgiO-hEaftk4q-AR3uyzgyf2hG1X5gdZvE8LuQ@mail.gmail.com>
References:  <CAC1zzctC=m1wpgiO-hEaftk4q-AR3uyzgyf2hG1X5gdZvE8LuQ@mail.gmail.com>

Hi Matthew,

On Fri, Aug 26, 2011 at 11:51 PM, Matthew Economou <mxeconomou@gmail.com> wrote:

> I recently upgraded a firewall I'm using for performance testing from
> a March-ish 9-CURRENT to 9.0-BETA1 (csup run August 21 around 12:00 AM
> EDT).  It's basically a GENERIC kernel with debugging disabled and
> things like IPsec and ALTQ enabled.  Since the upgrade, roughly an
> hour after it boots, the firewall stops passing any
> traffic (IPv4 and IPv6).  OpenVPN, for example, logs the following
> errors:
>
>  write UDPv4: Operation not permitted (code=1)
>
> Quagga, for another example, logs something similar:
>
>  ripd[1696]: can't send packet : Operation not permitted0
>  ospfd[1702]: *** sendmsg in ospf_write failed to 172.30.0.3, id 0, off 0, len 76, interface tap0 mtu 1500: Operation not permitted
>
> If I try to ping something from the console, I get the same error message:
>
>  # ping 4.2.2.2
>  ping: sendto: Operation not permitted
>
> It appears that PF isn't removing any entries from the state table.
> Note that the state table size is at its default of 10000 (which
> correlates to the amount of memory installed on the firewall - 256
> MB).
>
> State Table                          Total             Rate
>  current entries                    10013
>  searches                          554801           13.4/s
>  inserts                            10013            0.2/s
>  removals                               0            0.0/s
>
> I've tried both my current (unmodified and working prior to the
> upgrade) and experimental PF configurations; neither has any
> effect on the problem.  Reloading the PF configuration (/etc/rc.d/pf
> reload) or restarting PF altogether (/etc/rc.d/pf restart) also has
> no effect.  Only if I shut down PF completely (/etc/rc.d/pf stop) do I
> regain network connectivity: I can ping hosts (IPv4
> and IPv6), browse the web, and pass traffic that's just routed through
> the firewall (i.e., not requiring NAT).  Clearing the state table
> (pfctl -F state) has no effect.
>
> The kernel I was running had debugging disabled for performance
> testing purposes, so I booted a proper debug kernel.  It panicked in
> pfsync_send_plus as soon as init enabled PF (backtrace included
> below).
>
> Starting pflog.
> pflog0: promiscuous mode enabled
> Aug 25 20:54:21 pflogd[1611]: [priv]: msg PRIV_OPEN_LOG received
> Enabling pfpanic: mutex pf task mtx owned at /usr/src/sys/contrib/pf/net/if_pfsync.c:3163
> cpuid = 0
> KDB: enter: panic
> [ thread pid 1619 tid 100053 ]
> Stopped at      kdb_enter+0x3a: movl    $0,kdb_why
> db> bt
> Tracing pid 1619 tid 100053 td 0xc23da2e0
> kdb_enter(c09777c9,c09777c9,c0975d7b,c6fd79e0,0,...) at kdb_enter+0x3a
> panic(c0975d7b,c0946080,c0944e87,c5b,c6fd7a0c,...) at panic+0x134
> _mtx_assert(c0a1b388,0,c0944e87,c5b,c6fd7a24,...) at _mtx_assert+0x127
> pfsync_send_plus(c6fd7a24,18,10,ad6,1000000,...) at pfsync_send_plus+0xf2
> pfsync_clear_states(a218d664,c236fb78,c0945f1c,635,c09ae167,...) at pfsync_clear_states+0x8d
> pfioctl(c22a0800,c0cc4412,c236fb00,3,c23da2e0,...) at pfioctl+0x1b90
> devfs_ioctl_f(c23ce578,c0cc4412,c236fb00,c216ce80,c23da2e0,...) at devfs_ioctl_f+0x10b
> kern_ioctl(c23da2e0,3,c0cc4412,c236fb00,1fd7cec,...) at kern_ioctl+0x21d
> ioctl(c23da2e0,c6fd7cec,c6fd7d28,c097d93a,0,...) at ioctl+0x134
> syscallenter(c23da2e0,c6fd7ce4,c6fd7ce4,0,0,...) at syscallenter+0x263
> syscall(c6fd7d28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x281e6263, esp = 0xbfbfe8ac, ebp = 0xbfbfe998 ---
> db>
>
> I'm at a loss as to how to proceed.  Is this a known problem with PF?
> Can anyone suggest a work-around?
>
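A note on the backtrace above: the panic is raised by _mtx_assert() called with what=0 (MA_NOTOWNED, matching the "mutex ... owned at" message; 0xc5b is line 3163). In other words, pfsync_send_plus() asserts that the "pf task mtx" is not held, yet the current thread already holds it when the call arrives from pfsync_clear_states() via pfioctl(). mtx_assert() only expands to a real check in kernels built with INVARIANTS, which would explain why the assertion fired only on the debug kernel. As a sketch, these are the usual debugging options of the kind 9-CURRENT/9.0-BETA GENERIC enables; the exact debug configuration used above is not stated, so treat the list as an assumption:

```
# Debugging options of the kind a "debug kernel" turns on (assumed list).
# mtx_assert() checks, like the one that fired above, need INVARIANTS.
makeoptions	DEBUG=-g		# build with debug symbols
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger (the db> prompt)
options 	INVARIANTS		# enable run-time consistency checks
options 	INVARIANT_SUPPORT	# support code required by INVARIANTS
options 	WITNESS			# lock-order verification
options 	WITNESS_SKIPSPIN	# skip WITNESS checks on spin mutexes
```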
I ran into the same problem on my NAT box. According to some other PRs
(kern/159390 and kern/158873), the problem lies in pfsync. The (still open)
PRs contain both a patch (which I didn't test) and the suggestion that importing
a newer version into FreeBSD might help. They also suggest, as a workaround,
not compiling "device pfsync" into the kernel. That seems to work for me (not yet
tested in detail, though); I don't know if this is a feasible solution for you.
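For reference, a minimal sketch of what that workaround could look like in a custom kernel configuration. The config name FIREWALL and its path are hypothetical, and the sketch assumes a config built on top of GENERIC that adds PF and ALTQ itself; if the included configuration already lists device pfsync, a "nodevice pfsync" line removes it instead:

```
# Hypothetical custom kernel config, e.g. /usr/src/sys/i386/conf/FIREWALL
include 	GENERIC
ident   	FIREWALL

device  	pf		# packet filter
device  	pflog		# pflog0 logging interface (used by pflogd)
#device 	pfsync		# left out: workaround from kern/159390, kern/158873

options 	ALTQ		# ALTQ queueing framework
options 	ALTQ_CBQ	# class-based queueing discipline (example)

# If the included config already has "device pfsync", remove it with:
#nodevice	pfsync
```

The kernel is then rebuilt and installed as usual (make buildkernel KERNCONF=FIREWALL and make installkernel KERNCONF=FIREWALL from /usr/src). Leaving pfsync out also gives up pfsync state synchronization between firewalls, so this only suits setups that don't rely on it.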

Hope this helps

Johannes


