Date: Mon, 29 Aug 2011 10:01:42 +0200
From: Johannes Dieterich <dieterich.joh@googlemail.com>
To: Matthew Economou <mxeconomou@gmail.com>
Cc: freebsd-current@freebsd.org
Subject: Re: "panic: mutex pf task mtx owned at /usr/src/sys/contrib/pf/net/if_pfsync.c:3163"
Message-ID: <CABquGzXWaw5A%2Bk7w4G6oKovJm-fqxoSpELdQ-O7r_SHZnN7Wkw@mail.gmail.com>
In-Reply-To: <CAC1zzctC=m1wpgiO-hEaftk4q-AR3uyzgyf2hG1X5gdZvE8LuQ@mail.gmail.com>
References: <CAC1zzctC=m1wpgiO-hEaftk4q-AR3uyzgyf2hG1X5gdZvE8LuQ@mail.gmail.com>

Hi Matthew,

On Fri, Aug 26, 2011 at 11:51 PM, Matthew Economou <mxeconomou@gmail.com> wrote:

> I recently upgraded a firewall I'm using for performance testing from
> a March-ish 9-CURRENT to 9.0-BETA1 (csup run August 21 around 12:00 AM
> EDT). It's basically a GENERIC kernel with debugging disabled and
> things like IPsec and ALTQ enabled. Since the upgrade, after
> approximately an hour after it boots, the firewall stops passing any
> traffic (IPv4 and IPv6). OpenVPN, for example, logs the following
> errors:
>
>   write UDPv4: Operation not permitted (code=1)
>
> Quagga, for another example, logs something similar:
>
>   ripd[1696]: can't send packet : Operation not permitted0
>   ospfd[1702]: *** sendmsg in ospf_write failed to 172.30.0.3, id 0, off 0, len 76, interface tap0 mtu 1500: Operation not permitted
>
> If I try to ping something from the console, I get the same error message:
>
>   # ping 4.2.2.2
>   ping: sendto: Operation not permitted
>
> It appears that PF isn't removing any entries from the state table.
> Note that the state table size is at its default of 10000 (which
> correlates to the amount of memory installed on the firewall - 256
> MB).
>
>   State Table                 Total        Rate
>     current entries           10013
>     searches                 554801      13.4/s
>     inserts                   10013       0.2/s
>     removals                      0       0.0/s
>
> I've tried both my current (unmodified and working prior to the
> upgrade) and experimental PF configurations, neither of which have any
> effect on the problem. Reloading the PF configuration (/etc/rc.d/pf
> reload) or restarting PF altogether (/etc/rc.d/pf restart) also have
> no effect. Only if I shut down PF completely (/etc/rc.d/pf stop) do I
> regain network connectivity - I can do things like ping hosts (IPv4
> and IPv6), browse the web, and pass traffic that's just routed through
> the firewall (i.e., not requiring NAT). Clearing the state table
> (pfsync -F state) has no effect.
>
> The kernel I was running had debugging disabled for performance
> testing purposes, so I booted a proper debug kernel. It panicked in
> pfsync_send_plus as soon as init enabled PF (backtrace included
> below).
>
> Starting pflog.
> pflog0: promiscuous mode enabled
> Aug 25 20:54:21 pflogd[1611]: [priv]: msg PRIV_OPEN_LOG received
> Enabling pfpanic: mutex pf task mtx owned at /usr/src/sys/contrib/pf/net/if_pfsync.c:3163
> cpuid = 0
> KDB: enter: panic
> [ thread pid 1619 tid 100053 ]
> Stopped at kdb_enter+0x3a: movl $0,kdb_why
> db> bt
> Tracing pid 1619 tid 100053 td 0xc23da2e0
> kdb_enter(c09777c9,c09777c9,c0975d7b,c6fd79e0,0,...) at kdb_enter+0x3a
> panic(c0975d7b,c0946080,c0944e87,c5b,c6fd7a0c,...) at panic+0x134
> _mtx_assert(c0a1b388,0,c0944e87,c5b,c6fd7a24,...) at _mtx_assert+0x127
> pfsync_send_plus(c6fd7a24,18,10,ad6,1000000,...) at pfsync_send_plus+0xf2
> pfsync_clear_states(a218d664,c236fb78,c0945f1c,635,c09ae167,...) at pfsync_clear_states+0x8d
> pfioctl(c22a0800,c0cc4412,c236fb00,3,c23da2e0,...) at pfioctl+0x1b90
> devfs_ioctl_f(c23ce578,c0cc4412,c236fb00,c216ce80,c23da2e0,...) at devfs_ioctl_f+0x10b
> kern_ioctl(c23da2e0,3,c0cc4412,c236fb00,1fd7cec,...) at kern_ioctl+0x21d
> ioctl(c23da2e0,c6fd7cec,c6fd7d28,c097d93a,0,...) at ioctl+0x134
> syscallenter(c23da2e0,c6fd7ce4,c6fd7ce4,0,0,...) at syscallenter+0x263
> syscall(c6fd7d28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x281e6263, esp = 0xbfbfe8ac, ebp = 0xbfbfe998 ---
> db>
>
> I'm at a loss as to how to proceed. Is this a known problem with PF?
> Can anyone suggest a work-around?
>

I ran into the same problem on my NAT box. According to some other PRs
(kern/159390 and kern/158873), the problem lies in pfsync. The PRs, which
are still open, contain a patch (which I have not tested) and the
assumption that pulling a newer version into FreeBSD might help.

As a workaround, they also suggest not compiling "device pfsync" into the
kernel. That seems to work for me (though I have not tested it in detail
yet); I don't know whether it is a feasible solution for you.

Hope this helps,
Johannes

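P.S. In case it is useful, the workaround could look roughly like the kernel configuration fragment below. This is only a sketch under assumptions: the file name and ident FIREWALL are made up for illustration, and it assumes pf is compiled into a custom kernel rather than loaded as a module. Your own config will have more options (IPsec, ALTQ and so on); the only relevant point is that the "device pfsync" line is left out.

```
# FIREWALL: hypothetical custom kernel config (name is an assumption)
include   GENERIC
ident     FIREWALL

device    pf        # packet filter
device    pflog     # pf logging interface
#device   pfsync    # left out as the workaround for the pfsync panic
```

After changing the config, the kernel has to be rebuilt and installed before the change takes effect.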
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CABquGzXWaw5A%2Bk7w4G6oKovJm-fqxoSpELdQ-O7r_SHZnN7Wkw>