Date: Mon, 20 Aug 2007 16:34:51 GMT
From: Scott Ullrich <sullrich@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/115651: Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_6_2
Message-ID: <200708201634.l7KGYpXU009720@www.freebsd.org>
Resent-Message-ID: <200708201640.l7KGe1Km016374@freefall.freebsd.org>

>Number:         115651
>Category:       kern
>Synopsis:       Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_6_2
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 20 16:40:01 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Scott Ullrich
>Release:        RELENG_6_2
>Organization:
pfSense
>Environment:
FreeBSD pfsense.geekgod.com 6.2-RELEASE-p7 FreeBSD 6.2-RELEASE-p7 #0: Sat Aug 4 18:35:24 EDT 2007 sullrich@builder6.pfsense.com:/usr/obj.pfSense/usr/src/sys/pfSense.6 i386
>Description:
Racoon (ipsec-tools 0.7rc1 and also 0.6) frequently deadlocks in the
sbwait state or enters a 100% CPU usage state, and does not recover
until the process is killed and restarted.

When the issue is triggered, ipsec-tools 0.67 enters the "sbwait" state,
whereas ipsec-tools 0.7rc1 goes into a 100% CPU tailspin.

Backtrace during this condition:

#0  0x2827a187 in recvfrom () from /lib/libc.so.6
#1  0x28225904 in recv () from /lib/libc.so.6
#2  0x0805f4f5 in pk_recv (so=11, lenp=0xbfbfe558) at pfkey.c:2826
#3  0x0805f622 in pfkey_dump_sadb (satype=3) at pfkey.c:314
#4  0x0805ac3d in purge_ipsec_spi (dst0=0x81b1080, proto=3, spi=0x8188140, n=1) at isakmp_inf.c:1173
#5  0x0805ba5c in isakmp_info_recv (iph1=0x81c1e00, msg0=0x1) at isakmp_inf.c:565
#6  0x0804ec49 in isakmp_main (msg=0x8218240, remote=0xbfbfe7f0, local=0xbfbfe770) at isakmp.c:671
#7  0x0805003e in isakmp_handler (so_isakmp=24) at isakmp.c:395
#8  0x0804bf88 in session () at session.c:223
#9  0x0804b901 in main (ac=0, av=0xbfbfee4c) at main.c:264

I found this email, which refers to the exact same issue I am running
into:

http://mail-index.netbsd.org/tech-net/2003/09/11/0015.html

The index to the thread is here, under the subject "Reminder that we
are supporting two parallel IPsec":

http://mail-index.netbsd.org/tech-net/2003/09/

It looks like a feud between NetBSD developers. From the thread, it
appears that NetBSD and FreeBSD share the same pfkey interface issue.
What follows is a political discussion on the list about right and
wrong, and people get flak for choosing something to work around the
pfkey issue.

I think this post gives a really good summary of the problem:

http://mail-index.netbsd.org/tech-net/2003/09/12/0036.html

Further down, a thread starts with the subject "Problems with PF_KEY
SADB_DUMP". This thread begins with a thorough summary of the issues:

http://mail-index.netbsd.org/tech-net/2003/09/19/0001.html

Interestingly, though, I find this text:

<--
* There is a genuine bug in the KAME PF_KEY, which has also been
faithfully copied in fast-ipsec (NetBSD and FreeBSD): if a process
requesting an SADB_DUMP and the kernel fills the requesting so_rcv
queue, KAME fails to place an error indication in the last-delivered
packet. (that's why racoon hangs in sbwait(): it is waiting to read
another SADB_DUMP message). KAME setkey has a kludge to avoid the bug:
it does a setsockopt() with SO_RCVTIMEO, and in the loop to read
subsequent SADB_DUMP responses, setkey interprets a subsequent EAGAIN
as a sign to abort the loop. IMNSO, that's not up to the standards to
which NetBSD code aspires.
A more correct fix is to have the sendup code check whether additional
SADB_DUMP messages are required; if more are required, and there isn't
space for at least one more (in addition to the current message) then
set sadb_msg_errno to (e.g.) ENOBUFS, to indicate the SADB_DUMP
responses are truncated at that message.
-->
>How-To-Repeat:
Install ipsec-tools. Set it up with a large number of tunnels. In this
case we are up to 85 tunnels.
>Fix:
No known fix as of yet. ipsec-tools needs to be killed and restarted to
get it working again.
>Release-Note:
>Audit-Trail:
>Unformatted:
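
For reference, the setkey-style workaround described in the quoted text
(set SO_RCVTIMEO, then treat EAGAIN in the read loop as the end of the
dump) could look roughly like the sketch below. This is only an
illustration under stated assumptions, not the actual setkey or racoon
code: drain_until_timeout() and demo_truncated_dump() are hypothetical
names, and an AF_UNIX datagram socketpair stands in for the PF_KEY
socket so that the sketch can be run without root.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read messages from fd until a receive times out.
 * Returns the number of messages read, or -1 on a real error.
 * A timeout (EAGAIN/EWOULDBLOCK) is taken to mean "no more messages",
 * so the caller never sleeps forever waiting for a message that the
 * kernel dropped.
 */
int
drain_until_timeout(int fd, long timeout_ms)
{
	struct timeval tv;
	char buf[4096];
	int count = 0;

	assert(timeout_ms > 0);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return -1;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n > 0) {
			count++;		/* one more message consumed */
			continue;
		}
		if (n == 0)
			return count;		/* nothing further to read */
		if (errno == EINTR)
			continue;		/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return count;		/* timed out: abort the loop */
		return -1;			/* genuine error */
	}
}

/*
 * Hypothetical demo: the "kernel" side sends three messages and then
 * goes silent, as happens when a dump is cut short. The reader gets
 * all three and then gives up after the timeout instead of blocking.
 */
int
demo_truncated_dump(void)
{
	int sv[2];
	int i, got;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return -1;
	for (i = 0; i < 3; i++) {
		if (send(sv[0], "sa", 2, 0) == -1) {
			close(sv[0]);
			close(sv[1]);
			return -1;
		}
	}
	got = drain_until_timeout(sv[1], 200);
	close(sv[0]);
	close(sv[1]);
	return got;
}
```

Calling demo_truncated_dump() from a small main() should return once
the 200 ms timeout expires rather than sitting in sbwait. As the quoted
text points out, this only hides the problem in userland; the reader
still cannot tell a truncated dump from a complete one.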