From owner-freebsd-bugs@FreeBSD.ORG Mon Aug 20 16:40:02 2007
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4A4F016A469
	for ; Mon, 20 Aug 2007 16:40:02 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 2916213C480
	for ; Mon, 20 Aug 2007 16:40:02 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l7KGe27h016375
	for ; Mon, 20 Aug 2007 16:40:02 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l7KGe1Km016374;
	Mon, 20 Aug 2007 16:40:01 GMT
	(envelope-from gnats)
Resent-Date: Mon, 20 Aug 2007 16:40:01 GMT
Resent-Message-Id: <200708201640.l7KGe1Km016374@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Scott Ullrich
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 047B816A421
	for ; Mon, 20 Aug 2007 16:34:52 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id E428B13C442
	for ; Mon, 20 Aug 2007 16:34:51 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.1/8.14.1) with ESMTP id l7KGYpMj009722
	for ; Mon, 20 Aug 2007 16:34:51 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.1/8.14.1/Submit) id l7KGYpXU009720;
	Mon, 20 Aug 2007 16:34:51 GMT
	(envelope-from nobody)
Message-Id: <200708201634.l7KGYpXU009720@www.freebsd.org>
Date: Mon, 20 Aug 2007 16:34:51 GMT
From: Scott Ullrich
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: kern/115651: Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_6_2
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 20 Aug 2007 16:40:02 -0000

>Number:         115651
>Category:       kern
>Synopsis:       Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_6_2
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 20 16:40:01 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Scott Ullrich
>Release:        RELENG_6_2
>Organization:
pfSense
>Environment:
FreeBSD pfsense.geekgod.com 6.2-RELEASE-p7 FreeBSD 6.2-RELEASE-p7 #0: Sat Aug 4 18:35:24 EDT 2007 sullrich@builder6.pfsense.com:/usr/obj.pfSense/usr/src/sys/pfSense.6 i386
>Description:
racoon (ipsec-tools 0.7rc1 and also 0.6) frequently either hangs in the
sbwait state or spins at 100% CPU usage, and it does not recover until the
process is killed and restarted. When the issue is triggered, ipsec-tools
0.67 ends up in the "sbwait" state, whereas ipsec-tools 0.7rc1 goes into a
100% CPU tailspin.

Backtrace during this condition:

#0  0x2827a187 in recvfrom () from /lib/libc.so.6
#1  0x28225904 in recv () from /lib/libc.so.6
#2  0x0805f4f5 in pk_recv (so=11, lenp=0xbfbfe558) at pfkey.c:2826
#3  0x0805f622 in pfkey_dump_sadb (satype=3) at pfkey.c:314
#4  0x0805ac3d in purge_ipsec_spi (dst0=0x81b1080, proto=3, spi=0x8188140, n=1) at isakmp_inf.c:1173
#5  0x0805ba5c in isakmp_info_recv (iph1=0x81c1e00, msg0=0x1) at isakmp_inf.c:565
#6  0x0804ec49 in isakmp_main (msg=0x8218240, remote=0xbfbfe7f0, local=0xbfbfe770) at isakmp.c:671
#7  0x0805003e in isakmp_handler (so_isakmp=24) at isakmp.c:395
#8  0x0804bf88 in session () at session.c:223
#9  0x0804b901 in main (ac=0, av=0xbfbfee4c) at main.c:264

I found this email, which describes exactly the issue I am running into:
http://mail-index.netbsd.org/tech-net/2003/09/11/0015.html

The index to the thread, with the subject "Reminder that we are supporting
two parallel IPsec", is here:
http://mail-index.netbsd.org/tech-net/2003/09/

It looks like a feud between NetBSD developers, and from the thread it
appears that NetBSD and FreeBSD share the same pfkey interface issue. What
follows is a political discussion on the list about right and wrong, in
which people get flak for choosing a workaround for the pfkey issue.

I think this post gives a really good summary of the problem:
http://mail-index.netbsd.org/tech-net/2003/09/12/0036.html

Further down, a thread starts with the subject "Problems with PF_KEY
SADB_DUMP". It begins with a thorough summary of the issues:
http://mail-index.netbsd.org/tech-net/2003/09/19/0001.html

Interestingly, though, I find this text:

<--
* There is a genuine bug in the KAME PF_KEY, which has also been
faithfully copied in fast-ipsec (NetBSD and FreeBSD): if a process
requesting an SADB_DUMP and the kernel fills the requesting so_rcv
queue, KAME fails to place an error indication in the last-delivered
packet. (that's why racoon hangs in sbwait(): it is waiting to read
another SADB_DUMP message).

KAME setkey has a kludge to avoid the bug: it does a setsockopt() with
SO_RCVTIMEO, and in the loop to read subsequent SADB_DUMP responses,
setkey interprets a subsequent EAGAIN as a sign to abort the loop.
IMNSO, that's not up to the standards to which NetBSD code aspires.

A more correct fix is to have the sendup code check whether additional
SADB_DUMP messages are required; if more are required, and there isn't
space for at least one more (in addition to the current message) then
set sadb_msg_errno to (e.g.) ENOBUFS, to indicate the SADB_DUMP
responses are truncated at that message.
-->
>How-To-Repeat:
Install ipsec-tools and set it up with a large number of tunnels. In this
case we are up to 85 tunnels.
>Fix:
No known fix as of yet. ipsec-tools has to be killed and restarted to get
it working again.
>Release-Note:
>Audit-Trail:
>Unformatted:
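
For illustration only, here is a rough sketch of the kind of workaround the
quoted text attributes to setkey: set SO_RCVTIMEO on the socket and treat a
timed-out recv() (EAGAIN) as the end of the dump instead of blocking forever
in sbwait. This is not code from setkey, racoon, or the kernel. The function
name drain_until_timeout, the millisecond parameter, and the 4096-byte buffer
are all made up for this example, and it works on any datagram-style socket
rather than on PF_KEY specifically.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not from ipsec-tools): read messages from `so`
 * until recv() times out. Returns the number of messages read, or -1 on
 * a real error. A timeout is taken to mean "no more messages coming".
 */
int
drain_until_timeout(int so, long timeout_ms)
{
	struct timeval tv;
	char buf[4096];		/* arbitrary size chosen for the sketch */
	int count = 0;

	/* A zero timeout would disable SO_RCVTIMEO and block forever. */
	assert(timeout_ms > 0);

	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	if (setsockopt(so, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		return -1;

	for (;;) {
		ssize_t n = recv(so, buf, sizeof(buf), 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;		/* timed out: stop reading */
			return -1;		/* genuine error */
		}
		if (n == 0)
			break;			/* nothing more to read */
		count++;
	}
	return count;
}
```

A caller would send its request first and then call something like
drain_until_timeout(so, 1000). As the quoted text points out, this only
papers over the problem: the reader cannot tell a truncated dump from a
complete one, which is why an error indication from the kernel is described
there as the more correct fix.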