Date: Sat, 9 Oct 2004 13:50:28 -0400 (EDT)
From: Robert Watson
To: Kris Kennaway
Cc: current@FreeBSD.org, net@FreeBSD.org
Subject: Re: Infinite loop in tcp_output on RELENG_5
In-Reply-To: <20041009033900.GA6751@xor.obsecurity.org>

On Fri, 8 Oct 2004, Kris Kennaway wrote:

> pointyhat (SMP machine running RELENG_5) has twice in the past 2 days
> gone into an infinite loop in the tcp_output() function (repeatedly
> breaking into DDB and continuing, I can see it at different points in
> the code). I made tcp_output keep a counter and increment when it hits
> the again: label. If the counter reaches 1000, it panics. This
> happened again just now:

There is a small but non-zero chance that the commit I just made to
tcp_output.c to add some missing locking around socket buffer accesses
might affect (fix?) this problem.
The change was tcp_output.c:1.103, if you want to give it a spin.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research

> panic: Looping in tcp_output
> cpuid = 0
> KDB: enter: panic
> [thread 100043]
> Stopped at kdb_enter+0x30: leave
> db> tr
> kdb_enter(c06de69a,0,c06e973a,ebbd5ba0,c34cd4b0) at kdb_enter+0x30
> panic(c06e973a,0,ebbd5b68,0,0) at panic+0x14e
> tcp_output(c395f8c0,c395f8c0,c3ed3e10,c05a79f0,ebbd5ca0) at tcp_output+0x19e
> tcp_drop(c395f8c0,3c,c06e9fe7,1ab,e) at tcp_drop+0x30
> tcp_timer_persist(c395f8c0,0,c06df6ba,f5,0) at tcp_timer_persist+0x14c
> softclock(0,0,c06dc037,269,c0738ac0) at softclock+0x1c8
> ithread_loop(c345d800,ebbd5d48,c06dbe2a,323,41531744) at ithread_loop+0x172
> fork_exit(c04f1210,c345d800,ebbd5d48) at fork_exit+0xc6
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xebbd5d7c, ebp = 0 ---
>
> This might be related to SACK, which is one of the situations where we
> loop back to the again label, but that's just a guess.
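
For anyone following the thread, the kind of loop guard Kris describes
(count each pass through the again: label, give up at 1000) can be
sketched roughly as below. This is a userspace illustration only, not
the actual diagnostic patch, which was not posted. The names
guarded_send(), countdown(), always() and LOOP_LIMIT are hypothetical,
and the sketch returns -1 at the point where the kernel version would
call panic().

```c
#include <stdio.h>

/* Hypothetical limit; the post mentions panicking at 1000 passes. */
#define LOOP_LIMIT 1000

/*
 * Userspace stand-in for a function with an "again:" retry label.
 * `more_work` is a hypothetical callback reporting whether another
 * pass is needed; it is not part of tcp_output().
 * Returns the number of passes through the label, or -1 if the guard
 * trips (where a kernel version would call panic() instead).
 */
static int
guarded_send(int (*more_work)(void *), void *arg)
{
	int passes = 0;

again:
	if (++passes >= LOOP_LIMIT) {
		fprintf(stderr, "Looping in guarded_send\n");
		return (-1);
	}
	/* ... one round of real work would go here ... */
	if (more_work(arg))
		goto again;
	return (passes);
}

/* Callback that asks for a fixed number of extra passes, then stops. */
static int
countdown(void *arg)
{
	int *left = arg;

	if (*left > 0) {
		(*left)--;
		return (1);
	}
	return (0);
}

/* Callback that always asks for another pass, simulating a stuck loop. */
static int
always(void *arg)
{
	(void)arg;
	return (1);
}
```

Calling guarded_send(countdown, &n) with n set to a small value ends
normally, while guarded_send(always, NULL) trips the guard. A guard
like this does not explain why the loop fails to terminate; it only
turns a silent hang into a stop with a usable backtrace, as in the
trace quoted above.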