From: Robert Watson
Date: Sun, 7 Jan 2007 14:00:53 +0000 (GMT)
To: Frode Nordahl
Cc: freebsd-stable@freebsd.org
Subject: Re: Livelock in 6.2-RC1
Message-ID: <20070107135754.N46119@fledge.watson.org>
In-Reply-To: <0491A255-404B-4802-851C-43F4691C19E2@nordahl.net>

On Sat, 6 Jan 2007, Frode Nordahl wrote:

> I am experiencing a rare livelock on four of my backend mail servers running
> 6.1-STABLE, 6.2-BETA2 and 6.2-RC1. They are running OpenLDAP slapd, postfix
> and UW-IMAPD.
>
> The servers can run for months without any problem, but nevertheless I have
> experienced this problem on multiple versions and different hardware
> configurations about 5 times since september / october 2006.
>
> Server is responding to pings, but all other activity halts.
>
> On one occasion when one of the servers displayed this behaviour it managed
> to recover from the situation by itself after being gone for 20-30 minutes.

Recovery is a sign of possible livelock, but otherwise this description sounds
more like deadlock than livelock. Note that deadlock can be in a specific
subsystem, so other services may still keep running -- for example, interrupts
and the in-bound network stack generally have no interaction with the file
system, so a file system deadlock can leave ping and the keyboard working.

The first step in diagnosing both livelock and deadlock is to figure out what
the system is actually doing. I'd start out with the following DDB commands:

   show pcpu
   show allpcpu
   trace
   alltrace
   ps
   show lockedvnods
   show locks
   show alllocks

(The last two won't work unless you have WITNESS compiled in.)

The fact that you can get into the debugger and run debugging commands is a
good sign; the fact that the debugger breaks into the idle thread suggests
that the system has at least one idle CPU.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> Typical hardware configuration:
> CPU   2x Xeon 3.06GHz or 1x Core2Duo 2.00GHz (SMP)
> RAM   4 GB RAM
> DISK  Intel SRCU42X (amr) or Dell PERC 5/i (mfi)
>
> Kernel config:
> include GENERIC
> options KDB # Enable kernel debugger support.
> options BREAK_TO_DEBUGGER
> options DDB # Support DDB.
> options GDB # Support remote GDB.
> options QUOTA
> options SMP
>
> On the last crash i collected the following info from DDB:
> db> tr
> Tracing pid 11 tid 100005 td 0xc8f90780
> kdb_enter(c092f08b) at kdb_enter+0x2b
> siointr1(c9120800) at siointr1+0xce
> siointr(c9120800) at siointr+0x5e
> intr_execute_handlers(c8f864c8,e7b14c94,4,e7b14cd8,c0889503,...) at
> intr_execute_handlers+0xe1
> lapic_handle_intr(3d) at lapic_handle_intr+0x2e
> Xapic_isr1() at Xapic_isr1+0x33
> --- interrupt, eip = 0xc0b5b0e5, esp = 0xe7b14cd8, ebp = 0xe7b14cd8 ---
> acpi_cpu_c1(0,0,e7b14cf8,c8f90780,1,...) at acpi_cpu_c1+0x5
> acpi_cpu_idle(e7b14d10,c066a779,c8f8fa78,c066a6e4,e7b14d24,...) at
> acpi_cpu_idle+0x152
> cpu_idle(c8f8fa78,c066a6e4,e7b14d24,c066a465,0,...) at cpu_idle+0x28
> idle_proc(0,e7b14d38) at idle_proc+0x95
> fork_exit(c066a6e4,0,e7b14d38) at fork_exit+0x71
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe7b14d6c, ebp = 0 ---
>
> db> show lockedbufs
> buf at 0xdd08cbd0
> b_flags = 0x20000000
> b_error = 0, b_bufsize = 16384, b_bcount = 16384, b_resid = 0
> b_bufobj = (0xc937ed80), b_data = 0xdea14000, b_blkno = 14386688
> b_npages = 4, pages(OBJ, IDX, PA): (0xc1045210, 0x1b70c0,
> 0xdbe35000),(0xc1045210, 0x1b70c1, 0xc17d6000),(0xc1045210, 0x1b70c2,
> 0x582d7000),(0xc1045210, 0x1b70c3, 0x84498000)
>
> I have a crashdump or two available for further investigation.
>
> --
> Frode Nordahl
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"