Date: Sat, 22 Jan 2005 22:28:43 +0000 (GMT)
From: Robert Watson
To: Artem Kuchin
cc: freebsd-stable@freebsd.org
Subject: Re: Cannot build kernel with options WITNESS

On Sun, 23 Jan 2005, Artem Kuchin wrote:

> > On Sat, 22 Jan 2005, Artem Kuchin wrote:
> >
> >> I cvssed just an hour ago. 5.3-STABLE and cannot build kernel with
> >> WITNESS. It complains:
> >
> > This occurs when building WITNESS without DDB in the kernel, which was not
> > a tested build case when I added "show alllocks", and apparently is a
> > relatively uncommon configuration as you're the first person to bump into
> > it. I've just committed the fix as subr_witness.c:1.187 in HEAD, and
> > subr_witness.c:1.178.2.4 in RELENG_5.
> > Please let me know if this doesn't fix the problem for you.
>
> It fixed the problem. I am actually struggling with unpredictable weird
> lock ups, when the host can be pinged but I cannot connect via
> ssh/telnet or httpd or anything else. It happens w/o any visible reason.
> I am running several jails with mysql and apache in each and cannot make
> the whole system stable yet.

This is typically a sign of one of two problems:

- The system is live locked due to very high load, so the ithreads,
  netisrs, etc., in the kernel run fine, but user processes don't get a
  chance to run.

- The system is dead locked due to user space processes getting wedged on
  common locks, but the kernel ithreads and netisrs can keep on
  responding.

I generally assume that it's a deadlock as opposed to a live lock. I'd
compile a kernel with DDB, KDB, WITNESS, and BREAK_TO_DEBUGGER. When the
system appears to wedge, break into the debugger using a console or serial
break (FYI: serial break is more reliable, and you get the benefit of
being able to easily copy and paste debugging output using the serial
console for DDB). Use "show alllocks" and "show lockedvnods" to examine
most of the system's locking state.

Chances are, all the interesting processes are stacked up on VFS or VM
locks, since those kinds of deadlocks produce the exact symptoms you
describe: ping works fine because it only hits the netisr, but when you
open TCP connections, sshd (etc.) blocks on VM or VFS locks while
attempting to fork new children or access a file in the file system name
space. At first, the TCP connections will establish but there will be no
application data; after a bit, they will not even return a SYN/ACK because
the listen queue for the listen socket has filled.

Robert N M Watson
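
P.S. The symptom progression described above (ping still answers, TCP connections establish but carry no data, then no SYN/ACK at all) can be checked from another machine with a small probe. What follows is only a rough sketch in Python, not part of any FreeBSD tool: the function name probe_tcp and the result labels are made up for illustration, and it assumes the service sends a banner on connect, as sshd does. A connect timeout is reported as "no_synack", but on its own it could just as well mean the host is down or filtered, so it only points at a full listen queue when ping still works.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify a banner-sending TCP service by how far a connection gets.

    The labels are hypothetical names chosen for this sketch:
      responsive          - handshake completed and the service sent data
      established_no_data - handshake completed, but nothing arrived in time
      no_synack           - connect timed out (listen queue full, or host down)
      refused             - nothing is listening on the port
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "no_synack"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"

    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(64)  # sshd normally sends its version banner first
        except socket.timeout:
            return "established_no_data"
        except OSError:
            return "reset"
        return "responsive" if data else "closed"


if __name__ == "__main__":
    import sys

    # Usage: python probe.py <host> [port]
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 22
    print(probe_tcp(target, target_port))
```

Run from cron against port 22 alongside a plain ping, a change from "responsive" to "established_no_data" and later to "no_synack" while ping keeps answering would match the deadlock pattern, and would be a good moment to send the serial break.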