From: Don Lewis
To: avg@icyb.net.ua
Cc: stable@FreeBSD.org, sterling@camdensoftware.com, freebsd@jdc.parodius.com
Date: Fri, 1 Oct 2010 17:32:35 -0700 (PDT)
Subject: Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
Message-Id: <201010020032.o920WZKG028379@gw.catspoiler.org>
In-Reply-To: <201009300849.o8U8nr8r081019@gw.catspoiler.org>

On 30 Sep, Don Lewis wrote:

> The silent reboots that I was seeing with WITNESS go away if I add
> WITNESS_SKIPSPIN.  Witness doesn't complain about anything.

I've tracked down the silent reboot problem.  It happens when a userland
sysctl call gets down into calcru1(), which tries to print a "calcru: .."
message.  Eventually sc_puts() wants to grab a spin lock, which causes a
call to witness, which detects a lock order reversal.  Reporting the
reversal recurses into printf(), which dives back into the console code
and eventually triggers a panic.
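
To make the shape of that recursion concrete, here is a rough userland
sketch.  It is only a toy model, not kernel code: every toy_* name is
hypothetical, the checker is hard-wired to report a reversal, and treating
nested entry into the checker as the fatal step is my assumption, since I
don't yet know exactly what triggers the real panic.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: all toy_* names are hypothetical stand-ins. */
static bool toy_in_lock_check = false; /* checker is busy reporting */
static bool toy_panicked = false;      /* set when the nested entry is hit */
static int toy_console_entries = 0;    /* times the console path was entered */

static void toy_console_puts(const char *msg);

/*
 * Stand-in for a lock-order checker.  It always "detects" a reversal and
 * reports it by printing, which re-enters the console path.
 */
static void toy_lock_check(void)
{
    if (toy_in_lock_check) {
        /* Assumption: nested entry while already reporting is fatal. */
        toy_panicked = true;
        return;
    }
    toy_in_lock_check = true;
    toy_console_puts("lock order reversal");
    toy_in_lock_check = false;
}

/* Stand-in for console output that takes a spin lock before writing. */
static void toy_console_puts(const char *msg)
{
    toy_console_entries++;
    toy_lock_check();          /* "acquiring" the lock runs the checker */
    if (toy_panicked)
        return;                /* the message never reaches the console */
    puts(msg);
}

/* Stand-in for the sysctl path that ends up printing the warning. */
static int toy_sysctl_path(void)
{
    toy_console_puts("calcru: ..");
    return toy_console_entries;
}
```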
I'm still gathering the details on this, and I'll see what I can come up
with for a fix.

> I tested -CURRENT and !SMP seems to work ok.  One difference in terms of
> hardware between the two tests is that I'm using a SATA drive when
> testing -STABLE and a SCSI drive when testing -CURRENT.

I'm not able to trigger the problem with -CURRENT when it is running on
a SCSI drive, but I do see the freezes, long ping RTTs, and ntp insanity
when running a !SMP -CURRENT kernel on my SATA drive with an 8.1-STABLE
world.