From: Vince Hoffman
Date: Fri, 11 Jul 2008 10:59:01 +0100
To: Jo Rhett
Cc: FreeBSD Stable
Subject: Re: how to get more logging from GEOM?

Jo Rhett wrote:
> About 10 days ago one of my personal machines started hanging at
> random. This is the first bit of instability I've ever experienced on
> this machine (2+ years running).
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
> 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
>
> After about 2 weeks of watching it carefully I've learned almost
> nothing.
> It's not a disk failure (AFAIK), it's not the CPU overheating (I'm now
> running healthd without complaints), and it's not tied to any particular
> network traffic. However, it does appear to accompany heavy CPU/disk
> activity. It usually dies when indexing my websites at night (but not
> always), and it sometimes dies when compiling programs. Heavy disk
> activity alone isn't enough to do it, as backups proceed without
> problems. Heavy CPU load by itself isn't enough either. But if I start
> compiling things and keep going a while, it will eventually hang.
>
> My best guess is that GEOM is having a problem and locking up. There's
> no log entry before the failure to back this idea up, but I suspect it
> because during boot I see the following:
>
> ad0: 286168MB at ata0-master UDMA100
> GEOM_MIRROR: Device gm0 created (id=575427344).
> GEOM_MIRROR: Device gm0: provider ad0 detected.
> ad1: 286168MB at ata0-slave UDMA100
> GEOM_MIRROR: Device gm0: provider ad1 detected.
> GEOM_MIRROR: Device gm0: provider ad1 activated.
> GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
> GEOM_MIRROR: Device gm0: rebuilding provider ad0.
>
> Every time it is rebuilding ad0. Every single boot in the last two weeks.
>
> Is there any way to get more logging from GEOM, to confirm or deny this
> theory?

Just a guess, but try setting the kern.geom.debugflags sysctl to a value
greater than 0. This certainly spews out far more GEOM info; as to how
helpful it will be...

Vince

>
> Is there anything else I should be looking at?
>
> FWIW, this never happened before the p11 patch to 6.2. I don't know if
> that is related or not.
>
> Obviously, I can't upgrade to 6.3 if heavy CPU/disk activity kills the
> system.
>
> No, I don't have any other insights. I'm not prone to posting "duh help
> me please!" posts, so I'm quite frustrated by this one.
>
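To check how consistently the "rebuilding provider ad0" message from the quoted boot output recurs, the kernel messages can be tallied. This is a minimal Python sketch under stated assumptions: the helper name `rebuild_counts` is hypothetical, it assumes the GEOM_MIRROR message format shown in the boot log above, and it assumes the log is available as text (with FreeBSD's default syslog.conf, kernel messages typically land in /var/log/messages, one copy per boot).

```python
import re
from collections import Counter

# Matches the GEOM_MIRROR boot message quoted above, e.g.
# "GEOM_MIRROR: Device gm0: rebuilding provider ad0."
# Assumption: mirror and provider names are plain word characters (gm0, ad0, ...).
# The pattern is unanchored, so syslog-prefixed lines also match.
REBUILD_RE = re.compile(r"GEOM_MIRROR: Device (\w+): rebuilding provider (\w+)")


def rebuild_counts(log_text):
    """Count how often each (mirror, provider) pair was rebuilt in log_text.

    Hypothetical helper: log_text is any text containing kernel messages,
    such as the contents of /var/log/messages; unrelated lines are ignored.
    """
    return Counter(m.groups() for m in REBUILD_RE.finditer(log_text))


if __name__ == "__main__":
    sample = """\
GEOM_MIRROR: Device gm0 created (id=575427344).
GEOM_MIRROR: Device gm0: provider ad0 detected.
GEOM_MIRROR: Device gm0: provider ad1 activated.
GEOM_MIRROR: Device gm0: rebuilding provider ad0.
GEOM_MIRROR: Device gm0: rebuilding provider ad0.
"""
    print(rebuild_counts(sample))  # prints Counter({('gm0', 'ad0'): 2})
```

Comparing the resulting count against the number of boots in the same log shows whether ad0 really is rebuilt on every boot, independently of any extra output from kern.geom.debugflags.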