From: "Ronald Klop"
Date: Fri, 11 Jul 2008 13:48:33 +0200
To: "Jo Rhett", "FreeBSD Stable"
Subject: Re: how to get more logging from GEOM?

On Fri, 11 Jul 2008 09:59:33 +0200, Jo Rhett wrote:

> About 10 days ago one of my personal machines started hanging at
> random. This is the first bit of instability I've ever experienced on
> this machine (2+ years running)
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
> 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
>
> After about 2 weeks of watching it carefully I've learned almost
> nothing. It's not a disk failure (AFAIK) it's not cpu overheat (now
> running healthd without complaints) it's not based on any given network
> traffic... however it does appear to accompany heavy cpu/disk
> activity. It usually dies when indexing my websites at night (but not
> always) and it sometimes dies when compiling programs. Just heavy disk
> isn't enough to do the job, as backups proceed without problems. Heavy
> cpu by itself isn't enough to do it either. But if I start compiling
> things and keep going a while, it will eventually hang.
>
> My best guess is that geom is having a problem and locking up. There's
> no log entry before failure to back this idea up, but I think this
> because during boot I see the following:
>
> ad0: 286168MB at ata0-master UDMA100
> GEOM_MIRROR: Device gm0 created (id=575427344).
> GEOM_MIRROR: Device gm0: provider ad0 detected.
> ad1: 286168MB at ata0-slave UDMA100
> GEOM_MIRROR: Device gm0: provider ad1 detected.
> GEOM_MIRROR: Device gm0: provider ad1 activated.
> GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
> GEOM_MIRROR: Device gm0: rebuilding provider ad0.
>
> Every time it is rebuilding ad0. Every single boot in the last two
> weeks.
>
> Is this any way to get more logging from geom, to confirm or deny this
> theory?
>
> Is there anything else I should be looking at?
>
> FWIW, this never happened before the p11 patch to 6.2. I don't know if
> that is related or not.
>
> Obviously, I can't upgrade to 6.3 if heavy cpu/disk activity kills the
> system.
>
> No, I don't have any other insights. I'm not prone to posting "duh help
> me please!" posts, so I'm quite a bit frustrated by this one.

You can try dropping into the kernel debugger to see where it is hanging.
Debugging over a serial cable is also very easy. I don't know the details,
but there is a lot of information in the FreeBSD Handbook. Search Google
for 'freebsd handbook kernel debug'.

Ronald.
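
For illustration only, here is a rough sketch of what that setup might look like. None of it comes from the thread itself. The option and tunable names are the ones I recall from sys/conf/NOTES and gmirror(8), so treat them as assumptions and check them against the documentation for your release before relying on them.

```
# Kernel configuration file (a copy of GENERIC is assumed here).
# These options build the in-kernel debugger into the kernel.
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger
options 	BREAK_TO_DEBUGGER	# a BREAK on the serial console enters DDB

# /boot/loader.conf
# Use the serial port as the console, so a second machine can capture
# output and send a BREAK when the box hangs.
console="comconsole"
# Raise the GEOM mirror debug level (0 is minimal, 3 is the most verbose).
kern.geom.mirror.debug="2"
```

With a kernel built this way, sending a BREAK over the serial line when the machine hangs should drop it into DDB, where a stack trace can show what the kernel was doing. The higher mirror debug level should also give more detail about why ad0 is rebuilt at every boot.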