From: Jo Rhett
To: FreeBSD Stable
Date: Fri, 11 Jul 2008 00:59:33 -0700
Subject: how to get more logging from GEOM?

About 10 days ago one of my personal machines started hanging at random.
This is the first bit of instability I've ever experienced on this machine (2+ years running).

FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008 root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386

After about 2 weeks of watching it carefully I've learned almost nothing. It's not a disk failure (AFAIK), it's not CPU overheating (now running healthd without complaints), and it's not tied to any particular network traffic. However, it does appear to accompany heavy CPU/disk activity. It usually dies when indexing my websites at night (but not always), and it sometimes dies when compiling programs. Heavy disk activity alone isn't enough to do the job, as backups proceed without problems. Heavy CPU by itself isn't enough to do it either. But if I start compiling things and keep going a while, it will eventually hang.

My best guess is that GEOM is having a problem and locking up. There's no log entry before failure to back this idea up, but I think this because during boot I see the following:

ad0: 286168MB at ata0-master UDMA100
GEOM_MIRROR: Device gm0 created (id=575427344).
GEOM_MIRROR: Device gm0: provider ad0 detected.
ad1: 286168MB at ata0-slave UDMA100
GEOM_MIRROR: Device gm0: provider ad1 detected.
GEOM_MIRROR: Device gm0: provider ad1 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
GEOM_MIRROR: Device gm0: rebuilding provider ad0.

Every time it is rebuilding ad0. Every single boot in the last two weeks.

Is there any way to get more logging from GEOM, to confirm or deny this theory? Is there anything else I should be looking at?

FWIW, this never happened before the p11 patch to 6.2. I don't know if that is related or not. Obviously, I can't upgrade to 6.3 if heavy CPU/disk activity kills the system.

No, I don't have any other insights. I'm not prone to posting "duh help me please!" posts, so I'm quite a bit frustrated by this one.

-- 
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness