Date: Fri, 11 Jul 2008 13:48:33 +0200
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
To: "Jo Rhett" <hostmaster@netconsonance.com>, "FreeBSD Stable" <freebsd-stable@freebsd.org>
Subject: Re: how to get more logging from GEOM?
Message-ID: <op.ud4lq7fo8527sy@guido.klop.ws>
In-Reply-To: <C278655C-4FFB-4A8E-9501-2B84283E324D@netconsonance.com>
References: <C278655C-4FFB-4A8E-9501-2B84283E324D@netconsonance.com>

On Fri, 11 Jul 2008 09:59:33 +0200, Jo Rhett <hostmaster@netconsonance.com> wrote:

> About 10 days ago one of my personal machines started hanging at
> random. This is the first bit of instability I've ever experienced on
> this machine (2+ years running).
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11
> FreeBSD 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
>
> After about 2 weeks of watching it carefully I've learned almost
> nothing. It's not a disk failure (AFAIK), it's not CPU overheating (now
> running healthd without complaints), and it's not tied to any given
> network traffic. However, it does appear to accompany heavy CPU/disk
> activity. It usually dies when indexing my websites at night (but not
> always) and it sometimes dies when compiling programs. Heavy disk
> activity alone isn't enough to do the job, as backups proceed without
> problems. Heavy CPU by itself isn't enough to do it either. But if I
> start compiling things and keep going a while, it will eventually hang.
>
> My best guess is that geom is having a problem and locking up. There's
> no log entry before failure to back this idea up, but I think this
> because during boot I see the following:
>
> ad0: 286168MB <Seagate ST3300622A 3.AAH> at ata0-master UDMA100
> GEOM_MIRROR: Device gm0 created (id=575427344).
> GEOM_MIRROR: Device gm0: provider ad0 detected.
> ad1: 286168MB <Seagate ST3300622A 3.AAH> at ata0-slave UDMA100
> GEOM_MIRROR: Device gm0: provider ad1 detected.
> GEOM_MIRROR: Device gm0: provider ad1 activated.
> GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
> GEOM_MIRROR: Device gm0: rebuilding provider ad0.
>
> Every time it is rebuilding ad0. Every single boot in the last two
> weeks.
>
> Is there any way to get more logging from geom, to confirm or deny this
> theory?
>
> Is there anything else I should be looking at?
>
> FWIW, this never happened before the p11 patch to 6.2. I don't know if
> that is related or not.
>
> Obviously, I can't upgrade to 6.3 if heavy CPU/disk activity kills the
> system.
>
> No, I don't have any other insights. I'm not prone to posting "duh help
> me please!" posts, so I'm quite a bit frustrated by this one.

You can try dropping into the kernel debugger to see where it is hanging.
Debugging via a serial cable is also very easy. I don't know the details,
but there is a lot of info in the FreeBSD Handbook. Search Google for
'freebsd handbook kernel debug'.

Ronald.