From owner-freebsd-stable@FreeBSD.ORG Mon Jan 23 08:53:55 2006
Return-Path:
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4758E16A41F
	for ; Mon, 23 Jan 2006 08:53:55 +0000 (GMT)
	(envelope-from mse_software@charter.net)
Received: from mxsf30.cluster1.charter.net (mxsf30.cluster1.charter.net [209.225.28.230])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AFC5243D45
	for ; Mon, 23 Jan 2006 08:53:54 +0000 (GMT)
	(envelope-from mse_software@charter.net)
Received: from mxip02a.cluster1.charter.net (mxip02a.cluster1.charter.net [209.225.28.132])
	by mxsf30.cluster1.charter.net (8.12.11/8.12.11) with ESMTP id k0N8rrrX010644
	for ; Mon, 23 Jan 2006 03:53:53 -0500
Received: from 68-113-23-60.dhcp.knwk.wa.charter.com (HELO yak.mseubanks.net) ([68.113.23.60])
	by mxip02a.cluster1.charter.net with ESMTP; 23 Jan 2006 03:53:53 -0500
X-IronPort-AV: i="4.01,210,1136178000"; d="scan'208"; a="1857319761:sNHT20858752"
From: "Michael S. Eubanks"
To: freebsd-stable@freebsd.org
In-Reply-To: <44B2CAEF-A9E7-454B-A232-292B58083952@stromnet.org>
References: <991F35AA-151B-4AEA-82BD-5F4AEDF28424@stromnet.org>
	<74994962-5050-47BD-897B-DE3880B9EBD5@stromnet.org>
	<1132353600.903.19.camel@genius1.i.cz>
	<20051118231351.GA46946@holestein.holy.cow>
	<1132356649.903.32.camel@genius1.i.cz>
	<8A4DAD5D-44CF-42DD-A113-340226284533@stromnet.org>
	<268C3DEB-7569-4C18-BC35-1C5F36EF8EC4@stromnet.org>
	<1137967081.40786.36.camel@yak.mseubanks.net>
	<1DA0C9DF-BB42-415B-8851-FFB91CD0F1AC@stromnet.org>
	<1137975447.40786.83.camel@yak.mseubanks.net>
	<44B2CAEF-A9E7-454B-A232-292B58083952@stromnet.org>
Content-Type: text/plain; charset=ISO-8859-1
Date: Mon, 23 Jan 2006 00:53:51 -0800
Message-Id: <1138006431.44108.15.camel@yak.mseubanks.net>
Mime-Version: 1.0
X-Mailer: Evolution 2.2.3 FreeBSD GNOME Team Port
Content-Transfer-Encoding: 8bit
Subject: Re: Page fault, GEOM problem??
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: mse_software@charter.net
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 23 Jan 2006 08:53:55 -0000

On Mon, 2006-01-23 at 06:43 +0100, Johan Ström wrote:
> On 23 jan 2006, at 01.17, Michael S. Eubanks wrote:
>
> > On Sun, 2006-01-22 at 23:51 +0100, Johan Ström wrote:
> >
> > ...snip...
> >
> >
> >> On 22 jan 2006, at 22.58, Michael S. Eubanks wrote:
> >> This card does afaik dont have raid functionalitys (I've never read
> >> anything about it either on the web, the cards box or anywhere
> >> else..).
> >> I'm running GENERIC, which does include ataraid..
> >> What does your dmesg identify your card as?
> >>
> >> atapci0: port 0xb800-0xb87f,
> >> 0xb400-0xb4ff mem 0xfb800000-0xfb800fff,0xfb000000-0xfb01ffff irq 19
> >> at device 12.0 on pci0
> >>
> >> Is it the same PDC chipset?
> >>
> >> --
> >> Johan
> >>
> >>
> >
> > No, I have a different controller. My mistake. I think what is
> > happening is the DMA read command is failing, therefore causing the
> > device to be disconnected, and the kernel can't write to the disk from
> > that point on (this is somewhat obvious given the output below).
> >
> >
> >>> Nov 29 20:36:54 elfi kernel: subdisk10: detached
> >>> Nov 29 20:36:54 elfi kernel: ad10: detached
> >>> Nov 29 20:36:54 elfi kernel: unknown: TIMEOUT - READ_DMA48 retrying
> >>> (1 retry left) LBA=426562704
> >>> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Device gm0s1: provider
> >>> ad10s1 disconnected.
> >>>
> >
> > The message seen from the last line above is generated in any of the
> > following scenarios (from g_mirror.c):
> > 1. Device wasn't running yet, but disk disappear.
> > 2. Disk was active and disapppear.
> > 3. Disk disappear during synchronization process.
> >
> >
> >>> Nov 29 20:36:54 elfi kernel: GEOM_MIRROR: Request failed (error=6).
> >>> ad10s1[WRITE(offset=134356992, length=16384)]
> >>>
> >
> > As far as recovering the disk, I remember seeing something about
> > booting to single user mode and using fsck after a core dump in a
> > previous post.
> > I'm assuming the disks worked initially and that you were able to
> > label them etc? Is there any possibility that the disk state may be
> > altered by a power saving feature or setting in the BIOS and FreeBSD
> > just doesn't know when it happens until the next time it tries to
> > access the disk?
> >
>
> For recovering, i've always done a direct reboot, the gmirror
> rebuilds the mirror and fsck is run.
> No problems reading labels etc, and never has been, only problem has
> been these sporadic crashes.. And the read/write performance (see
> earlier in thread)...
>
> This is a server, so all bios setting for powersaving is (should be)
> shut of. Bios should thus never make the disk go to sleep.
>
> Thanks for trying to help!

Wish I could be of more help. :)

Have you tried toggling the sysctl DMA flags? I've seen similar posts
in the past where read timeouts were caused by DMA being enabled.

# sysctl -a | grep dma
...
hw.ata.ata_dma: 1    <=== Try turning this one off (1 ==> 0).
hw.ata.atapi_dma: 1
...

-Michael
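
A minimal sketch of how the flag might be turned off, under some
assumptions that are not confirmed anywhere in this thread: that the
release in use follows the 6.x ata(4) behaviour where hw.ata.ata_dma is
a boot-time loader tunable rather than something that can be changed on
a running system, and that atacontrol(8) accepts the per-device "mode"
syntax. ad10 is simply the device named in the log excerpt above.

```
# /boot/loader.conf
# Assumption: hw.ata.ata_dma is read at boot on this release.
hw.ata.ata_dma="0"      # force PIO for ATA disks; takes effect after a reboot

# Possible runtime alternative for a single disk
# (assumption: 6.x-style atacontrol syntax):
#   atacontrol mode ad10 PIO4
#
# After rebooting, confirm the setting with:
#   sysctl hw.ata.ata_dma
```

PIO is much slower than DMA, so this is best treated as a diagnostic
step: if the READ_DMA48 timeouts stop with DMA disabled, that points at
the controller, cabling, or driver rather than at gmirror itself.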