Date: Wed, 27 Aug 2008 01:50:16 -0700
From: Jeremy Chadwick
To: Antony Mawer
Cc: stable@freebsd.org
Subject: Re: Finding which GEOM provider is generating errors in a graid3
Message-ID: <20080827085016.GA75552@icarus.home.lan>

On Wed, Aug 27, 2008 at 06:27:47PM +1000, Antony Mawer wrote:
> I have a FreeBSD 6.2-based server running a 1.2TB graid3 volume, which
> consists of 5x 320gb SATA hard drives. I've been getting errors in
> /var/log/messages from the graid3 volume, which I suspect means an
> underlying fault with one of the disks, but is there any way to decipher
> which one of these drives is throwing errors?
>
> I've checked smartctl -a /dev/adXX but nothing shows up there..

When you say "nothing shows up there", what exactly do you mean? A lot
of people don't know how to read SMART statistics. I hope by "nothing
shows up there" you mean "nothing stands out".

> I'm wondering if this is the infamous ata driver bug(s) that may be
> rearing its ugly head..

The bugs in question only apply when there are kernel messages coming
from the *disks themselves*, not from a GEOM provider. Your dmesg output
below doesn't show any ATA errors, just GEOM errors. If the disks were
failing, you *would* be getting errors from the ATA subsystem, but
you're not.

I'm not familiar with GEOM "stuff", so I can't really comment on what
all is going on here.

> Also, does anyone know what "ZoneXXFailed" items in the graid3 list
> output mean?
>
> Relevant output:
>
> $ graid3 status
>        Name    Status  Components
> raid3/data1  COMPLETE  ad12
>                        ad14
>                        ad16
>                        ad18
>                        ad20
>
> $ graid3 list
> Geom name: data1
> State: COMPLETE
> Components: 5
> Flags: VERIFY
> GenID: 0
> SyncID: 1
> ID: 3700500186
> Zone64kFailed: 791239
> Zone64kRequested: 49197268
> Zone16kFailed: 40204
> Zone16kRequested: 1283738
> Zone4kFailed: 12005939
> Zone4kRequested: 2445799003
> Providers:
> 1. Name: raid3/data1
>    Mediasize: 1280291731456 (1.2T)
>    Sectorsize: 2048
>    Mode: r1w1e1
> ...
>
> $ atacontrol list
> ...
> ATA channel 6:
>     Master: ad12 Serial ATA v1.0
> ATA channel 7:
>     Master: ad14 Serial ATA v1.0
> ATA channel 8:
>     Master: ad16 Serial ATA v1.0
> ATA channel 9:
>     Master: ad18 Serial ATA v1.0
> ATA channel 10:
>     Master: ad20 Serial ATA v1.0
>
>
> Output in /var/log/messages:
>
>> Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 7 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 22 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 21 times
>> Aug 27 17:38:24 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:38:26 backup last message repeated 4 times
>> Aug 27 17:46:02 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 7 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 22 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:53:49 backup last message repeated 21 times

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
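A follow-up sketch for mapping the logged offsets onto the disks (Python 3.6+; this is not part of any FreeBSD tool). It assumes that raid3 spreads each volume sector evenly over the four data components, so a volume offset O corresponds to offset O/4 on every component, parity included. That assumption is consistent with the Sectorsize (2048 = 4 x 512) and the Mediasize in the graid3 list output above. It also assumes 512-byte component sectors and that ata(4) disk errors look like "adN: FAILURE - ...", "TIMEOUT - ..." or "WARNING - ...". The component list, regexes and function names are hypothetical. The script reports ATA-level errors (which name the disk) separately from GEOM-level g_vfs_done() errors, and prints dd(1) commands that read the matching sectors from each component directly:

```python
import errno
import re
import sys

# Hypothetical setup: the five components from `graid3 status`, one of
# which holds parity, with 512-byte sectors. Four data disks x 512 bytes
# matches the 2048-byte raid3 Sectorsize reported by `graid3 list`.
COMPONENTS = ["ad12", "ad14", "ad16", "ad18", "ad20"]
SECTOR = 512
DATA_DISKS = len(COMPONENTS) - 1

# Matches e.g. g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
GVFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<prov>\S+?)\[(?P<cmd>[A-Z]+)\(offset=(?P<off>\d+), "
    r"length=(?P<len>\d+)\)\]error = (?P<err>\d+)")

# Assumed shape of ata(4) disk errors, e.g. "ad14: FAILURE - READ_DMA ...".
# Unlike the GEOM-level messages, these name the disk directly.
ATA_RE = re.compile(r"\b(?P<disk>ad\d+): (?P<msg>(?:FAILURE|TIMEOUT|WARNING) .*)")


def component_range(offset, length):
    """Map a raid3 volume request to the (offset, length) it touches on each
    component. Every component sees the same range, so the offset alone
    cannot single out one disk."""
    return offset // DATA_DISKS, length // DATA_DISKS


def report(lines):
    """Return human-readable findings for an iterable of syslog lines."""
    out, seen = [], set()
    ata_found = False
    for line in lines:
        m = ATA_RE.search(line)
        if m:
            ata_found = True
            out.append(f"ATA error on {m['disk']}: {m['msg']}")
            continue
        m = GVFS_RE.search(line)
        if not m:
            continue
        off, length, err = int(m["off"]), int(m["len"]), int(m["err"])
        key = (m["prov"], off, length)
        if key in seen:
            continue  # the log repeats the same failing request many times
        seen.add(key)
        coff, clen = component_range(off, length)
        name = errno.errorcode.get(err, str(err))  # 5 -> "EIO"
        out.append(f"{m['prov']} {m['cmd']} offset={off} length={length} ({name})"
                   f" -> per-component LBA {coff // SECTOR}, {clen // SECTOR} sectors")
        for disk in COMPONENTS:
            # Read the same sectors straight from each component; a disk
            # that fails here on its own is a suspect.
            out.append(f"  dd if=/dev/{disk} of=/dev/null bs={SECTOR}"
                       f" skip={coff // SECTOR} count={clen // SECTOR}")
    if not ata_found:
        out.append("no ATA-level errors found; only GEOM-level errors")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        print("\n".join(report(f)))
```

Run it as, for example, python3 raid3_triage.py /var/log/messages (the script name is arbitrary). For the log above, offset 160320159744 maps to LBA 78281328 (8 sectors) on each component. Because every read of the volume touches all data disks, the offset alone can't identify a drive, but a dd that returns an I/O error on just one adN would point at it.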