From owner-freebsd-geom@FreeBSD.ORG  Mon Oct 15 08:21:06 2007
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6461616A46B;
	Mon, 15 Oct 2007 08:21:06 +0000 (UTC)
	(envelope-from d_elbracht@ecngs.de)
Received: from ecngs.de (mail.ecngs.de [217.73.144.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 7547B13C447;
	Mon, 15 Oct 2007 08:21:04 +0000 (UTC)
	(envelope-from d_elbracht@ecngs.de)
Received: from EC1a (ec1.elbracht.net [217.73.144.99]) 
	by ecngs.de (SurgeMail 3.8f2) with ESMTP id 1774348-1922481 
	for multiple; Mon, 15 Oct 2007 10:21:26 +0200
From: "d_elbracht" <d_elbracht@ecngs.de>
To: "'Ivan Voras'" <ivoras@freebsd.org>,
	<freebsd-stable@freebsd.org>
References: <008801c80e65$47cbe650$639049d9@EC1a> <feu58o$5uo$1@ger.gmane.org>
Date: Mon, 15 Oct 2007 10:20:57 +0200
Message-ID: <00cb01c80f04$50b11ed0$639049d9@EC1a>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 11
Thread-Index: AcgOsevpOahtmKUeQKG7YhTDqm4A3wATlmcA
In-Reply-To: <feu58o$5uo$1@ger.gmane.org>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
Cc: freebsd-geom@freebsd.org
Subject: AW: g_vfs_done():da3s1a[READ(offset=81064794762854400,
	length=8192)]error = 5
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific discussions and implementations
	<freebsd-geom.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-geom>
List-Post: <mailto:freebsd-geom@freebsd.org>
List-Help: <mailto:freebsd-geom-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2007 08:21:06 -0000

> > we are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed of
> > 2007-10-09
> > 
> > Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x 
> > Opteron 2216, da3 is on a 3ware 9550-12
> > 
> > we are seeing this error:
> > g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error
> > = 5 on a 12 GB Hyperdrive
> > 
> > the offset changes sometimes, but it is always 81064794xxxxxxxxx and
> > well out the 12GB range.
> 
> Yes.
> 
> > According to systat -vm, da3 does tps > 500 (yes, that's a lot)
> 
> That's not a lot :) That's actually low for a modern solid 
> state drive.
> 
> > This leads to an assumption, the error has to do with very high IOs 
> > per second on a SMP machine.
> 
> Either that or file system errors. Does fsck run ok or does 
> it say anything unusual?
> 
> There are several theoretical reasons for such errors that 
> are connected with the fact you use solid state drives, but 
> all are tricky to diagnose if you don't have a certain 
> repeatable test you can try. For example:
> some SSDs optimize writes to "spread out" the IO on the 
> chips, but some do it by looking into file system structures 
> to determine where it's safe to relocate the write - 
> obviously this works only with a known and supported file 
> system. This is a really wild guess, but maybe the SSD 
> firmware has error somewhere in this area, trying to 
> interpret UFS as it was FAT? If you manage to get a 
> repeatable failure test, you can try formatting the drive as 
> FAT32 and trying it on that.
> 
> Or maybe it's just a bad drive...
> 
> > The system-disk is a RAID1 on an ICP 5805. All other disks (51) are 20
> > gstripe'd partitions.
> 
> 51 drives and 20 partitions?
> 
According to the manufacturer, the drive handles any file system. In other
words, it is as transparent as any hard disk would be.
Also, as written before, we have seen error=5 with weird offsets on an
md (memory disk) too.
fsck on the disk does NOT show any errors.

Yes, 20 partitions on the other 51 disks (/dev/stripe/data ..datann). That's
for hashfeed from diablo.

One basic question to ask: where does the value for offset= in g_vfs_done()
come from?
Judging from the times at which the error shows up in syslog, I believe the
error only happens when a file gets appended to.
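
As a rough aid for looking at the number itself, the logged offset can be
split into the part that would fit on a 12 GB device and whatever bits sit
above that. This is only arithmetic on the value from the log line. It says
nothing about where the kernel takes the offset from. The names OFFSET,
DISK_BYTES, SECTOR and analyze() are made up for this sketch, and the disk
size (12 GiB) and sector size (512 bytes) are assumptions.

```python
# Offset copied from the g_vfs_done() message quoted above.
OFFSET = 81064794762854400
DISK_BYTES = 12 * 1024**3  # assumption: "12 GB" is read as 12 GiB
SECTOR = 512               # assumption: 512-byte sectors


def analyze(offset, disk_bytes):
    """Split a byte offset into the part that fits the disk's address
    width and the positions of any bits set above that width."""
    width = (disk_bytes - 1).bit_length()       # bits needed to address the disk
    residual = offset & ((1 << width) - 1)      # value left inside that width
    stray_bits = [b for b in range(width, offset.bit_length())
                  if (offset >> b) & 1]         # bit positions above the width
    return width, residual, stray_bits


width, residual, stray_bits = analyze(OFFSET, DISK_BYTES)

print(f"offset (hex)        : {OFFSET:#018x}")
print(f"address width (bits): {width}")
print(f"residual offset     : {residual} bytes")
print(f"residual on disk    : {residual < DISK_BYTES}")
print(f"stray bit positions : {stray_bits}")
print(f"sector aligned      : {OFFSET % SECTOR == 0}")
```

For the value above, the residual is 1470185472 bytes, which lies inside the
disk, and only bits 53 and 56 are set above the disk's address width. That
would fit the observation that the leading digits of the offset stay the same
while the rest changes, but on its own it does not tell whether the bad value
comes from the file system, the driver, or somewhere else.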

Dieter