From owner-freebsd-geom@FreeBSD.ORG  Mon Oct 15 14:17:21 2007
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5E56016A418;
	Mon, 15 Oct 2007 14:17:21 +0000 (UTC)
	(envelope-from anderson@freebsd.org)
Received: from ns.trinitel.com (186.161.36.72.static.reverse.ltdomains.com
	[72.36.161.186])
	by mx1.freebsd.org (Postfix) with ESMTP id 310B813C467;
	Mon, 15 Oct 2007 14:17:20 +0000 (UTC)
	(envelope-from anderson@freebsd.org)
Received: from proton.storspeed.com (209-163-168-124.static.twtelecom.net
	[209.163.168.124]) (authenticated bits=0)
	by ns.trinitel.com (8.14.1/8.14.1) with ESMTP id l9FEGRLq005947;
	Mon, 15 Oct 2007 09:16:30 -0500 (CDT)
	(envelope-from anderson@freebsd.org)
Message-ID: <47137634.1010703@freebsd.org>
Date: Mon, 15 Oct 2007 09:16:20 -0500
From: Eric Anderson <anderson@freebsd.org>
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728)
MIME-Version: 1.0
To: d_elbracht <d_elbracht@ecngs.de>
References: <008801c80e65$47cbe650$639049d9@EC1a> <feu58o$5uo$1@ger.gmane.org>
	<00cb01c80f04$50b11ed0$639049d9@EC1a>
In-Reply-To: <00cb01c80f04$50b11ed0$639049d9@EC1a>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00 autolearn=ham
	version=3.1.8
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on ns.trinitel.com
Cc: 'Ivan Voras' <ivoras@freebsd.org>, freebsd-geom@freebsd.org
Subject: Re: AW: g_vfs_done():da3s1a[READ(offset=81064794762854400,
 length=8192)]error = 5
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific discussions and implementations
	<freebsd-geom.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-geom>
List-Post: <mailto:freebsd-geom@freebsd.org>
List-Help: <mailto:freebsd-geom-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2007 14:17:21 -0000

d_elbracht wrote:
>>> we are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed of
>>> 2007-10-09
>>>
>>> Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x 
>>> Opteron 2216, da3 is on a 3ware 9550-12
>>>
>>> we are seeing this error:
>>> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
>>> on a 12 GB Hyperdrive
>>>
>>> the offset changes sometimes, but it is always 81064794xxxxxxxxx and
>>> well out of the 12GB range.
>> Yes.
>>
>>> According to systat -vm, da3 does tps > 500 (yes, that's a lot)
>> That's not a lot :) That's actually low for a modern solid 
>> state drive.
>>
>>> This leads to an assumption, the error has to do with very high IOs 
>>> per second on a SMP machine.
>> Either that or file system errors. Does fsck run ok or does 
>> it say anything unusual?
>>
>> There are several theoretical reasons for such errors that are
>> connected with the fact you use solid state drives, but all are
>> tricky to diagnose if you don't have a certain repeatable test you
>> can try. For example: some SSDs optimize writes to "spread out" the
>> IO on the chips, but some do it by looking into file system
>> structures to determine where it's safe to relocate the write -
>> obviously this works only with a known and supported file system.
>> This is a really wild guess, but maybe the SSD firmware has an error
>> somewhere in this area, trying to interpret UFS as if it were FAT?
>> If you manage to get a repeatable failure test, you can try
>> formatting the drive as FAT32 and trying it on that.

Solid state drives don't behave much differently than a regular drive 
from FreeBSD's point of view.  The big difference most people notice is 
that they perform best at their page size (what the SSD manufacturer 
might call a block size, which is not the same as a sector size), which 
is often 128K or 256K.  IO smaller than the page size suffers a big 
penalty, since most SSD devices do not have an onboard cache (although 
some do now).
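
To make the page-size point concrete, here is a rough sketch (not taken 
from any driver) that checks whether a single request covers whole pages. 
The 128K page size is the figure mentioned above, the 8192-byte length is 
the one from the error message, and the function name pages_touched is 
made up for illustration.

```python
PAGE_SIZE = 128 * 1024  # assumption: 128K page, one of the sizes mentioned above


def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Return (first_page, last_page, is_partial) for one I/O request.

    is_partial is True when the request does not start and end on a
    page boundary, i.e. when the device has to handle part of a page.
    """
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    is_partial = (offset % page_size != 0) or (length % page_size != 0)
    return first_page, last_page, is_partial


# An 8K request, like the one in the error message, covers 1/16 of a page.
small = pages_touched(0, 8192)
# A request of exactly one page is not partial.
full = pages_touched(0, PAGE_SIZE)
print(small, full)
```

Whether a partial-page request actually costs more depends on the device 
and on whether it has a cache, so treat this only as a way to spot such 
requests.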

>> Or maybe it's just a bad drive...

I doubt it's a bad device.

>>> The system-disk is a RAID1 on an ICP 5805. All other disks (51) are 20
>>> gstripe'd partitions.
>> 51 drives and 20 partitions?
>>
> According to the manufacturer, the drive handles any filesystem. In other
> words, it's as transparent as any harddisk would be.
> Also, as written before, we have seen the error=5 with weird offsets on an
> md (memory disk) before too.
> fsck on the disk does NOT show any error.
> 
> yes, 20 partitions on the other 51 disks (/dev/stripe/data ..datann). That's
> for hashfeed from diablo.
> 
> One basic question to ask: where does the value for offset= in g_vfs_done()
> come from?
> From the time the error shows up in syslog, I believe the error only
> happens when a file gets appended.

I wonder if (wild guess follows) there's a 32/64 bit conversion problem 
somewhere, like a 32bit number cast as 64bit or something.
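
One quick way to look at that guess is to split the reported offset into 
its upper and lower 32-bit halves.  The sketch below does only that 
arithmetic; the offset is the one from the error message, and the name 
split64 is made up for illustration.  It does not show where the value 
comes from or prove a conversion bug.

```python
OFFSET = 81064794762854400  # offset= value from the g_vfs_done() message


def split64(value):
    """Split a 64-bit value into (high, low) unsigned 32-bit halves."""
    high = (value >> 32) & 0xFFFFFFFF
    low = value & 0xFFFFFFFF
    return high, low


high, low = split64(OFFSET)
print(hex(OFFSET))           # 0x120000057a14000
print(hex(high), hex(low))   # 0x1200000 0x57a14000
```

The lower half, 0x57a14000, is about 1.37 GB, which would be a valid 
offset on a 12 GB device; everything that pushes the value out of range 
sits in the upper half (0x01200000).  That would also fit the report that 
the leading digits of the offset stay the same while the rest changes.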

I'd like to see a full trace to see what path it takes.  Maybe putting a 
panic in the error path would be worth doing.

Eric