From: "d_elbracht"
To: "'Ivan Voras'"
Cc: freebsd-geom@freebsd.org
Date: Mon, 15 Oct 2007 10:20:57 +0200
Subject: AW: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
Message-ID: <00cb01c80f04$50b11ed0$639049d9@EC1a>
References: <008801c80e65$47cbe650$639049d9@EC1a>

> > we are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed of
> > 2007-10-09
> >
> > Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x
> > Opteron 2216, da3 is on a 3ware 9550-12
> >
> > we are seeing this error:
> > g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
> > on a 12 GB Hyperdrive
> >
> > the offset changes sometimes, but it is always 81064794xxxxxxxxx and
> > well out the 12GB range.
>
> Yes.
>
> > According to systat -vm, da3 does tps > 500 (yes, that's a lot)
>
> That's not a lot :) That's actually low for a modern solid state drive.
>
> > This leads to an assumption, the error has to do with very high IOs
> > per second on a SMP machine.
>
> Either that or file system errors. Does fsck run ok or does it say
> anything unusual?
>
> There are several theoretical reasons for such errors that are
> connected with the fact you use solid state drives, but all are tricky
> to diagnose if you don't have a certain repeatable test you can try.
> For example: some SSDs optimize writes to "spread out" the IO on the
> chips, but some do it by looking into file system structures to
> determine where it's safe to relocate the write - obviously this works
> only with a known and supported file system. This is a really wild
> guess, but maybe the SSD firmware has error somewhere in this area,
> trying to interpret UFS as it was FAT? If you manage to get a
> repeatable failure test, you can try formatting the drive as FAT32 and
> trying it on that.
>
> Or maybe it's just a bad drive...
>
> > The system-disk is a RAID1 on an ICP 5805. All other disks (51) are 20
> > gstripe'd partitions.
>
> 51 drives and 20 partitions?

According to the manufacturer, the drive handles any filesystem. In other
words, it is as transparent as any hard disk would be.

Also, as I wrote before, we have already seen the error = 5 with weird
offsets on an md (memory disk) as well.

fsck on the disk does NOT show any errors.

Yes, 20 partitions on the other 51 disks (/dev/stripe/data ..datann).
That's for hashfeed from diablo.

One basic question to ask: where does the value for offset= in
g_vfs_done() come from?

Judging from the times at which the error shows up in syslog, I believe
the error only happens when a file gets appended to.

Dieter
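
As a side note on the reported offset: the small Python sketch below only does arithmetic on the number from the log line. It does not answer where g_vfs_done() gets the value. The names stray_bits and in_range_part are made up for this illustration, and treating "12 GB" as 12 GiB is an assumption.

```python
# Offset copied from the g_vfs_done() message quoted above.
REPORTED_OFFSET = 81064794762854400

# Assumption: the "12 GB" Hyperdrive is treated as 12 GiB here.
DEVICE_SIZE = 12 * 1024**3


def addressable_bits(device_size):
    """Number of bits needed to express every byte offset on the device."""
    return (device_size - 1).bit_length()


def stray_bits(offset, device_size):
    """Bit positions set in offset that no valid offset on the device uses."""
    first = addressable_bits(device_size)
    return [b for b in range(first, 64) if (offset >> b) & 1]


def in_range_part(offset, device_size):
    """The offset with all bits above the device's addressable range cleared."""
    return offset & ((1 << addressable_bits(device_size)) - 1)


low = in_range_part(REPORTED_OFFSET, DEVICE_SIZE)
print(f"offset        : {REPORTED_OFFSET:#018x}")
print(f"stray bits    : {stray_bits(REPORTED_OFFSET, DEVICE_SIZE)}")
print(f"in-range part : {low:#x} ({low / 1024**3:.2f} GiB)")
print(f"fits on device: {low < DEVICE_SIZE}")
print(f"8 KiB aligned : {REPORTED_OFFSET % 8192 == 0}")
```

For this offset the sketch reports two set bits (53 and 56) above the device's range, and the remainder, 0x57a14000 (about 1.37 GiB), would be a valid 8 KiB-aligned offset on a 12 GiB device. That pattern would fit a block number or offset whose upper bits got corrupted somewhere on the way, but this is only a hypothesis drawn from one log line.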