Date: Mon, 15 Oct 2007 09:16:20 -0500
From: Eric Anderson <anderson@freebsd.org>
To: d_elbracht <d_elbracht@ecngs.de>
Cc: 'Ivan Voras' <ivoras@freebsd.org>, freebsd-geom@freebsd.org
Subject: Re: AW: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
Message-ID: <47137634.1010703@freebsd.org>
In-Reply-To: <00cb01c80f04$50b11ed0$639049d9@EC1a>
References: <008801c80e65$47cbe650$639049d9@EC1a> <feu58o$5uo$1@ger.gmane.org> <00cb01c80f04$50b11ed0$639049d9@EC1a>
d_elbracht wrote:
>>> we are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed of
>>> 2007-10-09
>>>
>>> Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x
>>> Opteron 2216, da3 is on a 3ware 9550-12
>>>
>>> we are seeing this error:
>>> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
>>> on a 12 GB Hyperdrive
>>>
>>> the offset changes sometimes, but it is always 81064794xxxxxxxxx and
>>> well outside the 12GB range.
>> Yes.
>>
>>> According to systat -vm, da3 does tps > 500 (yes, that's a lot)
>> That's not a lot :) That's actually low for a modern solid
>> state drive.
>>
>>> This leads to an assumption, the error has to do with very high IOs
>>> per second on a SMP machine.
>> Either that or file system errors. Does fsck run ok or does
>> it say anything unusual?
>>
>> There are several theoretical reasons for such errors that
>> are connected with the fact you use solid state drives, but
>> all are tricky to diagnose if you don't have a certain
>> repeatable test you can try. For example:
>> some SSDs optimize writes to "spread out" the IO on the
>> chips, but some do it by looking into file system structures
>> to determine where it's safe to relocate the write -
>> obviously this works only with a known and supported file
>> system. This is a really wild guess, but maybe the SSD
>> firmware has an error somewhere in this area, trying to
>> interpret UFS as if it were FAT? If you manage to get a
>> repeatable failure test, you can try formatting the drive as
>> FAT32 and trying it on that.

Solid state drives don't behave much differently than a regular drive
from FreeBSD's point of view. The biggest difference most people notice
is that they perform best at their page size (or maybe what the SSD
manufacturer might call a block size, which is not a sector size), which
is often 128K or 256K.
IO smaller than the page size suffers a big penalty, since most SSDs do
not have an onboard cache (although some do now).

>> Or maybe it's just a bad drive...

I doubt it's a bad device.

>>> The system-disk is a RAID1 on an ICP 5805. All other disks (51) are 20
>>> gstripe'd partitions.
>> 51 drives and 20 partitions?
>>
> According to the manufacturer, the drive handles any filesystem. In other
> words, it's as transparent as any hard disk would be.
> Also, as written before, we have seen the error=5 with weird offsets on an
> md (memory disk) before too.
> fsck on the disk does NOT show any error.
>
> yes, 20 partitions on the other 51 disks (/dev/stripe/data ..datann). That's
> for hashfeed from diablo.
>
> One basic question to ask: where does the value for offset= in g_vfs_done()
> come from?
> From the time the error shows up in syslog, I believe the error only
> happens when a file gets appended.

I wonder if (wild guess follows) there's a 32/64-bit conversion problem
somewhere, like a 32-bit number cast as 64-bit or something. I'd like to
see a full backtrace to see what path it takes. Maybe putting a panic in
the error path would be worth doing.

Eric