Date:      Fri, 11 Mar 2011 09:02:43 +1000
From:      Stephen McKay <mckay@freebsd.org>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-fs@freebsd.org, Stephen McKay <mckay@freebsd.org>
Subject:   Re: Constant minor ZFS corruption 
Message-ID:  <201103102302.p2AN2hNB002016@dungeon.home>
In-Reply-To: <4D7788D9.50808@sentex.net> from Mike Tancsa at "Wed, 09 Mar 2011 09:04:09 -0500"
References:  <201103081425.p28EPQtM002115@dungeon.home> <BEBC15BA440AB24484C067A3A9D38D7E014DA66584F0@server7.acsi.ca> <201103091241.p29CfUM1003302@dungeon.home> <4D7788D9.50808@sentex.net>

On Wednesday, 9th March 2011, Mike Tancsa wrote:

>On 3/9/2011 7:41 AM, Stephen McKay wrote:
>> Of the 12 disks, only 1 has been error-free.  I've been doing this for
>> about 10 days now and there is no pattern that I can see in the errors.

>After adding a larger case for future expansion, we found the next day
>we were seeing all sorts of random errors
>
>Like
>
>Mar  3 05:34:47 offsite kernel: ad1: FAILURE - WRITE_DMA48
>status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=2281852580
>
>and
>
>Mar  4 08:56:15 offsite kernel: siisch1: siis_timeout is 00040000 ss
>04000000 rs 04000000 es 00000000 sts 801e2000 serr 00000000

Our system does not report any driver errors or disk errors.  We see
checksum errors from ZFS (mostly in scrubs).  It's like there's an
invisible pixie sprinkling bad data on our disks while we sleep.

>We narrowed it down to 2 problems.  Failing / Marginal power supply and
>bad SATA cables. After changing the power supply, we still had a few
>disk errors.

If either of these were the cause of our problem, we'd see errors
logged, right?  Not just invisible corruption?

We will probably swap the power supply and cables anyway soon, just to
see what happens, but on other machines where cables or power was the
problem I saw errors (just like yours) in the logs.
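
For anyone wanting to check their own logs for this class of error, here
is a rough sketch in Python.  The pattern is built only from the two
kernel messages quoted above, so it is an assumption that it catches what
other drivers print; the function names and the log path are also just
illustrative.

```python
import re

# Pattern derived from the two kernel messages quoted in this thread
# ("ad1: FAILURE - WRITE_DMA48" and "siisch1: siis_timeout ...").
# Other drivers word their errors differently, so treat this as a
# starting point, not a complete list.
DRIVER_ERROR = re.compile(r"kernel: (\w+): (FAILURE - \w+|\w+_timeout\b)")


def driver_errors(lines):
    """Return (device, error) pairs for lines that look like driver errors."""
    hits = []
    for line in lines:
        match = DRIVER_ERROR.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a syslog file (path is an assumption) for driver-level errors."""
    with open(path, errors="replace") as log:
        return driver_errors(log)
```

An empty result from scan_log() on a box that still shows ZFS checksum
errors would match what we are seeing: corruption with nothing logged by
the drivers.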

>After almost 5 days of uptime, no problems at all now.  Not one error.

Well, we've got something to aim for, eh? :-)

Cheers,

Stephen.