From: Mike Tancsa
Organization: Sentex Communications
Date: Thu, 10 Mar 2011 19:27:41 -0500
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org, Stephen McKay
Subject: Re: Constant minor ZFS corruption
Message-ID: <4D796C7D.6080208@sentex.net>

On 3/10/2011 6:41 PM, Jeremy Chadwick wrote:
>>>
>>> Mar 3 05:34:47 offsite kernel: ad1: FAILURE - WRITE_DMA48
>>> status=51 error=10 LBA=2281852580
>>>
>>> and
>>>
>>> Mar 4 08:56:15 offsite kernel: siisch1: siis_timeout is 00040000 ss
>>> 04000000 rs 04000000 es 00000000 sts 801e2000 serr 00000000
>
> Speaking strictly to Mike here:
>
> I spent some time a while ago trying to figure out the NID_NOT_FOUND
> error. Something I wrote back when I was contributing on the Wiki; see
> section "SATA disk troubleshooting":
>
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
>
> So, it could be that the LBA being accessed isn't within the permitted
> valid range. I could be completely off my rocker though; I'd need
> someone much more familiar with the ATA-7 specification to state up
> front what this bit actually defines.

Well, I'm not sure about the NID not found errors. If the cable is bad
or the power supply is marginal, who knows what the disk thinks it's
getting in terms of requests? A new power supply and new cables made
the errors go away. The only other place I have seen the NID not found
error consistently is on large SanDisk CF cards in Alix and Soekris
boxes. I haven't found a workaround for that, unfortunately.

>
> Anyway, despite that, the controller is also reporting timeouts. What
> you haven't shown is what exact model of Silicon Image controller you're
> using. It matters. There are certain models of SI chipsets that have
> very bad, nasty bugs.

Other models of chips do not have these issues, e.g. the 3124:

http://www.addonics.com/products/host_controller/adsa3gpx8-4e.asp

They work quite well. mav@freebsd.org wrote the driver using this
card, and the cards have been rock solid for us so far on two heavily
used NFS/SMB servers.

---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
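The ad1 log line above reports only raw register values, so a small decoder may help when reading these failures. This is a minimal Python sketch, assuming the kernel prints the ATA status and error registers in hexadecimal and the LBA in decimal; the bit names follow the classic ATA task-file register layout, and the helpers `decode`, `parse_failure`, and `lba_in_range` are hypothetical, not part of any FreeBSD tool. Under those assumptions, error=10 is the IDNF ("ID not found") bit, which appears to be what the thread calls NID_NOT_FOUND.

```python
import re

# Bit names follow the classic ATA task-file register layout
# (bit 7 is the most significant bit of each 8-bit register).
STATUS_BITS = {
    7: "BSY",   # busy
    6: "DRDY",  # device ready
    5: "DF",    # device fault
    4: "DSC",   # device seek complete (obsolete in later revisions)
    3: "DRQ",   # data request
    2: "CORR",  # corrected data (obsolete)
    1: "IDX",   # index (obsolete)
    0: "ERR",   # error: details are in the error register
}
ERROR_BITS = {
    7: "ICRC",  # interface CRC error (cabling/link problems)
    6: "UNC",   # uncorrectable data error
    5: "MC",    # media changed
    4: "IDNF",  # ID not found: requested address not found or not accessible
    3: "MCR",   # media change requested
    2: "ABRT",  # command aborted
    1: "NM",    # no media
    0: "AMNF",  # address mark not found (obsolete)
}
LBA28_LIMIT = 1 << 28  # 28-bit commands can only address LBAs below this


def decode(value, names):
    """Names of the bits set in an 8-bit register value, MSB first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def parse_failure(line):
    """Pull (status, error, lba) out of a 'FAILURE ... status= error= LBA=' line.

    status and error are read as hex; any non-space annotation directly
    after a hex value is skipped. LBA is read as decimal.
    """
    m = re.search(
        r"status=([0-9a-fA-F]+)\S*\s+error=([0-9a-fA-F]+)\S*\s+LBA=(\d+)", line
    )
    if m is None:
        raise ValueError("no status/error/LBA fields found")
    return int(m.group(1), 16), int(m.group(2), 16), int(m.group(3))


def lba_in_range(lba, total_sectors):
    """True if lba is a valid sector index on a disk with total_sectors sectors."""
    return 0 <= lba < total_sectors


if __name__ == "__main__":
    line = "ad1: FAILURE - WRITE_DMA48 status=51 error=10 LBA=2281852580"
    status, error, lba = parse_failure(line)
    print(decode(status, STATUS_BITS))  # ['DRDY', 'DSC', 'ERR']
    print(decode(error, ERROR_BITS))    # ['IDNF']
    print(lba >= LBA28_LIMIT)           # True: a 48-bit command is required
```

To test the out-of-range theory, `lba_in_range` would need the drive's actual sector count (for instance from a tool such as diskinfo(8)), which is not given in this thread. The LBA being above the 28-bit limit is at least consistent with the driver issuing WRITE_DMA48 rather than a 28-bit command.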