Date: Thu, 10 Jan 2013 20:39:49 +0100
From: Nicolas Rachinsky
To: Artem Belevich
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <20130110193949.GA10023@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <20130109162613.GA34276@mid.pc5.i.0x5.de>
Cc: freebsd-fs

Hello,

after replacing one of the controllers, all problems seem to have
disappeared. Thank you very much for your advice!

> >> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
> >> wrote:

* Artem Belevich [2013-01-09 17:15 -0800]:
> On Wed, Jan 9, 2013 at 8:26 AM, Nicolas Rachinsky
> wrote:
> > * Artem Belevich [2013-01-08 12:47 -0800]:
> >> You seem to have some checksum errors which does suggest hardware troubles.
> >
> > I somehow missed these. Is there any way to learn when these checksum
> > errors happen?
>
> Not on FreeBSD (yet) as far as I can tell. Not explicitly, anyways.
> Check /var/log/messages for any indications of SATA errors. There's a
> good chance that there was a timeout at some point.

There is a UDMA_CRC_Error_Count of 17 and 20 for the two disks with
checksum errors. The other disks have values between 0 and 5.

And yes, there have been timeouts some time ago. Since the problem did
occur without the timeouts occurring again, I considered the timeouts
to be unrelated. And then I forgot them. :(

But shouldn't timeouts either produce correct data after a retry or
a read/write error otherwise?

Nicolas

-- 
http://www.rachinsky.de/nicolas