Date: Wed, 9 Jan 2013 17:26:13 +0100
From: Nicolas Rachinsky
To: Artem Belevich
Cc: freebsd-fs
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <20130109162613.GA34276@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de>

* Artem Belevich [2013-01-08 12:47 -0800]:
> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky wrote:
> >         NAME                      STATE     READ WRITE CKSUM
> >         pool1                     DEGRADED     0     0     0
> >           raidz2-0                DEGRADED     0     0     0
> >             ada5                  ONLINE       0     0     0
> >             ada8                  ONLINE       0     0     0
> >             ada2                  ONLINE       0     0     0
> >             ada3                  ONLINE       0     0     0
> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
> >             ada6                  ONLINE       0     0     0
> >             ada0                  ONLINE       0     0     1
> >             ada7                  ONLINE       0     0     0
> >             ada4                  ONLINE       0     0     3
>
> You seem to have some checksum errors which does suggest hardware troubles.

I somehow missed these. Is there any way to learn when these checksum
errors happen?

> For starters, check smart info for all drives and see if they have any
> relocated sectors.

There are some disks with relocated sectors, but for both ada0 and
ada4 Reallocated_Sector_Ct is 0.

> Use gstat during your workload to see if any of the drives takes much
> longer than others to handle its job.

There is one disk sticking out a bit.

> > There is almost no disk activity during this time.
>
> What kind of disk activity *is* there?

What would be interesting?

> > sync is disabled for the whole pool.
>
> If that's the case (assuming you're talking about sync=disabled zfs
> property), then synchronous writes are probably not the cause of
> slowdown. My guess would be either failing HDD or something funky with
> cabling or sata controller.

Yes, sync=disabled for pool1.

Ok, I will start swapping hardware (sadly the machine is quite a drive
away).

Thank you very much for your help.

Nicolas

-- 
http://www.rachinsky.de/nicolas
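
The nonzero CKSUM counters in the quoted zpool status output are easy to
overlook by eye. A minimal Python sketch of picking them out, assuming the
five-column layout quoted in this thread (NAME STATE READ WRITE CKSUM). The
function name devices_with_errors and the SAMPLE text are illustrative only;
the sketch parses text that was already captured and does not run zpool.

```python
def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with a nonzero counter."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        # A device row has at least: NAME STATE READ WRITE CKSUM
        if len(fields) < 5:
            continue
        name, counters = fields[0], fields[2:5]
        # Skips the header row and any non-numeric text. Assumption: counters
        # are plain integers; abbreviated values such as "1.2K" are ignored.
        if not all(c.isdigit() for c in counters):
            continue
        read, write, cksum = (int(c) for c in counters)
        if read or write or cksum:
            flagged[name] = (read, write, cksum)
    return flagged


# Rows copied from the status output quoted in this thread.
SAMPLE = """\
NAME                      STATE     READ WRITE CKSUM
pool1                     DEGRADED     0     0     0
  raidz2-0                DEGRADED     0     0     0
    ada5                  ONLINE       0     0     0
    ada8                  ONLINE       0     0     0
    ada2                  ONLINE       0     0     0
    ada3                  ONLINE       0     0     0
    11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
    ada6                  ONLINE       0     0     0
    ada0                  ONLINE       0     0     1
    ada7                  ONLINE       0     0     0
    ada4                  ONLINE       0     0     3
"""

if __name__ == "__main__":
    for name, (read, write, cksum) in devices_with_errors(SAMPLE).items():
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

On the sample above this flags only ada0 and ada4. Saving the result
together with a timestamp at regular intervals would be one way to narrow
down when a counter changes, though that part is not shown here.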