From: Artem Belevich
To: Nicolas Rachinsky
Cc: freebsd-fs
Date: Wed, 9 Jan 2013 17:15:03 -0800
Subject: Re: slowdown of zfs (tx->tx)
In-Reply-To: <20130109162613.GA34276@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <20130109162613.GA34276@mid.pc5.i.0x5.de>

On Wed, Jan 9, 2013 at 8:26 AM, Nicolas Rachinsky wrote:
> * Artem Belevich [2013-01-08 12:47 -0800]:
>> On Tue, Jan 8, 2013 at 9:42 AM, Nicolas Rachinsky
>> wrote:
>> >         NAME                      STATE     READ WRITE CKSUM
>> >         pool1                     DEGRADED     0     0     0
>> >           raidz2-0                DEGRADED     0     0     0
>> >             ada5                  ONLINE       0     0     0
>> >             ada8                  ONLINE       0     0     0
>> >             ada2                  ONLINE       0     0     0
>> >             ada3                  ONLINE       0     0     0
>> >             11846390416703086268  UNAVAIL      0     0     0  was /dev/dsk/ada1
>> >             ada6                  ONLINE       0     0     0
>> >             ada0                  ONLINE       0     0     1
>> >             ada7                  ONLINE       0     0     0
>> >             ada4                  ONLINE       0     0     3
>>
>> You seem to have some checksum errors which does suggest hardware troubles.
>
> I somehow missed these. Is there any way to learn when these checksum
> errors happen?

Not on FreeBSD (yet), as far as I can tell. Not explicitly, anyway.
Check /var/log/messages for any indications of SATA errors. There's a
good chance that there was a timeout at some point.

>> For starters, check smart info for all drives and see if they have any
>> relocated sectors.
>
> There are some disks with relocated sectors, but for both ada0 and
> ada4 Reallocated_Sector_Ct is 0.

Are there any UDMA errors? Those would suggest trouble with cabling.

>> Use gstat during your workload to see if any of the drives takes much
>> longer than others to handle its job.
>
> There is one disk sticking out a bit.

In a raidz pool, the number of transactions per second is determined by
the slowest disk. Check the ms/w column. Look for numbers substantially
higher than a typical seek time (10..20 ms is OK, 100 ms is not).

>> > There is almost no disk activity during this time.
>>
>> What kind of disk activity *is* there?
>
> What would be interesting?

Drives 'sticking out' by being busy longer than their peers in the pool.
Excessive ms/r or ms/w in gstat. Unexpected reads or writes.

--Artem
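
To make the gstat check above less of an eyeball exercise, here is a minimal sketch that scans captured gstat text for drives whose ms/r or ms/w is out of line. It assumes the usual gstat column layout (a header row containing ms/r, ms/w and Name, with a "|" before the device name). The function name slow_disks, the sample numbers and the 100 ms cutoff are illustrative only; the cutoff is simply the "100 is not OK" figure from above.

```python
# Hypothetical sample of gstat output; the numbers are made up for illustration.
SAMPLE = """\
dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     40     10    320   11.0     30    960   14.8   35.1| ada0
    3     38      9    288   12.5     29    928  143.2   99.7| ada4
"""


def slow_disks(gstat_text, threshold_ms=100.0):
    """Return {disk: (ms_r, ms_w)} for rows whose latency reaches threshold_ms."""
    header = None
    flagged = {}
    for line in gstat_text.splitlines():
        # gstat prints "%busy| name"; treat the bar as plain whitespace.
        fields = line.replace("|", " ").split()
        if "ms/w" in fields and "Name" in fields:
            header = fields  # remember column positions from the header row
            continue
        if header is None or len(fields) != len(header):
            continue  # skip the "dT:" line and anything else that is not a data row
        try:
            ms_r = float(fields[header.index("ms/r")])
            ms_w = float(fields[header.index("ms/w")])
        except ValueError:
            continue
        if max(ms_r, ms_w) >= threshold_ms:
            flagged[fields[header.index("Name")]] = (ms_r, ms_w)
    return flagged


print(slow_disks(SAMPLE))
```

Feed it a few samples taken while the slowdown is happening. A drive that keeps showing up in the result is the one holding back the whole raidz vdev.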