From: Johan Hendriks
Date: Mon, 14 Jan 2013 11:58:36 +0100
To: Nicolas Rachinsky
Cc: freebsd-fs@freebsd.org
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <50F3E4DC.8030704@gmail.com>
In-Reply-To: <20130114094010.GA75529@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de>
 <20130109162613.GA34276@mid.pc5.i.0x5.de>
 <20130110193949.GA10023@mid.pc5.i.0x5.de>
 <20130111073417.GA95100@mid.pc5.i.0x5.de>
 <20130114094010.GA75529@mid.pc5.i.0x5.de>

Nicolas Rachinsky wrote:
> * Artem Belevich [2013-01-11 12:39 -0800]:
>> On Thu, Jan 10, 2013 at 11:34 PM, Nicolas Rachinsky
>> wrote:
>>> * Nicolas Rachinsky [2013-01-10 20:39 +0100]:
>>>> after replacing one of the controllers, all problems seem to have
>>>> disappeared. Thank you very much for your advice!
>>> Now the problem is back.
>>>
>>> After changing the controller, there were no more timeouts logged.
>>>
>>> No UDMA_CRC_Error_Count changed.
>>>
>> Is there anything special about ada8? It does seem to have noticeably
>> higher service time compared to other disks.
> Nothing I know of. The disks are Samsung HD103UJ and HD103SI, multiple
> of each type.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   073   073   011    Pre-fail  Always       -       8890
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
>   5 Reallocated_Sector_Ct   0x0033   094   094   010    Pre-fail  Always       -       166
>   7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
>   8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10872
>   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5688
>  10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
>  13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
> 184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   078   069   000    Old_age   Always       -       22 (Min/Max 21/25)
> 194 Temperature_Celsius     0x0022   077   067   000    Old_age   Always       -       23 (Min/Max 21/26)
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1259614646
> 196 Reallocated_Event_Count 0x0032   096   096   000    Old_age   Always       -       166
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x000a   100   099   000    Old_age   Always       -       5
> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
>
> Reallocated_Sector_Ct did not increase during the last few days.
>
>
>> Could you do gstat with a 1-second interval? Some of the 5-second
>> samples show that ada8 is the bottleneck -- it has its request queue
>> full (L(q)=10) when all other drives were done with their jobs. And
>> that's a 5-sec average. Its write service time also seems to be a lot
>> higher than for other drives.
> Attached. I have replaced ada8 by ada9, which is a Western Digital
> Caviar Black.
>
> Now ada0 and ada4 seem to be the bottleneck.
>
> But I don't understand the intervals without any disk activity.
>
>> Does the drive have its write cache disabled by any chance? That could
>> explain why it takes so much longer to service writes.
> No, camcontrol identify says it's enabled.
>
>> Can you remove ada8 and see if your performance goes back to normal?
> The problem still persists.
>
> Thank you for your help!
>
> Nicolas

Could it be that something else is occupying the pool?
I had to disable one of the security checks run by periodic:

daily_status_security_neggrpperm_enable="NO"

After I disabled that check, my pool was performing normally again.
If you do not have many snapshots, it is no problem, but with a lot of
snapshots, this check stalls the pool.

Regards,
Johan
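
For anyone wanting to try the periodic override mentioned above, here is a minimal sketch of one way to apply it. It assumes the setting goes in /etc/periodic.conf, the usual place for local overrides of /etc/defaults/periodic.conf. The function name disable_neggrpperm and its optional file argument are made up for this illustration and are not part of FreeBSD.

```shell
#!/bin/sh
# Sketch only: disable_neggrpperm and its conf_file argument are
# illustrative names, not an existing FreeBSD tool.

# Append the override to a periodic.conf-style file, unless that exact
# line is already present, so repeated runs do not duplicate it.
disable_neggrpperm() {
    conf_file="${1:-/etc/periodic.conf}"
    setting='daily_status_security_neggrpperm_enable="NO"'
    # -q: no output, -x: match the whole line, -F: fixed string, not a regex
    if ! grep -qxF "$setting" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$setting" >> "$conf_file"
    fi
}
```

Calling disable_neggrpperm with no argument (as root) would edit /etc/periodic.conf; passing a path lets you try it on a scratch file first. To judge whether the snapshot count is large enough to matter, something like "zfs list -H -t snapshot -o name | wc -l" should give a rough number.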