Date: Mon, 14 Jan 2013 11:58:36 +0100
From: Johan Hendriks <joh.hendriks@gmail.com>
To: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <50F3E4DC.8030704@gmail.com>
In-Reply-To: <20130114094010.GA75529@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de>
    <CAFqOu6jgA8RWV5d%2BrOBk8D=3Vu3yWSnDkAi1cFJ0esj4OpBy2Q@mail.gmail.com>
    <20130109162613.GA34276@mid.pc5.i.0x5.de>
    <CAFqOu6jrng=v8eVyhqV-PBqJM_dYy%2BU7X4%2B=ahBeoxvK4mxcSA@mail.gmail.com>
    <20130110193949.GA10023@mid.pc5.i.0x5.de>
    <20130111073417.GA95100@mid.pc5.i.0x5.de>
    <CAFqOu6gWpMsWN0pTBiv10WfwyGWMfO9GzMLWTtcVxHixr-_i3Q@mail.gmail.com>
    <20130114094010.GA75529@mid.pc5.i.0x5.de>

Nicolas Rachinsky wrote:
> * Artem Belevich <art@freebsd.org> [2013-01-11 12:39 -0800]:
>> On Thu, Jan 10, 2013 at 11:34 PM, Nicolas Rachinsky
>> <fbsd-mas-0@ml.turing-complete.org> wrote:
>>> * Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org> [2013-01-10 20:39 +0100]:
>>>> after replacing one of the controllers, all problems seem to have
>>>> disappeared. Thank you very much for your advice!
>>> Now the problem is back.
>>>
>>> After changing the controller, there were no more timeouts logged.
>>>
>>> No UDMA_CRC_Error_Count changed.
>>>
>> Is there anything special about ada8? It does seem to have noticeably
>> higher service time compared to other disks.
> Nothing I know of. The disks are Samsung HD103UJ and HD103SI, multiple
> of each type.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   073   073   011    Pre-fail  Always       -       8890
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
>   5 Reallocated_Sector_Ct   0x0033   094   094   010    Pre-fail  Always       -       166
>   7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
>   8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10872
>   9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5688
>  10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
>  13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
> 184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   078   069   000    Old_age   Always       -       22 (Min/Max 21/25)
> 194 Temperature_Celsius     0x0022   077   067   000    Old_age   Always       -       23 (Min/Max 21/26)
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1259614646
> 196 Reallocated_Event_Count 0x0032   096   096   000    Old_age   Always       -       166
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x000a   100   099   000    Old_age   Always       -       5
> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
>
> Reallocated_Sector_Ct did not increase during the last days.
>
>
>> Could you do gstat with 1-second interval. Some of the 5-second
>> samples show that ada8 is the bottleneck -- it has its request queue
>> full (L(q)=10) when all other drives were done with their jobs. And
>> that's a 5-sec average. Its write service time also seems to be a lot
>> higher than for other drives.
> Attached. I have replaced ada8 by ada9, which is a Western Digital
> Caviar Black.
>
> Now ada0 and ada4 seem to be the bottleneck.
>
> But I don't understand the intervals without any disk activity.
>
>> Does the drive have its write cache disabled by any chance? That could
>> explain why it takes so much longer to service writes.
> No, camcontrol identify says it's enabled.
>
>> Can you remove ada8 and see if your performance goes back to normal?
> The problem still persists.
>
> Thank you for your help!
>
> Nicolas
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

Could it be that something else is keeping the pool busy?
I had to disable one of the security checks run by periodic:

daily_status_security_neggrpperm_enable="NO"

After I disabled that check, my pool performed normally again.
If you do not have many snapshots it is no problem, but with a lot of
snapshots this check stalls the pool.

Regards,
Johan
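
For reference, a minimal sketch of where such a setting would normally go. It assumes a FreeBSD release of that era, on which local overrides of the periodic defaults are placed in /etc/periodic.conf (sh variable syntax). The file path is an assumption not stated in the message, and the variable name is taken verbatim from it; other releases may use a different name, so it is worth checking /etc/defaults/periodic.conf first.

```sh
# /etc/periodic.conf (assumed location for local overrides of
# /etc/defaults/periodic.conf)
# Skip the daily security check for negative group permissions,
# using the variable name quoted in the message above.
daily_status_security_neggrpperm_enable="NO"
```

Because the file is plain sh syntax, the line can be removed or set back to "YES" to re-enable the check once the pool behaves normally.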