From owner-freebsd-fs@FreeBSD.ORG Mon Oct 26 01:30:39 2009
Date: Mon, 26 Oct 2009 02:30:32 +0100
From: Solon Lutz
Organization: pyro.labs berlin
Message-ID: <1791999980.20091026023032@pyro.de>
To: Wes Morgan, freebsd-fs@FreeBSD.ORG
References: <886802879.20091008113716@pyro.de>
Subject: Re: raidz slowing down
List-Id: Filesystems

> Did you ever get any response? I have a very similar sounding issue with
> my raidz2. I've always assumed it was because the volume was nearly full
> and maybe some fragmentation or something. All of my devices are on MPT
> controllers, so I don't think that the highpoint device is an issue.

Nope, no responses...

Since I was working on a rescue operation, I didn't have the patience to
eliminate every possible source of error, so I swapped out da1 (maybe a
little slow or buggy?) and used 'dcfldd', the forensics version of dd.
It has a split option, and I suspected that ZFS has problems writing huge
continuous data streams, so I split the 10TB into 100GB files, which took
about 11 hours.

I don't know whether this is a general problem or whether it only happens
when the input is delivered at a much higher data rate. In this case, the
HW-RAID/zpool was able to deliver data at 600MB/s, while the RAIDZ/zpool
could only write at 130MB/s.

The dynamics of this 'slow-down', as I watched them via gstat, looked as if
access at the device level was desynchronizing completely. In the end,
before I quit the process, write speed was down to 5MB/s!

But as I mentioned earlier, I had no patience for bug hunting because of a
bigger (still unsolved) problem at hand.

Maybe somebody else would like to investigate? I'm busy with ZFS forensics...

solon

> On Thu, 8 Oct 2009, Solon Lutz wrote:
>> I built a 9x hdd 11TB raidz for some rescue purposes and started
>> copying an image from another partition via "dd if=/dev/da0..." to it.
>> It consists of: ad4 da1 da2 da3 da4 da5 da6 da7 da8, da1 to da8 are
>> connected via two highpoint controllers.
>>
>> In the beginning write speeds were quite fair:
>>
>> dT: 1.002s  w: 1.000s
>> L(q)  ops/s    r/s    kBps   ms/r    w/s    kBps    ms/w   %busy  Name
>>    0    424      0       0    0.0    424   52483    33.9    84.6| ad4
>>    0      0      0       0    0.0      0       0     0.0     0.0| da0
>>   35    356      0       0    0.0    356   44584    76.4   124.5| da1
>>   35    296      0       0    0.0    296   36919    84.5   121.0| da2
>>   34    361      0       0    0.0    361   45111    75.5   124.7| da3
>>   35    346      0       0    0.0    346   43196    78.6   123.2| da4
>>   35    344      0       0    0.0    344   42940    80.0   124.7| da5
>>   35    343      0       0    0.0    343   42812    80.7   124.5| da6
>>   35    344      0       0    0.0    344   43051    79.8   123.9| da7
>>   34    342      0       0    0.0    342   42796    80.6   124.4| da8
>>
>> Now, some 10 hours and 2.5TB later, it look like that most of the time:
>>
>> dT: 1.002s  w: 1.000s
>> L(q)  ops/s    r/s    kBps   ms/r    w/s    kBps    ms/w   %busy  Name
>>    0     10      0       0    0.0     10       6     0.8     0.2| ad4
>>    0      0      0       0    0.0      0       0     0.0     0.0| da0
>>    4     13      0       0    0.0     13       8   550.4   178.5| da1
>>    0     12      0       0    0.0     12       7     0.7     0.2| da2
>>    0     11      0       0    0.0     11       7     0.7     0.2| da3
>>    0     10      0       0    0.0     10       5     0.6     0.2| da4
>>    0     11      0       0    0.0     11       6     0.9     0.3| da5
>>    0     12      0       0    0.0     12       7     0.7     0.2| da6
>>    0     11      0       0    0.0     11       7     0.7     0.2| da7
>>    0      9      0       0    0.0      9       6     0.8     0.2| da8
>>
>> da1 seems to be busy most of time and every few seconds all the other
>> devices write some data with nearly normal speed:
>>
>> dT: 1.003s  w: 1.000s
>> L(q)  ops/s    r/s    kBps   ms/r    w/s    kBps    ms/w   %busy  Name
>>    0    254      0       0    0.0    254   31331    34.9    35.4| ad4
>>    0      0      0       0    0.0      0       0     0.0     0.0| da0
>>    4      0      0       0    0.0      0       0     0.0     0.0| da1
>>    0    254      0       0    0.0    254   31346   107.4   104.5| da2
>>    0    256      0       0    0.0    256   31345   108.1   104.0| da3
>>    0    255      0       0    0.0    255   31345   110.2   105.1| da4
>>   35    200      0       0    0.0    200   24912   143.3   115.0| da5
>>   35    211      0       0    0.0    211   26303   137.8   114.9| da6
>>   35    210      0       0    0.0    210   26079   139.3   114.9| da7
>>   35    209      0       0    0.0    209   25952   135.2   113.7| da8
>>
>> Sometimes it even gets back to 'normal' behaviour, but never reaches
>> the speeds it once had:
>>
>> dT: 1.002s  w: 1.000s
>> L(q)  ops/s    r/s    kBps   ms/r    w/s    kBps    ms/w   %busy  Name
>>   35    274      0       0    0.0    274   34334    44.2    66.6| ad4
>>    0   1166   1166  149243    0.1      0       0     0.0    14.3| da0
>>   35    120      0       0    0.0    120   14717    94.4    64.5| da1
>>   35     96      0       0    0.0     96   11665   113.9    64.3| da2
>>   35    100      0       0    0.0    100   12288    98.7    63.9| da3
>>   35    103      0       0    0.0    103   12496    93.4    59.4| da4
>>   34    112      0       0    0.0    112   13694   106.1    67.4| da5
>>   35     71      0       0    0.0     71    8596   115.3    66.8| da6
>>   35    116      0       0    0.0    116   14205   101.7    67.3| da7
>>   35     83      0       0    0.0     83   10066   112.2    65.9| da8
>>
>> Syslog reports the following:
>>
>> Oct 8 09:53:40 radium kernel: hptrr: start channel [0,0]
>> Oct 8 09:53:40 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 09:57:44 radium kernel: hptrr: start channel [0,0]
>> Oct 8 09:57:45 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 10:54:26 radium kernel: hptrr: start channel [0,0]
>> Oct 8 10:54:27 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 11:10:29 radium kernel: hptrr: start channel [0,0]
>> Oct 8 11:10:30 radium kernel: hptrr: channel [0,0] started successfully
>> Oct 8 11:17:27 radium kernel: hptrr: start channel [0,0]
>> Oct 8 11:17:27 radium kernel: hptrr: channel [0,0] started successfully
>>
>> Is this a problem of the hptrr device or is da1 failing?
>>
>> Mit freundlichen Grüßen
>> Best regards,
>>
>> Solon Lutz
>>
>> +-----------------------------------------------+
>> | Pyro.Labs Berlin - Creativity for tomorrow    |
>> | Wasgenstrasse 75/13 - 14129 Berlin, Germany   |
>> | www.pyro.de - phone + 49 - 30 - 48 48 58 58   |
>> | info@pyro.de - fax + 49 - 30 - 80 94 03 52    |
>> +-----------------------------------------------+
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
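
For anyone who wants to try the same workaround without dcfldd, here is a minimal Python sketch of the idea described in the reply: copy one large input stream into a series of fixed-size files instead of a single huge one. The function name split_stream, its parameters and the "image.000" naming scheme are all made up for illustration and do not mirror dcfldd's options or output names. Whether chunking actually avoids the slowdown was not verified in this thread.

```python
import os


def split_stream(src, out_dir, chunk_size, block_size=1 << 20, prefix="image"):
    """Copy a binary stream into numbered files of at most chunk_size bytes.

    Hypothetical helper: names and defaults are illustrative assumptions.
    Returns the list of chunk file paths in the order they were written.
    """
    if chunk_size <= 0 or block_size <= 0:
        raise ValueError("chunk_size and block_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    room = 0  # bytes still allowed in the current chunk file
    try:
        while True:
            # Never read more than the current (or next) chunk can hold.
            want = min(block_size, room if room else chunk_size)
            data = src.read(want)
            if not data:
                break  # end of input
            if room == 0:
                # Current chunk is full (or none is open yet): start a new one.
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "%s.%03d" % (prefix, len(paths)))
                out = open(path, "wb")
                paths.append(path)
                room = chunk_size
            out.write(data)
            room -= len(data)
    finally:
        if out is not None:
            out.close()
    return paths
```

For the case in the thread this would be called with a source opened in binary mode and chunk_size set to 100 * 10**9; concatenating the chunk files in order gives back the original image. A new file is only opened once there is data to put in it, so no empty trailing chunk is created.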