From: Jeremy Chadwick
To: Lev Serebryakov
Cc: freebsd-fs@freebsd.org
Date: Tue, 30 Aug 2011 17:42:51 -0700
Subject: Re: Very inconsistent (read) speed on UFS2
Message-ID: <20110831004251.GA89979@icarus.home.lan>
In-Reply-To: <317753422.20110830231815@serebryakov.spb.ru>

On Tue, Aug 30, 2011 at 11:18:15PM +0400, Lev Serebryakov wrote:
> Now, when I "defragmented" my large FS, I see very inconsistent
> read speeds on same files. Is it Ok?
>
> My setup is:
>
> (1) FreeBSD 8.2-STABLE/x64
> (2) E4400 CPU, 2GiB RAM
> (3) 5xHDDs in RAID5 (software), controller is ICH9R.
> (4) UFS2 with 32KiB block, vfs.read_max=32 (1MiB read-ahead).
> (5) System and swap on another (6th) HDD, but swap is unused.
> (6) No periodic or background processes access FS in question at all.
>
> Simple program reads each of 12 files (460MiB each) 15 times in cycle
> like 01, 02, ..., 12, 01,... so, cache in memory should be thrashed,
> as reading process returns to same data every ~5.5GiB and here are
> only 2GiB physical memory in system.
>
> And speed of these reads are VERY inconsistent. I've calculated
> min/average/max and standard deviation and results are like this:
>
> Name          Min/Avg/Max         StdDev
> r012f02.nef   120/235/413 MiB/s   83
> r012f09.nef   154/248/393 MiB/s   80
> r012f12.nef   106/212/293 MiB/s   63
> r012f05.nef    86/206/280 MiB/s   62
> r012f08.nef   128/223/332 MiB/s   60
> r012f11.nef   155/257/327 MiB/s   56
> r012f03.nef   121/213/279 MiB/s   52
> r012f10.nef   120/226/284 MiB/s   45
> r012f07.nef   121/199/249 MiB/s   41
> r012f01.nef   135/199/242 MiB/s   33
>
> It is results from 15 runs! One time file was read at sustained
> average speed 120MiB/s (~3.8 seconds) and next time it was 413MiB/s
> (only ~1.1 second!)
>
> And it is not case when first read is slowest. No. Sometimes last
> one is slowest, for example.
>
> Is it Ok? I'm very disappointed to see 120MiB/s when I know that
> hardware can give 415MiB/s, but something strange slows down the
> process.

What appears to have been missed here is that there are 5 drives in a
RAID-5 fashion. Wait, RAID-5? FreeBSD has RAID-5 support? How?

Oh, right... There's a port called sysutils/graid5 which is a
"converted to work on FreeBSD 8.x" GEOM class for RAID-5. The original
was written for earlier FreeBSD and was called geom_raid5. The original
that Arne Worner introduced was written in 2006.
A port was made for it only recently:

http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/graid5/Makefile

What scares me is the number of "variants" of this code:

http://en.wikipedia.org/wiki/Geom_raid5

Some users have asked why this code has never been committed to the
FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"):

http://forums.freebsd.org/showthread.php?t=9040

There are admissions from Arne that "the code is absolutely horrible",
which may be why it's never been committed to FreeBSD. There are also
all sorts of other concerns:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00437.html

Here's one citing concerns over "aggressive caching", talking about
writes and not reads, but my point still applies:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.html

The thread continues for quite some time. There's also a
freebsd-current thread from 2007 asking if the code could be committed
to HEAD, with some users stating they'd like to see that too -- with
one noting that gvinum has support for RAID-5, so basically "which is
better?" (I imagine that question is still unanswered.)

There were also concerns over testing, reliability, throughput, etc.,
and the answers (as of 2007) were really not that great:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00351.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00361.html

So can I ask what guarantee you have that geom_raid5 is not responsible
for the intermittent I/O speeds you see? I would recommend you remove
geom_raid5 from the picture entirely and replace it with either
gstripe(8) or ccd(4) SOLELY FOR TESTING.

Furthermore, why are these benchmarks not providing speed data
per-device (e.g. gstat or iostat -x data)?
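
A repeatable way to compare the same read test on geom_raid5 and on a
gstripe(8) or ccd(4) volume could look something like the sketch below.
It is not the program used in the original post (that was never shown
in this thread); the 1 MiB read size, the function names, and the use
of the sample standard deviation are all assumptions. It only measures
whole-file throughput, so per-device numbers would still have to come
from gstat or iostat -x running in another terminal.

```python
import statistics
import sys
import time

CHUNK = 1024 * 1024  # assumed read size (1 MiB); not taken from the thread


def read_speed_mib_s(path, chunk=CHUNK):
    """Read the whole file sequentially and return throughput in MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    mib = total / (1024 * 1024)
    return mib / elapsed if elapsed > 0 else float("inf")


def summarize(speeds):
    """Return (min, mean, max, sample stddev) for a list of MiB/s values."""
    sd = statistics.stdev(speeds) if len(speeds) > 1 else 0.0
    return min(speeds), statistics.mean(speeds), max(speeds), sd


def benchmark(paths, passes=15):
    """Read the files round-robin (01, 02, ..., 01, ...) for `passes` cycles."""
    results = {p: [] for p in paths}
    for _ in range(passes):
        for p in paths:
            results[p].append(read_speed_mib_s(p))
    return results


if __name__ == "__main__":
    # Usage: python readbench.py file1 file2 ...
    for path, speeds in benchmark(sys.argv[1:]).items():
        lo, avg, hi, sd = summarize(speeds)
        print(f"{path}  {lo:.0f}/{avg:.0f}/{hi:.0f} MiB/s  stddev {sd:.0f}")
```

As in the original test, the file set has to be larger than RAM for the
round-robin order to defeat the buffer cache; otherwise the numbers
mostly reflect cached reads.
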
There is a possibility that one of your drives is performing at
less-than-ideal rates (yes, intermittently) and is therefore impacting
(intermittently) your overall I/O throughput.

The other posts in this mail thread so far are much more conclusive,
but I believe the above points/concerns are still valid. They have
never been thoroughly refuted or addressed.

I guess you could say I'm very surprised someone is complaining about
performance issues on FreeBSD when using a 3rd-party GEOM class that's
been scrutinised in the past.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |