Date: Wed, 31 Aug 2011 12:03:00 +0400
From: Lev Serebryakov <lev@serebryakov.spb.ru>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-fs@freebsd.org, Lev Serebryakov <lev@FreeBSD.org>
Subject: Re: Very inconsistent (read) speed on UFS2
Message-ID: <687356195.20110831120300@serebryakov.spb.ru>
In-Reply-To: <20110831004251.GA89979@icarus.home.lan>
References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan>
Hello, Jeremy.
You wrote on 31 August 2011, at 4:42:51:

> What appears to have been missed here is that there are 5 drives in a
> RAID-5 fashion. Wait, RAID-5? FreeBSD has RAID-5 support? How? Oh,
> right...
> There's a port called sysutils/graid5 which is a "converted to work on
> FreeBSD 8.x" GEOM class for RAID-5. The original was written for
> earlier FreeBSD and was called geom_raid5. The original that Arne
> Worner introduced was written in 2006. A port was made for it only
> recently:

I'm the author of this port. I'm also the author of some improvements,
approved by Arne Worner, which are included in it :) And it seems I'm the
only user of this port in the whole world, too. But it has worked for me
for many years without any data loss. It saved my data when I had 3 dead
HDDs over these years (not simultaneously, of course), and let me upgrade
my server from a 5x500GB to a 5x2TB configuration without stopping it (OK,
with a small stop for a "growfs" run, but all HDDs were replaced one-by-one
on the live system, thanks to SATA hotplug). Now I'm trying to squeeze the
maximum speed out of this software :)

> What scares me is the number of "variants" on this code:
> http://en.wikipedia.org/wiki/Geom_raid5

There are three variants: a dumb proof-of-concept; stable and fast, but not
ideal, code; and an experimental one. The port uses the second one. The
first one is way too slow and the third one HAS problems. What scares _me_
is Arne's coding style. I've spent almost a year understanding almost all
the details of this code, mostly because of the two-letter variable names,
etc.

> Some users have asked why this code hasn't ever been committed to the
> FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"):
> http://forums.freebsd.org/showthread.php?t=9040

Code style. And I mean real problems, not nit-picking about "return 0;" vs
"return (0);" or whitespace. I'm trying to clean it up in a separate
branch, without changing functionality, before I implement some new ideas
which should clean the code up even more. But it is not a very fast
process, as I don't have a lot of spare time now, and it is work which
takes A LOT of concentration.

> Here's one citing concerns over "aggressive caching", talking about
> writes and not reads, but my point still applies:
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.html
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.html

Yep, and this aggressive caching can be turned off. But it is a GREAT help
for write speed. Use a good UPS and nut -- they really HELP. And another
note: without a UPS and nut, even without geom_raid5, there is a BIG
problem with large volumes and UFS2. A background fsck of a 2TB volume
takes about three hours, during which the system is almost locked up, and
it fails often. fsck of an 8TB volume? It is my worst nightmare. And it
doesn't depend on RAID5 and its write cache. Use a UPS. USE IT.

> So can I ask what guarantee you have that geom_raid5 is not responsible
> for the intermittent I/O speeds you see? I would recommend you remove

I'm not sure here -- that is the point. I want to understand whether it is
a geom_raid5 problem, a UFS2 problem, a VM problem, or some combination of
``glitches'' in these subsystems. I'm almost sure it is not a problem of
any one of them ``in a vacuum''; it is a problem at the border between
subsystems. And, as I don't understand well how to "look inside" UFS2, I'm
asking for help here. (A rough sketch of the layer-by-layer test I have in
mind is below.)
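Just to make the idea concrete, here is a minimal sketch of such a test
(the device nodes and the file path are hypothetical examples only, not my
real configuration): it times plain sequential reads from each array
member, from the graid5 device node, and from a big file on the UFS2
volume. If the per-member and graid5 numbers are steady but the file read
is not, the problem is above GEOM; if a single member is slow, it is below.

#!/usr/bin/env python
# Rough sketch only: time sequential reads at each layer of the stack.
# All paths below are hypothetical examples -- substitute the real array
# members, the graid5 device node and a large file on the UFS2 volume.
# Run it as root (raw device nodes need it). Raw device reads bypass the
# buffer cache, the file read does not, so use a file larger than RAM
# (or a freshly mounted volume) for a fair comparison.
import os
import time

CHUNK = 1024 * 1024          # 1 MiB per read(2) call
TOTAL = 512 * 1024 * 1024    # read 512 MiB from each target

TARGETS = [
    "/dev/ada0",             # individual array members (list each one)
    "/dev/ada1",
    "/dev/raid5/storage",    # the geom_raid5 device itself
    "/storage/bigfile",      # a large file on the UFS2 filesystem
]

def read_speed(path):
    """Return sequential read speed for `path` in MiB/s."""
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        start = time.time()
        while done < TOTAL:
            buf = os.read(fd, CHUNK)
            if not buf:          # end of file/device reached early
                break
            done += len(buf)
        elapsed = time.time() - start
        return done / (1024.0 * 1024.0) / elapsed
    finally:
        os.close(fd)

for path in TARGETS:
    try:
        print("%-22s %8.1f MiB/s" % (path, read_speed(path)))
    except OSError as err:
        print("%-22s error: %s" % (path, err))

Running it several times in a row, and once per array member, would also
show whether one drive is intermittently slower than the others.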
> geom_raid5 from the picture entirely and replace it with either
> gstripe(8) or ccd(4) SOLELY FOR TESTING.

It is impossible in this config: the array holds data which is valuable to
me. Here is the problem: I can do any tests except speed tests on a test
server and in VMs. I can run the test suite, switch off HDDs, re-create
filesystems, etc., to be sure that geom_raid5 is STABLE in terms of data
safety. But the only BIG system on which I can perform valid speed
benchmarks is my home server with my data, which I cannot lose. It is
useless to run such benchmarks on an array of old 9GiB (yes, you read it
right, 9 gigabytes) SCSI HDDs or in a virtual machine with a bunch of
virtual HDDs. And I don't have a second server with modern, fast, big
disks. Sorry.

> Furthermore, why are these benchmarks not providing speed data
> per-device (e.g. gstat or iostat -x data)? There is a possibility that
> one of your drives could be performing at less-than-ideal rates (yes,
> intermittently) and therefore impacts (intermittently) your overall I/O
> throughput.

I'll look at this, but I zeroed out all HDDs before placing them into the
array, and their speeds were identical.

> been thoroughly refuted or addressed. I guess you could say I'm very
> surprised someone is complaining about performance issues on FreeBSD
> when using a 3rd-party GEOM class that's been scrutinised in the past.

It is not a complaint. It is a request for help in profiling a very old and
complex subsystem :) Maybe I was not very clear about that in my first
message.

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?687356195.20110831120300>
