Date: Thu, 12 May 2011 13:16:43 +0300
From: Daniel Kalchev <daniel@digsys.bg>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
Message-ID: <4DCBB38B.3090806@digsys.bg>
In-Reply-To: <20110512083429.GA58841@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net>
 <alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
 <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg>
 <20110512083429.GA58841@icarus.home.lan>
On 12.05.11 11:34, Jeremy Chadwick wrote:
> I guess this is why others have mentioned the importance of BBUs and
> supercaps, but I don't know what guarantee there is that during a power
> failure there won't be some degree of filesystem corruption or lost
> data.

You can think of the SLOG as the BBU of ZFS. The best SLOG is, of course, battery-backed RAM, which is just what a BBU-protected controller cache is. Any battery-backed RAM device used as a SLOG will beat any SSD, however expensive, by a large margin.

Fear of corruption, besides performance, is what makes people use SLC flash for SLOG devices. MLC flash is much more prone to errors than SLC flash, including in situations like power loss. This is also the reason people talk so much about super-capacitors.

>> How can TRIM support ever influence reading from the drive?!
> I guess you want more proof, so here you go.

Of course :)

> I imagine the reason this happens is similar to why memory performance
> degrades under fragmentation or when there's a lot of "middle-man stuff"
> going on.

TRIM does not change fragmentation. All TRIM does is erase flash cells in the background, so that when a new write request arrives the data can simply be written, instead of erased and then written. The erase operation is slow in flash memory. Think of TRIM as OS-assisted garbage collection. It is nothing else -- no matter what the advertising says :)

Also, please note that there is no "fragmentation" in either the SLOG or the L2ARC to be concerned with. There are no "files" there, just raw blocks that can sit anywhere.

>> TRIM is a slow operation. How often are these issued?
> Good questions, for which I have no answer. The same could be asked of
> any OS however, not just Windows. And I've asked the same question
> about SSDs internal "garbage collection" too. I have no answers, so you
> and I are both wondering the same thing. And yes, I am aware TRIM is
> a costly operation.

Well, at least we know that some commodity SSDs on the market have "lazy" garbage collection and some do it right away. The "lazy" drives give good performance initially, until the garbage collection eventually has to catch up.

Jeremy, thanks for the detailed data. So much for theory :)

Here is a quick "(slow) HDD as SLOG" test, not very scientific :)

Hardware:
  Supermicro X8DTH-6F (integrated LSI 2008)
  2x E5620 Xeons
  24 GB RAM
  6x Hitachi HDS72303 drives

All disks are labeled with GPT, with the first partition at 1 GB.

First, create an ashift=12 raidz2 zpool with all drives:

# gnop create -S 4096 gpt/disk00
# zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   126  99 293971  93 177423  52   357  99 502710  86 234.2   8
Latency             68881us    2817ms    5388ms   37301us    1266ms     471ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 25801  90 +++++ +++ 23915  94 25869  98 +++++ +++ 24858  97
Latency             12098us     117us     141us   24121us      29us      66us
1.96,1.96,a1.register.bg,1,1305158675,48G,,126,99,293971,93,177423,52,357,99,502710,86,234.2,8,16,,,,,25801,90,+++++,+++,23915,94,25869,98,+++++,+++,24858,97,68881us,2817ms,5388ms,37301us,1266ms,471ms,12098us,117us,141us,24121us,29us,66us
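To double-check that the gnop trick really produced an ashift=12 pool, grepping zdb's dump of the pool configuration should do the job; "storage" is the pool created above, and the top-level vdev should report ashift: 12 (I have not captured the output here):

# zpool status storage
# zdb storage | grep ashift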
Recreate the pool with 5 drives plus one drive as a SLOG:

# zpool destroy storage
# zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 log gpt/disk05

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   110  99 306932  68 223853  46   354  99 664034  65 501.8  11
Latency               172ms   11571ms    4217ms   50414us    1895ms     245ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 24673  97 +++++ +++ 24262  98 19108  97 +++++ +++ 23821  97
Latency             12051us     132us     143us   23392us      47us      79us
1.96,1.96,a1.register.bg,1,1305171999,48G,,110,99,306932,68,223853,46,354,99,664034,65,501.8,11,16,,,,,24673,97,+++++,+++,24262,98,19108,97,+++++,+++,23821,97,172ms,11571ms,4217ms,50414us,1895ms,245ms,12051us,132us,143us,23392us,47us,79us

It is interesting to note that 'zpool iostat -v 1' never showed more than 128K in use on the SLOG drive, although from time to time it was hitting over 1200 IOPS and over 150 MB/s of writes. Also note that this second pool has one disk less.

For comparison, here is the same pool with 5 disks and no SLOG:

# zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
a1.register.bg  48G   118  99 287361  92 152566  40   345  98 398392  51 242.4  24
Latency             56962us    2619ms    4308ms   57304us    1214ms     350ms
Version  1.96       ------Sequential Create------ --------Random Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 27438  95 +++++ +++ 19374  90 25259  97 +++++ +++  6876  99
Latency              8913us     200us     295us   27249us      30us     238us
1.96,1.96,a1.register.bg,1,1305165435,48G,,118,99,287361,92,152566,40,345,98,398392,51,242.4,24,16,,,,,27438,95,+++++,+++,19374,90,25259,97,+++++,+++,6876,99,56962us,2619ms,4308ms,57304us,1214ms,350ms,8913us,200us,295us,27249us,30us,238us

One side effect of using a SLOG that I forgot to mention is less fragmentation in the pool. When the ZIL lives in the main pool, it is frequently written and erased, and since the ZIL is variable in size this leaves undesired gaps.

Hope this helps.

Daniel
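P.S. For anyone who wants to try the SLOG variant on an existing pool instead of destroying and recreating it: a dedicated log device can be added to a live pool, and with pool version 19 or later it can be removed again. Roughly, with the same labels as above:

# zpool add storage log gpt/disk05
# zpool remove storage gpt/disk05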