Date: Tue, 14 May 2013 09:15:32 -0400
From: Paul Kraus <paul@kraus-haus.org>
To: Shane Ambler <FreeBSD@ShaneWare.Biz>
Cc: "freebsd-questions@freebsd.org List" <freebsd-questions@freebsd.org>, Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
Subject: Re: ZFS mirror install /mnt is empty
Message-ID: <D15CEE39-5B84-4FBD-980E-AF81EB45C8DA@kraus-haus.org>
In-Reply-To: <5191B950.4040402@ShaneWare.Biz>
References: <5190058D.2030705@micite.net> <alpine.BSF.2.00.1305130743320.72982@mail.fig.ol.no> <472E17AF-B249-4FD3-8F5E-716F8B7867F5@kraus-haus.org> <alpine.BSF.2.00.1305131522340.72982@mail.fig.ol.no> <5191B950.4040402@ShaneWare.Biz>
On May 14, 2013, at 12:10 AM, Shane Ambler <FreeBSD@ShaneWare.Biz> wrote:

> When it comes to disk compression I think people overlook the fact that
> it can impact on more than one level.

Compression has effects at multiple levels:

1) CPU resources to compress (and decompress) the data
2) Disk space used
3) I/O to/from disks

> The size of disks these days means that compression doesn't make a big
> difference to storage capacity for most people and 4k blocks mean little
> change in final disk space used.

	The 4K block issue is *huge* if the majority of your data is less than 4K files. It is also large when you consider that a 5K file will now occupy 8K on disk. I am not a UFS on FreeBSD expert, but UFS on Solaris uses a default block size of 4K with a fragment size of 1K, so files are stored on disk with 1K resolution (so to speak). By going to a 4K minimum block size you are forcing all data up to the next 4K boundary. Now, if the majority of your data is in large files (1 MB or more), then the 4K minimum block size probably gets lost in the noise.

	The other factor is the actual compressibility of the data. Most media files (JPEG, MPEG, GIF, PNG, MP3, AAC, etc.) are already compressed, and trying to compress them again is not likely to garner any real reduction in size. In my experience with the default compression algorithm (lzjb), even uncompressed audio files (.AIFF or .WAV) do not compress enough to make the CPU overhead worthwhile.

> One thing people seem to miss is the fact that compressed files are
> going to reduce the amount of data sent through the bottleneck that is
> the wire between motherboard and drive. While a 3k file compressed to 1k
> still uses a 4k block on disk it does (should) reduce the true data
> transferred to disk. Given a 9.1 source tree using 865M, if it
> compresses to 400M then it is going to reduce the time to read the
> entire tree during compilation. This would impact a 32 thread build more
> than a 4 thread build.

	If the data does not compress well, then you get hit with the CPU overhead of compression with no bandwidth or space benefit. How compressible is the source tree? [Not a loaded question, I haven't tried to compress it.]

> While it is said that compression adds little overhead, time wise,

	Compression most certainly DOES add overhead in terms of time, depending on the speed of your CPU and how busy your system is. My home server is an HP ProLiant Micro with a dual-core AMD N36 running at 1.3 GHz. Turning on compression hurts performance *if* I am getting less than a 1.2:1 compression ratio (5-drive RAIDz2 of 1 TB enterprise disks). Above that, the I/O bandwidth reduction due to the compression makes up for the lost CPU cycles. I have managed servers where each case prevailed… CPU limited, where compression hurt performance, and I/O limited, where compression helped performance.

> it is
> going to take time to compress the data which is going to increase
> latency. Going from a 6ms platter disk latency to a 0.2ms SSD latency
> gives a noticeable improvement to responsiveness. Adding compression is
> going to bring that back up - possibly higher than 6ms.

	Interesting point. I am not sure of the data flow through the code well enough to know whether compression has a defined latency component, or is simply throughput limited by the CPU cycles available to do the compression.

> Together these two factors may level out the total time to read a file.
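	To make the block-rounding point above concrete, here is a rough back-of-the-envelope sketch. It is plain Python, nothing ZFS-specific, and the file sizes in it are made-up examples; it just compares what a file occupies with 1K fragments versus a 4K minimum block.

# Rough illustration only: space a file occupies when rounded up to
# 1K fragments versus a 4K minimum block. The sizes below are made up.

def rounded_up(size, unit):
    # Round a size in bytes up to the next multiple of 'unit' bytes.
    return -(-size // unit) * unit

K = 1024
sample_sizes = [300, 2 * K, 5 * K, 100 * K + 1, 1024 * K]  # hypothetical files

for size in sample_sizes:
    frag_1k = rounded_up(size, 1 * K)
    block_4k = rounded_up(size, 4 * K)
    print("%8d bytes -> %8d with 1K fragments, %8d with a 4K minimum block"
          % (size, frag_1k, block_4k))

# A 5K file lands on 5K with 1K fragments but 8K with a 4K minimum block;
# for files of 1 MB or more the rounding is lost in the noise.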
>
> One question there is whether the zfs cache uses compressed file data
> therefore keeping the latency while eliminating the bandwidth.

	Data cached in the ZFS ARC or L2ARC is uncompressed. Data sent via zfs send / zfs receive is also uncompressed; there had been talk of an option to send / receive compressed data, but I do not think it has gone anywhere.

> Personally I have compression turned off (desktop). My thought is that
> the latency added for compression would negate the bandwidth savings.
>
> For a file server I would consider turning it on as network overhead is
> going to hide the latency.

	Once again, it all depends on the compressibility of the data, the available CPU resources, the speed of those CPU resources, and the I/O bandwidth to/from the drives.

	Note also that RAIDz (and RAIDz2, RAIDz3) has its own computational overhead, so compression may be a bigger advantage there than with a mirror, as the RAID code will have less data to process once the data has been compressed.

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company
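P.S. For anyone who wants to put a rough number on the "how compressible is the source tree" question above, a minimal sketch along these lines would do. It uses Python's zlib as a stand-in for lzjb, so the ratio it reports is only a ballpark figure, and /usr/src is just an example path.

# Rough compressibility estimate for a directory tree. zlib is not lzjb,
# so treat the result as a ballpark figure only.
import os, zlib

def tree_ratio(root, sample_limit=50 * 1024 * 1024):
    # Walk the tree and compress up to sample_limit bytes of file data.
    raw = packed = 0
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            try:
                f = open(os.path.join(dirpath, name), "rb")
            except (IOError, OSError):
                continue
            data = f.read()
            f.close()
            raw += len(data)
            packed += len(zlib.compress(data, 1))  # level 1: fast, in the spirit of lzjb
            if raw >= sample_limit:
                return raw, packed
    return raw, packed

raw, packed = tree_ratio("/usr/src")  # example path
if packed:
    print("approx. %.2f:1 over %d bytes sampled" % (float(raw) / packed, raw))

Anything near 1:1 is not worth the CPU; by the numbers above, somewhere past about 1.2:1 is where it starts to pay for itself on a machine like mine.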