Date:      Tue, 14 May 2013 09:15:32 -0400
From:      Paul Kraus <paul@kraus-haus.org>
To:        Shane Ambler <FreeBSD@ShaneWare.Biz>
Cc:        "freebsd-questions@freebsd.org List" <freebsd-questions@freebsd.org>, =?windows-1252?Q?Trond_Endrest=F8l?= <Trond.Endrestol@fagskolen.gjovik.no>
Subject:   Re: ZFS mirror install /mnt is empty
Message-ID:  <D15CEE39-5B84-4FBD-980E-AF81EB45C8DA@kraus-haus.org>
In-Reply-To: <5191B950.4040402@ShaneWare.Biz>
References:  <5190058D.2030705@micite.net> <alpine.BSF.2.00.1305130743320.72982@mail.fig.ol.no> <472E17AF-B249-4FD3-8F5E-716F8B7867F5@kraus-haus.org> <alpine.BSF.2.00.1305131522340.72982@mail.fig.ol.no> <5191B950.4040402@ShaneWare.Biz>

On May 14, 2013, at 12:10 AM, Shane Ambler <FreeBSD@ShaneWare.Biz> wrote:

> When it comes to disk compression I think people overlook the fact that
> it can impact on more than one level.

Compression has effects at multiple levels:

1) CPU resources to compress (and decompress) the data
2) Disk space used
3) I/O to/from disks
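
	All three can be observed directly on a running system. A
minimal sketch, assuming a hypothetical pool "tank" with a dataset
"tank/data" (substitute your own names):

    # 2) Space: the compression setting and the achieved ratio
    zfs get compression,compressratio tank/data

    # 3) I/O: watch per-vdev bandwidth while a workload runs
    zpool iostat -v tank 5

    # 1) CPU: watch system CPU time during the same workload
    top -S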

> The size of disks these days means that compression doesn't make a big
> difference to storage capacity for most people and 4k blocks mean
> little change in final disk space used.

	The 4K block issue is *huge* if the majority of your data is
in files smaller than 4K. It is also large when you consider that a 5K
file will now occupy 8K on disk. I am not a UFS on FreeBSD expert, but
UFS on Solaris uses a default block size of 4K with a fragment size of
1K, so files are stored on disk with 1K resolution (so to speak). By
going to a 4K minimum block size you are forcing all data up to the
next 4K boundary.
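
	You can watch this arithmetic happen with a test file,
comparing the apparent size against the space actually allocated
(the exact numbers depend on the filesystem the file lands on):

    dd if=/dev/random of=testfile bs=1k count=5   # a 5K file
    ls -l testfile    # apparent size: 5120 bytes
    du -k testfile    # allocated: 8K with a 4K minimum block
                      # size, 5K where 1K fragments apply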

	Now, if the majority of your data is in large files (1MB or
more), then the 4K minimum block size probably gets lost in the noise.

	The other factor is the actual compressibility of the data.
Most media files (JPEG, MPEG, GIF, PNG, MP3, AAC, etc.) are already
compressed, and trying to compress them again is not likely to garner
any real reduction in size. In my experience with the default
compression algorithm (lzjb), even uncompressed audio files (.AIFF or
.WAV) do not compress enough to make the CPU overhead worthwhile.

> One thing people seem to miss is the fact that compressed files are
> going to reduce the amount of data sent through the bottleneck that
> is the wire between motherboard and drive. While a 3k file compressed
> to 1k still uses a 4k block on disk, it does (should) reduce the true
> data transferred to disk. Given a 9.1 source tree using 865M, if it
> compresses to 400M then it is going to reduce the time to read the
> entire tree during compilation. This would impact a 32 thread build
> more than a 4 thread build.

	If the data does not compress well, then you get hit with the
CPU overhead of compression to no bandwidth or space benefit. How
compressible is the source tree? [Not a loaded question, I haven't
tried to compress it]
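
	One way to answer that empirically: copy the tree into a
dataset with compression enabled and read back the achieved ratio.
A sketch, using a hypothetical dataset "tank/srctest" (assuming the
default mountpoint of /tank/srctest):

    zfs create -o compression=lzjb tank/srctest
    cp -R /usr/src/ /tank/srctest/
    zfs get compressratio tank/srctest
    zfs destroy tank/srctest    # clean up afterwards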

> While it is said that compression adds little overhead, time wise,

	Compression most certainly DOES add overhead in terms of time,
based on the speed of your CPU and how busy your system is. My home
server is an HP Proliant Micro with a dual core AMD N36 running at
1.3 GHz. Turning on compression hurts performance *if* I am getting
less than a 1.2:1 compression ratio (5 drive RAIDz2 of 1TB Enterprise
disks). Above that, the I/O bandwidth reduction due to the compression
makes up for the lost CPU cycles. I have managed servers where each
case prevailed: CPU limited, where compression hurt performance, and
I/O limited, where compression helped performance.
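
	A rough way to see which case applies on a given box is to run
the workload and watch both sides at once:

    top -S    # mostly system CPU, disks idle  => CPU limited
    gstat     # disks sitting near 100% busy   => I/O limited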

> it is
> going to take time to compress the data which is going to increase
> latency. Going from a 6ms platter disk latency to a 0.2ms SSD latency
> gives a noticeable improvement to responsiveness. Adding compression
> is going to bring that back up - possibly higher than 6ms.

	Interesting point. I am not sure of the data flow through the
code to know if compression has a defined latency component, or is
just throughput limited by CPU cycles to do the compression.

> Together these two factors may level out the total time to read a
> file.
>
> One question there is whether the zfs cache uses compressed file data
> therefore keeping the latency while eliminating the bandwidth.

	Data cached in the ZFS ARC or L2ARC is uncompressed. Data sent
via zfs send / zfs receive is uncompressed; there had been talk of an
option to send / receive compressed data, but I do not think it has
gone anywhere.
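
	The usual workaround in the meantime is to compress the stream
in the pipe yourself. A sketch, with a hypothetical snapshot
"tank/data@snap" and remote host "backuphost":

    zfs send tank/data@snap | gzip | \
        ssh backuphost "gunzip | zfs receive backup/data"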

> Personally I have compression turned off (desktop). My thought is that
> the latency added for compression would negate the bandwidth savings.
>
> For a file server I would consider turning it on as network overhead
> is going to hide the latency.

	Once again, it all depends on the compressibility of the data,
the available CPU resources, the speed of the CPU resources, and the
I/O bandwidth to/from the drives.

	Note also that RAIDz (RAIDz2, RAIDz3) has its own
computational overhead, so compression may be a bigger advantage in
that case than with a mirror, as the RAID code will have less data to
process once it has been compressed.

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company



