Date:      Wed, 11 May 2011 21:48:48 -0400
From:      Jason Hellenthal <jhell@DataIX.net>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: How to enable cache and logs.
Message-ID:  <20110512014848.GA35736@DataIX.net>
In-Reply-To: <20110512010433.GA48863@icarus.home.lan>
References:  <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan>

Jeremy, as always the quality of your messages is 101% spot on. I always
find some new information that becomes handy more often than I could say,
and there is always something to be learned.

Thanks.

On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> >
> > Jeremy,
> >
> > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > > >should also keep that in mind when putting an SSD into use in this
> > > > >fashion.
> > > >
> > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > > would handle that write load without TRIM and without any performance
> > > > degradation.
> > > >
> > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > and the need for rewriting will be small. If you don't need to
> > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > twice or more the advertised size and always write to fresh cells,
> > > > scheduling a background erase of the 'overwritten' cell.
> > >
> > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > space they keep available on an SSD.  I'd rather not speculate as to how
> > > much, as I'm certain it varies per vendor.
> > >
> >
> > Let's not forget here: the size of the separate log device may be quite
> > small. A rule of thumb is that you should size the separate log to be able
> > to handle 10 seconds of your expected synchronous write workload. It would
> > be rare to need more than 100 MB in a separate log device, but the
> > separate log must be at least 64 MB.
> >
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> >
> > So in other words, how much is TRIM really even effective given the above?
> >
> > Even with a high database write load on the disks, at the full capacity of
> > the incoming link, I would find it hard to believe that anyone could get
> > the ZIL to even come close to 512MB.
>
> In the case of an SSD being used as a log device (ZIL), I imagine it
> would only matter the longer the drive was kept in use.  I do not use
> log devices anywhere with ZFS, so I can't really comment.
>
> In the case of an SSD being used as a cache device (L2ARC), I imagine it
> would matter much more.
>
> In the case of an SSD being used as a pool device, it matters greatly.
>
> Why it matters: there are two methods of "reclaiming" blocks which were
> used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
> reclaimed, it has to be erased -- SSDs erase things in pages rather
> than individual LBAs.  With TRIM, you submit the data management command
> via ATA with a list of LBAs you wish to inform the drive are no longer
> used.  The drive aggregates the LBA ranges, determines if an entire
> flash page can be erased, and does it.  If it can't, it makes some sort
> of mental note that the individual LBA (in some particular page)
> shouldn't be used.
>
> The "garbage collection" works when the SSD is idle.  I have no idea
> what "idle" actually means operationally, because again, vendors don't
> disclose what the idle intervals are.  5 minutes?  24 hours?  It
> matters, but they don't tell us.  (What confuses me about the "idle GC"
> method is how it determines what it can erase -- if the OS didn't tell
> it what it's using, how does it know it can erase the page?)
>
> Anyway, how all this manifests itself performance-wise is intriguing.
> It's not speculation: there's hard evidence that not using TRIM results
> in SSD performance, bluntly put, sucking badly on some SSDs.
>
> There's this mentality that wear levelling completely solves all of the
> **performance** concerns -- that isn't the case at all.  In fact, I'm
> under the impression it probably hurts performance, but it depends on
> how it's implemented within the drive firmware.
>
> bit-tech did an experiment using Windows 7 -- which supports and uses
> TRIM assuming the device advertises the capability -- with different
> models of SSDs.  The testing procedure is documented here, but I'll
> document it as well:
>
> http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
>
> Again, remember, this is done on a Windows 7 system which does support
> TRIM if the device supports it.  The testing steps, in this order:
>
> 1) SSD without TRIM support -- all LBAs are zeroed.
> 2) Took read/write benchmark readings.
> 3) SSD without TRIM support -- partitioned and formatted as NTFS
>    (cluster size unknown), copied 100GB of data to the drive, deleted all
>    the data, and repeated this method 10 times.
> 4) Step #2 repeated.
> 5) Upgraded SSD firmware to a version that supports TRIM.
> 6) SSD with TRIM support -- step #1 repeated.
> 7) Step #2 repeated.
> 8) SSD with TRIM support -- step #3 repeated.
> 9) Step #2 repeated.
>
> Without TRIM, some drives drop their read performance by more than 50%,
> and write performance by almost 70%.  I'm focusing on Intel SSDs here,
> by the way.  I do not care for OCZ or Corsair products.
>
> So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> on FreeBSD will mimic (to some degree).
>
> Therefore, simply put, users should be concerned when using ZFS on
> FreeBSD with SSDs.  It doesn't matter to me if you're only using
> 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> means degraded performance over time.
>
> Can you refute any of this evidence?
>

At the moment, no. But I can say that, however heavily SSDs were used by
OpenSolaris users before the Oracle reaping, I don't recall seeing any
related bug reports about degradation. But like I said, I haven't seen
them; that's not to say there wasn't simply a lack of use either.
Definitely more to look into, test, benchmark and test again.
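On that note, for anyone testing this themselves, here is a minimal
sketch, assuming a SATA SSD attached as ada0 (adjust the device name for
your system), of checking whether a drive even advertises the TRIM
capability to FreeBSD; on reasonably recent releases the identify output
carries a "data set management (DSM/TRIM)" line for exactly this:

  # Ask the drive for its ATA identify data and look for the TRIM bit.
  # "yes" here only means the drive advertises it; ZFS on FreeBSD will
  # still not issue TRIM, which is Jeremy's point above.
  camcontrol identify ada0 | grep -i trim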

> > Given most SSDs come at a size greater than 32GB, I hope this comes as an
> > early reminder that the ZIL you are buying that disk for is only going to
> > be using a small percentage of that disk, and I hope you justify the cost
> > over its actual use. If you do happen to justify creating a ZIL for your
> > pool, then I hope that you partition it wisely to make use of the rest of
> > the space that is untouched.
> >
> > For all other cases I would recommend, if you still want to have a ZIL,
> > that you take some sort of PCI->SD card or USB stick into account with
> > mirroring.
>
> Others have pointed out this isn't effective (re: USB sticks).  The read
> and write speeds are too slow, and limit the overall performance of ZFS
> in a very bad way.  I can absolutely confirm this claim (I've tested it
> myself, using a high-end USB flash drive as a cache device (L2ARC)).
>
> Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> *does* improve performance on older systems which have slower disk I/O
> (e.g. ICH5-based systems).
>

Agreed. As soon as the bus and write speeds are greater than what USB 2.0
can handle, any USB-based solution is useless. ICH5 and up would be right
about the time you would start to see this happen.

SD card/CF card mileage may vary depending on the transfer rates. But the
same situation still applies: like you said, once your main pool
throughput outweighs the throughput of your ZIL, it's probably not worth
even having a ZIL or a cache device. Emphasis on cache more so than ZIL.
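
To make the partitioning point concrete, here is a rough sketch; the pool
name "tank", the device names da1/da2 and the 4G log size are only
placeholders (4G is already very generous given the 10-second rule quoted
above):

  # Carve a small slice for the log and leave the rest for cache,
  # using GPT labels so the pool survives device renumbering.
  gpart create -s gpt da1
  gpart add -t freebsd-zfs -s 4G -l slog0 da1
  gpart add -t freebsd-zfs -l l2arc0 da1

  # Mirror the log across a second device; losing an unmirrored log can
  # cost you the last few seconds of synchronous writes.
  gpart create -s gpt da2
  gpart add -t freebsd-zfs -s 4G -l slog1 da2

  zpool add tank log mirror gpt/slog0 gpt/slog1
  zpool add tank cache gpt/l2arc0

  # Watch whether the log/cache devices actually carry traffic compared
  # to the main vdevs before deciding to keep them.
  zpool iostat -v tank 5

Cache devices can always be dropped again with "zpool remove"; taking a
log vdev back out needs a pool version new enough to support log removal.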


Anyway, all good information for those making the judgement of whether
they need a cache or a ZIL.
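
For the sizing judgement itself, a back-of-the-envelope sketch in the
spirit of the Evil Tuning Guide rule quoted above; the 40 MB/s figure is
just an assumed example of a fairly heavy synchronous write load:

  #!/bin/sh
  # Rough separate-log sizing: 10 seconds of the expected synchronous
  # write rate, but never below the 64 MB minimum quoted above.
  rate_mb_s=40                      # example incoming sync write rate
  size_mb=$((rate_mb_s * 10))
  [ "$size_mb" -lt 64 ] && size_mb=64
  echo "suggested separate log size: ${size_mb} MB"

Even a pessimistic figure like that stays well under the size of the
smallest SSD you can buy, which is the whole point about wasted space.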


Thanks again Jeremy. Always appreciated.

--

 Regards, (jhell)
 Jason Hellenthal

