Date:      Wed, 11 May 2011 19:08:04 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Jason Hellenthal <jhell@DataIX.net>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS: How to enable cache and logs.
Message-ID:  <20110512020804.GA50560@icarus.home.lan>
In-Reply-To: <20110512014848.GA35736@DataIX.net>
References:  <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan> <20110512014848.GA35736@DataIX.net>

On Wed, May 11, 2011 at 09:48:48PM -0400, Jason Hellenthal wrote:
> Jeremy, As always the quality of your messages is 101% spot on, and I 
> always find new information that comes in handy more often than I 
> could say; there is always something to be learned. 
>
> Thanks.
>
> On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> > > 
> > > Jeremy,
> > > 
> > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > > > >should also keep that in mind when putting an SSD into use in this
> > > > > >fashion.
> > > > >
> > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > > > would handle that write load without TRIM and without any performance
> > > > > degradation.
> > > > >
> > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > > and the need for rewriting will be small. If you don't need to
> > > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > > twice or more the advertised size and always write to fresh cells,
> > > > > scheduling a background erase of the 'overwritten' cell.
> > > > 
> > > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > > space they keep available on an SSD.  I'd rather not speculate as to how
> > > > much, as I'm certain it varies per vendor.
> > > > 
> > > 
> > > Let's not forget here: The size of the separate log device may be quite 
> > > small. A rule of thumb is that you should size the separate log to be able 
> > > to handle 10 seconds of your expected synchronous write workload. It would 
> > > be rare to need more than 100 MB in a separate log device, but the 
> > > separate log must be at least 64 MB.
> > > 
> > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> > > 
> > > So in other words, how effective is TRIM really, given the above?
> > > 
> > > Even with a high database write load on the disks, at the full capacity 
> > > of the incoming link, I would find it hard to believe that anyone could 
> > > get the ZIL to even come close to 512MB.
> > 
> > In the case of an SSD being used as a log device (ZIL), I imagine it
> > would only matter the longer the drive was kept in use.  I do not use
> > log devices anywhere with ZFS, so I can't really comment.
> > 
> > In the case of an SSD being used as a cache device (L2ARC), I imagine it
> > would matter much more.
> > 
> > In the case of an SSD being used as a pool device, it matters greatly.
> > 
> > Why it matters: there are two methods of "reclaiming" blocks which were
> > used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
> > reclaimed, it has to be erased -- SSDs erase things in pages rather
> > than individual LBAs.  With TRIM, you submit the data management command
> > via ATA with a list of LBAs you wish to inform the drive are no longer
> > used.  The drive aggregates the LBA ranges, determines if an entire
> > flash page can be erased, and does it.  If it can't, it makes some sort
> > of mental note that the individual LBA (in some particular page)
> > shouldn't be used.
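For the curious, the DSM/TRIM payload itself is nothing exotic: it is a
list of packed LBA ranges. A minimal sketch of one range entry, assuming
the ATA ACS layout (48-bit starting LBA plus 16-bit sector count per
64-bit little-endian word):

```python
import struct

def trim_range_entry(lba, count):
    """Pack one TRIM (Data Set Management) range entry: bits 47:0 hold
    the starting LBA, bits 63:48 the number of sectors to trim."""
    assert lba < (1 << 48) and count < (1 << 16)
    return struct.pack("<Q", (count << 48) | lba)

# A 512-byte DSM payload holds up to 64 such 8-byte entries;
# unused entries are zero-filled.
entry = trim_range_entry(0x1000, 8)   # trim 8 sectors starting at LBA 0x1000
```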
> > 
> > The "garbage collection" works when the SSD is idle.  I have no idea
> > what "idle" actually means operationally, because again, vendors don't
> > disclose what the idle intervals are.  5 minutes?  24 hours?  It
> > matters, but they don't tell us.  (What confuses me about the "idle GC"
> > method is how it determines what it can erase -- if the OS didn't tell
> > it what it's using, how does it know it can erase the page?)
> > 
> > Anyway, how all this manifests itself performance-wise is intriguing.
> > It's not speculation: there's hard evidence that not using TRIM results
> > in SSD performance, bluntly put, sucking badly on some SSDs.
> > 
> > There's this mentality that wear levelling completely solves all of the
> > **performance** concerns -- that isn't the case at all.  In fact, I'm
> > under the impression it probably hurts performance, but it depends on
> > how it's implemented within the drive firmware.
> > 
> > bit-tech did an experiment using Windows 7 -- which supports and uses
> > TRIM, assuming the device advertises the capability -- with different
> > models of SSDs.  The testing procedure is documented here, but I'll
> > summarize it as well:
> > 
> > http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
> > 
> > Again, remember, this is done on a Windows 7 system which does support
> > TRIM if the device supports it.  The testing steps, in this order:
> > 
> > 1) SSD without TRIM support -- all LBAs are zeroed.
> > 2) Took read/write benchmark readings.
> > 3) SSD without TRIM support -- partitioned and formatted as NTFS
> >    (cluster size unknown), copied 100GB of data to the drive, deleted all
> >    the data, and repeated this method 10 times.
> > 4) Step #2 repeated.
> > 5) Upgraded SSD firmware to a version that supports TRIM.
> > 6) SSD with TRIM support -- step #1 repeated.
> > 7) Step #2 repeated.
> > 8) SSD with TRIM support -- step #3 repeated.
> > 9) Step #2 repeated.
> > 
> > Without TRIM, some drives drop their read performance by more than 50%,
> > and write performance by almost 70%.  I'm focusing on Intel SSDs here,
> > by the way.  I do not care for OCZ or Corsair products.
> > 
> > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> > TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> > on FreeBSD will mimic (to some degree).
> > 
> > Therefore, simply put, users should be concerned when using ZFS on
> > FreeBSD with SSDs.  It doesn't matter to me if you're only using
> > 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> > means degraded performance over time.
> > 
> > Can you refute any of this evidence?
> > 
> 
> At least at the moment, NO. But I can say that, given how widely SSDs 
> were used by OpenSolaris users before the Oracle reaping, I don't recall 
> seeing any related bug reports on degradation. But like I said, I 
> haven't seen them; that's not to say there wasn't a lack of use either. 
> Definitely more to look into, test, benchmark & test again.
> 
> > > Given most SSDs come at sizes greater than 32GB, I hope this comes as 
> > > an early reminder that the ZIL you are buying that disk for is only 
> > > going to use a small percentage of that disk, and I hope you can 
> > > justify the cost against its actual use. If you do happen to justify 
> > > creating a ZIL for your pool, then I hope you partition it wisely to 
> > > make use of the rest of the space that is untouched.
> > > 
> > > For all other cases, if you still want to have a ZIL, I would recommend 
> > > taking some sort of PCI->SD card or USB stick into account, with 
> > > mirroring.
> > 
> > Others have pointed out this isn't effective (re: USB sticks).  The read
> > and write speeds are too slow, and limit the overall performance of ZFS
> > in a very bad way.  I can absolutely confirm this claim (I've tested it
> > myself, using a high-end USB flash drive as a cache device (L2ARC)).
> > 
> > Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> > *does* improve performance on older systems which have slower disk I/O
> > (e.g. ICH5-based systems).
> > 
> 
> Agreed. As soon as the bus and write speeds are greater than the speeds 
> that USB 2.0 can handle, any USB-based solution is useless. ICH5 and up 
> would be right about the time you would start to see this happen.
> 
> With SD cards/CF cards, mileage may vary depending on the transfer 
> rates. But the same situation still applies: like you said, once your 
> main pool throughput outweighs the throughput of your ZIL, it's probably 
> not worth even having a ZIL or a cache device. Emphasis on cache more so 
> than ZIL.
> 
> 
> Anyway, all good information for those making the judgement of whether 
> they need a cache or a ZIL.
> 
> 
> Thanks again Jeremy. Always appreciated.

You're welcome.

It's important to note that much of what I say is stuff I've learned and
read (technical documentation usually) on my own -- which means I almost
certainly misunderstand certain pieces of technology.  There are a *lot*
of people here who understand it much better than I do.  (I'm looking at
you, jhb@  ;-) )

As such, I probably should have CC'd pjd@ on this thread, since he's
talked a bit about how to get ZFS on FreeBSD to work with TRIM, and when
to issue the erasing of said blocks.
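To illustrate the "when to issue" question: one obvious approach is to
batch up freed block ranges and merge adjacent ones, so the drive sees
fewer, larger TRIM commands instead of a storm of tiny ones. A toy
sketch of that batching idea (my own illustration; not how pjd@'s work
or ZFS actually implements it):

```python
def coalesce_ranges(ranges):
    """Merge adjacent or overlapping (start, length) block ranges so
    that fewer, larger TRIM commands can be issued for freed space."""
    merged = []
    for start, length in sorted(ranges):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # Range touches or overlaps the previous one: extend it.
            end = max(merged[-1][0] + merged[-1][1], start + length)
            merged[-1] = (merged[-1][0], end - merged[-1][0])
        else:
            merged.append((start, length))
    return merged

coalesce_ranges([(0, 8), (8, 8), (32, 4)])   # -> [(0, 16), (32, 4)]
```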

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |