Date: Wed, 11 May 2011 19:08:04 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Jason Hellenthal <jhell@DataIX.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
Message-ID: <20110512020804.GA50560@icarus.home.lan>
In-Reply-To: <20110512014848.GA35736@DataIX.net>
References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan> <20110512014848.GA35736@DataIX.net>
On Wed, May 11, 2011 at 09:48:48PM -0400, Jason Hellenthal wrote:
> Jeremy, as always the quality of your messages is 101% spot on, and I
> always find some new information that becomes handy more often than I
> could say; there is always something to be learned.
>
> Thanks.
>
> On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> > >
> > > Jeremy,
> > >
> > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > > > Furthermore, TRIM support doesn't exist with ZFS on FreeBSD,
> > > > > > so folks should also keep that in mind when putting an SSD
> > > > > > into use in this fashion.
> > > > >
> > > > > By the way, what would be the use of TRIM for SLOG and L2ARC
> > > > > devices? I see absolutely no benefit from TRIM for the L2ARC,
> > > > > because it is written slowly (on purpose). Any current SSD, or
> > > > > one 1-2 generations back, would handle that write load without
> > > > > TRIM and without any performance degradation.
> > > > >
> > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use
> > > > > SLC SSD for the SLOG, for many reasons. The write regions on
> > > > > the SLC NAND should be smaller (my wild guess; current practice
> > > > > may differ) and the need for rewriting will be small. If you
> > > > > don't need to rewrite already-written data, TRIM does not help.
> > > > > Also, as far as I understand, most "serious" SSDs (typical for
> > > > > SLC, I guess) would have twice or more the advertised size and
> > > > > always write to fresh cells, scheduling a background erase of
> > > > > the 'overwritten' cell.
> > > >
> > > > AFAIK, drive manufacturers do not disclose just how much
> > > > reallocation space they keep available on an SSD. I'd rather not
> > > > speculate as to how much, as I'm certain it varies per vendor.
> > >
> > > Let's not forget here: "The size of the separate log device may be
> > > quite small. A rule of thumb is that you should size the separate
> > > log to be able to handle 10 seconds of your expected synchronous
> > > write workload. It would be rare to need more than 100 MB in a
> > > separate log device, but the separate log must be at least 64 MB."
> > >
> > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> > >
> > > So in other words, how effective is TRIM really, given the above?
> > >
> > > Even with a high database write load on the disks, at the full
> > > capacity of the incoming link, I would find it hard to believe
> > > that anyone could get the ZIL to even come close to 512 MB.
> >
> > In the case of an SSD being used as a log device (ZIL), I imagine it
> > would only matter the longer the drive was kept in use. I do not use
> > log devices anywhere with ZFS, so I can't really comment.
> >
> > In the case of an SSD being used as a cache device (L2ARC), I imagine
> > it would matter much more.
> >
> > In the case of an SSD being used as a pool device, it matters greatly.
> >
> > Why it matters: there are two methods of "reclaiming" blocks which
> > were used: internal SSD "garbage collection" and TRIM. For a NAND
> > block to be reclaimed, it has to be erased -- SSDs erase things in
> > pages rather than individual LBAs. With TRIM, you submit the DATA SET
> > MANAGEMENT command via ATA with a list of LBAs you wish to inform the
> > drive are no longer used. The drive aggregates the LBA ranges,
> > determines if an entire flash page can be erased, and does so. If it
> > can't, it makes some sort of mental note that the individual LBA (in
> > some particular page) shouldn't be used.
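To make the aggregation idea concrete, here's a toy model of the
bookkeeping involved. It's only a sketch -- vendors don't publish their
firmware internals, and the 128-LBAs-per-page geometry below is an
assumption on my part:

    # Toy model of how a drive might aggregate TRIMed LBAs into
    # erasable flash pages.  A page can only be erased as a whole, so
    # the firmware must wait until no LBA in it still holds live data.
    # 128 LBAs per page is an assumed figure, not a vendor spec.
    LBAS_PER_PAGE = 128

    class ToySSD:
        def __init__(self, total_lbas):
            self.total_pages = total_lbas // LBAS_PER_PAGE
            self.live = set()              # LBAs the host still uses

        def write(self, lba):
            self.live.add(lba)

        def trim(self, ranges):
            # A DATA SET MANAGEMENT (TRIM) payload is a list of
            # (starting LBA, count) ranges.
            for start, count in ranges:
                for lba in range(start, start + count):
                    self.live.discard(lba)

        def erasable_pages(self):
            # Pages with at least one live LBA can't be erased yet; the
            # firmware just notes which LBAs inside them are dead.
            dirty = {lba // LBAS_PER_PAGE for lba in self.live}
            return [p for p in range(self.total_pages) if p not in dirty]

    ssd = ToySSD(total_lbas=1024)          # 8 pages of 128 LBAs each
    for lba in range(300):                 # host writes LBAs 0-299
        ssd.write(lba)
    ssd.trim([(0, 256)])                   # host frees LBAs 0-255
    print(ssd.erasable_pages())            # [0, 1, 3, 4, 5, 6, 7]
    # Page 2 still holds live LBAs 256-299, so it can't be erased.

Note that without the trim() call, LBAs 0-255 still look live to the
firmware even though the filesystem considers them free -- which is
exactly the bind the "idle GC" method is in, as described below.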
> > The "garbage collection" works when the SSD is idle. I have no idea
> > what "idle" actually means operationally, because again, vendors
> > don't disclose what the idle intervals are. 5 minutes? 24 hours? It
> > matters, but they don't tell us. (What confuses me about the "idle
> > GC" method is how it determines what it can erase -- if the OS
> > didn't tell it what it's using, how does it know it can erase the
> > page?)
> >
> > Anyway, how all this manifests itself performance-wise is
> > intriguing. It's not speculation: there's hard evidence that not
> > using TRIM results in performance that, bluntly put, sucks badly on
> > some SSDs.
> >
> > There's this mentality that wear levelling completely solves all of
> > the **performance** concerns -- that isn't the case at all. In fact,
> > I'm under the impression it probably hurts performance, but it
> > depends on how it's implemented within the drive firmware.
> >
> > bit-tech did an experiment using Windows 7 -- which supports and
> > uses TRIM, assuming the device advertises the capability -- with
> > different models of SSDs. The testing procedure is documented here,
> > but I'll summarize it as well:
> >
> > http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
> >
> > Again, remember, this was done on a Windows 7 system, which does
> > support TRIM if the device supports it. The testing steps, in this
> > order:
> >
> > 1) SSD without TRIM support -- all LBAs are zeroed.
> > 2) Took read/write benchmark readings.
> > 3) SSD without TRIM support -- partitioned and formatted as NTFS
> >    (cluster size unknown), copied 100GB of data to the drive,
> >    deleted all the data, and repeated this method 10 times.
> > 4) Step #2 repeated.
> > 5) Upgraded SSD firmware to a version that supports TRIM.
> > 6) SSD with TRIM support -- step #1 repeated.
> > 7) Step #2 repeated.
> > 8) SSD with TRIM support -- step #3 repeated.
> > 9) Step #2 repeated.
> >
> > Without TRIM, some drives drop their read performance by more than
> > 50%, and write performance by almost 70%. I'm focusing on Intel SSDs
> > here, by the way; I do not care for OCZ or Corsair products.
> >
> > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> > TRIM, the benchmarks shown pre-firmware-upgrade are effectively what
> > ZFS on FreeBSD will mimic (to some degree).
> >
> > Therefore, simply put, users should be concerned when using ZFS on
> > FreeBSD with SSDs. It doesn't matter to me if you're only using
> > 64 MBytes of a 40GB drive or if you're using the entire thing; no
> > TRIM means degraded performance over time.
> >
> > Can you refute any of this evidence?
>
> At the moment, no. But I can say that, however widely SSDs were used
> by OpenSolaris users before the Oracle reaping, I don't recall seeing
> any related bug reports about degradation. But like I said... I
> haven't seen them, but that's not to say there wasn't a lack of use,
> either. Definitely more to look into, test, benchmark & test again.
>
> > > Given that most SSDs come at sizes greater than 32GB, I hope this
> > > comes as an early reminder that the ZIL you are buying that disk
> > > for is only going to use a small percentage of that disk, and I
> > > hope you can justify the cost against its actual use. If you do
> > > happen to justify creating a ZIL for your pool, then I hope you
> > > partition the disk wisely to make use of the rest of the space
> > > that is untouched.
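For a sense of scale, here's a quick back-of-the-envelope calculation
based on the tuning guide's rule of thumb quoted earlier. The workload
figure is a made-up example, not a measurement:

    # Slog sizing per the Evil Tuning Guide's rule of thumb: provision
    # roughly 10 seconds of your expected synchronous write workload,
    # with a 64 MB floor.  50 MB/s is an assumed example workload.
    sync_write_mb_per_sec = 50
    slog_window_sec = 10

    slog_mb = max(sync_write_mb_per_sec * slog_window_sec, 64)
    ssd_mb = 40 * 1024                     # e.g. a 40GB SSD

    print("slog size needed: %d MB" % slog_mb)                      # 500 MB
    print("fraction of SSD:  %.1f%%" % (100.0 * slog_mb / ssd_mb))  # 1.2%
    # The other ~99% of the drive sits untouched unless you partition
    # it for something else, as suggested above.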
> > > For all other cases, if you still want to have a ZIL, I would
> > > recommend that you take some sort of PCI->SD card or USB stick
> > > into account, with mirroring.
> >
> > Others have pointed out this isn't effective (re: USB sticks). The
> > read and write speeds are too slow, and limit the overall
> > performance of ZFS in a very bad way. I can absolutely confirm this
> > claim (I've tested it myself, using a high-end USB flash drive as a
> > cache device (L2ARC)).
> >
> > Alexander Leidinger pointed out that using a USB stick for
> > cache/L2ARC *does* improve performance on older systems which have
> > slower disk I/O (e.g. ICH5-based systems).
>
> Agreed. As soon as the bus and write speeds are greater than what
> USB 2.0 can handle, any USB-based solution is useless. ICH5 and up is
> right about when you would start to see this happen.
>
> SD cards/CF cards: mileage may vary depending on the transfer rates.
> But the same situation still applies: like you said, once your main
> pool's throughput outweighs the throughput of your ZIL, it's probably
> not worth even having a ZIL or a cache device. Emphasis on the cache
> more so than the ZIL.
>
> Anyway, it's all good information for those making the judgement of
> whether they need a cache or a ZIL.
>
> Thanks again Jeremy. Always appreciated.

You're welcome. It's important to note that much of what I say is stuff
I've learned and read (technical documentation, usually) on my own --
which means I almost certainly misunderstand certain pieces of
technology. There are a *lot* of people here who understand it much
better than I do. (I'm looking at you, jhb@ ;-) )

As such, I probably should have CC'd pjd@ on this thread, since he's
talked a bit about how to get ZFS on FreeBSD to work with TRIM, and
when to issue the erasing of said blocks.

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP 4BD6C0CB |