Date: Wed, 11 May 2011 21:48:48 -0400
From: Jason Hellenthal <jhell@DataIX.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
Message-ID: <20110512014848.GA35736@DataIX.net>
In-Reply-To: <20110512010433.GA48863@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan>
Jeremy,

As always, the quality of your messages is 101% spot on, and I always
find some new information that becomes handy more often than I could
say; there is always something to be learned. Thanks.

On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> >
> > Jeremy,
> >
> > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > > Furthermore, TRIM support doesn't exist with ZFS on FreeBSD,
> > > > > so folks should also keep that in mind when putting an SSD
> > > > > into use in this fashion.
> > > >
> > > > By the way, what would be the use of TRIM for SLOG and L2ARC
> > > > devices? I see absolutely no benefit from TRIM for the L2ARC,
> > > > because it is written slowly (on purpose). Any current, or 1-2
> > > > generations back, SSD would handle that write load without TRIM
> > > > and without any performance degradation.
> > > >
> > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use
> > > > SLC SSD for the SLOG, for many reasons. The write regions on
> > > > the SLC NAND should be smaller (my wild guess, current practice
> > > > may differ) and the need for rewriting will be small. If you
> > > > don't need to rewrite already written data, TRIM does not help.
> > > > Also, as far as I understand, most "serious" SSDs (typical for
> > > > SLC, I guess) have twice or more the advertised size and always
> > > > write to fresh cells, scheduling a background erase of the
> > > > 'overwritten' cell.
> > >
> > > AFAIK, drive manufacturers do not disclose just how much
> > > reallocation space they keep available on an SSD. I'd rather not
> > > speculate as to how much, as I'm certain it varies per vendor.
> >
> > Let's not forget here: the size of the separate log device may be
> > quite small. A rule of thumb is that you should size the separate
> > log to be able to handle 10 seconds of your expected synchronous
> > write workload. It would be rare to need more than 100 MB in a
> > separate log device, but the separate log must be at least 64 MB.
> >
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> >
> > So in other words, how effective is TRIM really, given the above?
> >
> > Even with a high database write load on the disks at the full
> > capacity of the incoming link, I would find it hard to believe that
> > anyone could get the ZIL to even come close to 512 MB.
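To put rough numbers on that rule of thumb, here is a minimal sketch
in sh; the 10 MB/s synchronous write rate is an assumption for
illustration, not anything measured in this thread:

    # Rule of thumb: SLOG capacity ~= 10 seconds of expected
    # synchronous writes, with a 64 MB floor.
    rate_mb=10                          # assumed sync write rate, MB/s
    size_mb=$((10 * rate_mb))           # -> 100 MB
    [ "$size_mb" -lt 64 ] && size_mb=64 # enforce the 64 MB minimum
    echo "suggested SLOG size: ${size_mb} MB"

Even a generous estimate leaves nearly all of a 32 GB+ SSD unused,
which is the point being made above.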
> In the case of an SSD being used as a log device (ZIL), I imagine it
> would only matter the longer the drive was kept in use. I do not use
> log devices anywhere with ZFS, so I can't really comment.
>
> In the case of an SSD being used as a cache device (L2ARC), I imagine
> it would matter much more.
>
> In the case of an SSD being used as a pool device, it matters greatly.
>
> Why it matters: there are two methods of "reclaiming" blocks which
> were used: internal SSD "garbage collection" and TRIM. For a NAND
> block to be reclaimed, it has to be erased -- SSDs erase things in
> pages rather than individual LBAs. With TRIM, you submit the ATA Data
> Set Management command with a list of LBAs you wish to inform the
> drive are no longer used. The drive aggregates the LBA ranges,
> determines if an entire flash page can be erased, and does it. If it
> can't, it makes some sort of mental note that the individual LBA (in
> some particular page) shouldn't be used.
>
> The "garbage collection" works when the SSD is idle. I have no idea
> what "idle" actually means operationally, because again, vendors
> don't disclose what the idle intervals are. 5 minutes? 24 hours? It
> matters, but they don't tell us. (What confuses me about the "idle
> GC" method is how it determines what it can erase -- if the OS didn't
> tell it what it's using, how does it know it can erase the page?)
>
> Anyway, how all this manifests itself performance-wise is intriguing.
> It's not speculation: there's hard evidence that not using TRIM
> results in SSD performance, bluntly put, sucking badly on some SSDs.
>
> There's this mentality that wear levelling completely solves all of
> the **performance** concerns -- that isn't the case at all. In fact,
> I'm under the impression it probably hurts performance, but it
> depends on how it's implemented within the drive firmware.
>
> bit-tech did an experiment using Windows 7 -- which supports and uses
> TRIM, assuming the device advertises the capability -- with different
> models of SSDs. The testing procedure is documented here, but I'll
> document it as well:
>
> http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
>
> Again, remember, this is done on a Windows 7 system which does
> support TRIM if the device supports it. The testing steps, in this
> order:
>
> 1) SSD without TRIM support -- all LBAs are zeroed.
> 2) Took read/write benchmark readings.
> 3) SSD without TRIM support -- partitioned and formatted as NTFS
>    (cluster size unknown), copied 100 GB of data to the drive,
>    deleted all the data, and repeated this method 10 times.
> 4) Step #2 repeated.
> 5) Upgraded SSD firmware to a version that supports TRIM.
> 6) SSD with TRIM support -- step #1 repeated.
> 7) Step #2 repeated.
> 8) SSD with TRIM support -- step #3 repeated.
> 9) Step #2 repeated.
>
> Without TRIM, some drives drop their read performance by more than
> 50%, and write performance by almost 70%. I'm focusing on Intel SSDs
> here, by the way. I do not care for OCZ or Corsair products.
>
> So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> TRIM, effectively the benchmarks shown pre-firmware-upgrade are what
> ZFS on FreeBSD will mimic (to some degree).
>
> Therefore, simply put, users should be concerned when using ZFS on
> FreeBSD with SSDs. It doesn't matter to me if you're only using
> 64 MBytes of a 40 GB drive or if you're using the entire thing; no
> TRIM means degraded performance over time.
>
> Can you refute any of this evidence?

At least at the moment, NO. But I can say that, for however widely
SSDs were used by OpenSolaris users before the Oracle reaping, I don't
recall seeing any related bug reports on degradation. But like I
said... I haven't seen them, and that's not to say the absence of
reports wasn't simply a lack of use either. Definitely more to look
into, test, benchmark & test again.
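As a starting point for that kind of testing, one way to see whether a
drive even advertises TRIM on FreeBSD; a minimal sketch, where "ada0"
is a placeholder device name and the exact output wording varies by
drive and FreeBSD release:

    # Does the drive claim ATA Data Set Management (TRIM) support?
    camcontrol identify ada0 | grep -i "data set management"

    # Crude, read-only sequential throughput spot check, useful for
    # comparing the same drive before and after heavy write activity:
    dd if=/dev/ada0 of=/dev/null bs=1m count=1024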
> > Given most SSDs come at a size greater than 32 GB, I hope this
> > comes as an early reminder that the ZIL you are buying that disk
> > for is only going to be using a small percentage of that disk, and
> > I hope you justify the cost against its actual use. If you do
> > happen to justify creating a ZIL for your pool, then I hope that
> > you partition it wisely to make use of the rest of the space that
> > is untouched.
> >
> > For all other cases, if you still want to have a ZIL, I would
> > recommend that you take some sort of PCI->SD card or USB stick into
> > account, with mirroring.
>
> Others have pointed out this isn't effective (re: USB sticks). The
> read and write speeds are too slow, and limit the overall performance
> of ZFS in a very bad way. I can absolutely confirm this claim (I've
> tested it myself, using a high-end USB flash drive as a cache device
> (L2ARC)).
>
> Alexander Leidinger pointed out that using a USB stick for
> cache/L2ARC *does* improve performance on older systems which have
> slower disk I/O (e.g. ICH5-based systems).

Agreed. As soon as the bus and write speeds are greater than the
speeds that USB 2.0 can handle, any USB-based solution is useless.
ICH5 and up would be right about the time you would start to see this
happen.

With SD cards/CF cards, mileage may vary depending on the transfer
rates. But the same situation still applies: like you said, once your
main pool throughput outweighs the throughput of your ZIL, it's
probably not worth even having a ZIL or a cache device. Emphasis on
cache more so than ZIL.

Anyway, all good information for those judging whether they need a
cache or a ZIL; a sketch of the "partition it wisely" layout from
above follows below. Thanks again Jeremy. Always appreciated.
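For that "partition it wisely" advice, here is a minimal sketch of one
possible layout; the pool name ("tank"), device ("ada1"), labels, and
sizes are all hypothetical:

    # Dedicate a small slice of the SSD to the log and give the
    # remainder to L2ARC, rather than burning the whole disk on a ZIL.
    gpart create -s gpt ada1
    gpart add -t freebsd-zfs -l slog0 -s 512m ada1
    gpart add -t freebsd-zfs -l cache0 ada1
    zpool add tank log gpt/slog0
    zpool add tank cache gpt/cache0

If you do mirror the log as suggested, the shape would be
"zpool add tank log mirror gpt/slog0 gpt/slog1", with the second label
on a second device.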
--
Regards, (jhell)
Jason Hellenthal