Date: Fri, 05 Jul 2013 10:43:28 +0300
From: Daniel Kalchev <daniel@digsys.bg>
To: freebsd-fs@freebsd.org
Subject: Re: Slow resilvering with mirrored ZIL
Message-ID: <51D67920.5030800@digsys.bg>
In-Reply-To: <20130704191203.GA95642@icarus.home.lan>
References: <CABBFC07-68C2-4F43-9AFC-920D8C34282E@unixconn.com> <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <E5CCC8F551CA4627A3C7376AD63A83CC@multiplay.co.uk> <CBCA1716-A3EC-4E3B-AE0A-3C8028F6AACF@alumni.chalmers.se> <20130704000405.GA75529@icarus.home.lan> <C8C696C0-2963-4868-8BB8-6987B47C3460@alumni.chalmers.se> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan>
On 04.07.13 22:12, Jeremy Chadwick wrote:
> I believe -- but I need someone else to chime in here with confirmation,
> particularly someone who is familiar with ZFS's internals -- once your
> pool is ashift 12, you can do a disk replacement ***without*** having to
> do the gnop procedure (because the pool itself is already using ashift
> 12). But again, I need someone to confirm that.

I do not in any way claim to know the ZFS internals well, but I can confirm this: once you have a ZFS vdev 4k-aligned (ashift=12), you can replace drives in it and the vdev will stay 4k-aligned. In ZFS, the alignment is per vdev, not per device and not per zpool. When creating a new vdev, ZFS looks for the largest sector size the underlying storage reports and uses it. This is why you only need to apply the gnop trick to just one of the drives. Once the vdev is created, it pretty much does not care what the underlying storage reports.

> On these drives there are ways to work around this issue -- it
> specifically involves disabling drive-level APM. To do so, you have to
> initiate a specific ATA CDB to the drive using "camcontrol cmd", and
> this has to be done every time the system reboots. There is one
> drawback to disabling APM as well: the drives run hotter.

There is a way to do this with smartmontools as well, either with smartctl or smartd (which is a wise thing to run anyway). Look for the -g option and its apm sub-option. Sometimes, for example when you have ATA devices connected through SAS backplanes and HBAs, you cannot send them these commands via camcontrol.

> These SSDs need a full Secure Erase done to them. In stable/9 you can
> do this through camcontrol, otherwise you need to use Linux (there are
> live CD/DVD distros that can do this for you) or the vendor's native
> utilities (in Windows usually).

ZFS in stable/9 actually does a full TRIM when you attach a new device, which can be observed/confirmed via the TRIM statistics counters. You don't need to use any external utilities.

> UNDERSTAND: THIS IS NOT THE SAME AS A "DISK FORMAT" OR "ZEROING THE
> DISK". In fact, dd if=/dev/zero to zero an SSD would be the worst
> possible thing you could do to it. Secure Erase clears the entire FTL
> and resets the wear levelling matrix (that's just what I call it) back
> to factory defaults, so you end up with out-of-the-box performance:
> there's no more LBA-to-NAND-cell map entries in the FTL (which are
> usually what are responsible for slowdown).

I do not believe Secure Erase does what you propose. It more or less just does a full-device TRIM. Resetting things to factory defaults would not make any vendor happy, because they base their SSD warranties on the wear level. Anyway, if you know of a way to trick this, I am all ears :)

> Your Intel drive is very very small, and in fact I wouldn't even bother
> to use this drive -- it means you'd only be able to use roughly 14GB of
> it (at most) for data, and leave the remaining 6GB unallocated/unused
> solely for wear levelling.

A small SLC flash based drive might be worth more than a large MLC based drive... Just saying. The SLOG rarely fills the drive, and if you use TRIM, you should be safe.

> What you're not taking into consideration is how log and cache devices
> bottleneck ZFS, in addition to the fact that SATA is not like SAS when
> it comes to simultaneous R/W. That poor OCZ drive...

With a proper setup, there is really no bottleneck.
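For reference, the gnop procedure I mentioned above boils down to something like the following. This is only a rough sketch; the pool name "tank" and the ada0/ada1 device names are placeholders, adjust them to your setup:

  gnop create -S 4096 /dev/ada0              # 4k-sector provider on one drive
  zpool create tank mirror ada0.nop ada1     # the new vdev picks up ashift=12
  zdb -C tank | grep ashift                  # should report ashift: 12
  zpool export tank
  gnop destroy /dev/ada0.nop                 # the .nop is only needed at creation time
  zpool import tank

After that, drives in this vdev can be replaced directly and the vdev stays at ashift=12.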
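The smartmontools and TRIM bits above translate to commands along these lines (again only a sketch; /dev/ada2 stands in for the SSD, and you need a reasonably recent smartmontools for the APM get/set options):

  smartctl -g apm /dev/ada2          # show the current APM level
  smartctl -s apm,off /dev/ada2      # disable drive-level APM
  sysctl kstat.zfs.misc.zio_trim     # ZFS TRIM statistics counters on stable/9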
For the cache device, it is advisable to set vfs.zfs.l2arc_norw=0, as otherwise data will not be read from the L2ARC while something is being written to it. This is problematic with metadata; for other data, you just don't get the performance you could from having an SSD.

As for mixing SLOG and L2ARC... I always think it is a bad idea to do so. There are two reasons to have a SLOG:

1. To reduce latency. By combining SLOG and L2ARC on the same device you might not have enough IOPS to get low latency, and consumer-grade SSDs tend not to have consistent latency anyway. Some newer drives are promising, for example the OCZ Vector, or better yet the Intel S3500/S3700.

2. To reduce ZFS pool fragmentation. This is very important and often very much overlooked. If you want ZFS to perform well, you are better off with a separate log device even if it is on a rotating disk (you only lose the low latency!). ZFS pool fragmentation can be a problem for long-lived pools.

Mirroring the SLOG is just a safeguard against losing the last few seconds of writes of really important data. But if you can afford it, just do it.

Considering the small size of this pool, however, I do not believe using one SSD for both SLOG and L2ARC would be a serious bottleneck, unless real-life observation says otherwise.

Daniel
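P.S. To put the above in command form, a small sketch (the pool name "tank" and the device names are placeholders):

  sysctl vfs.zfs.l2arc_norw=0              # runtime; add to /etc/sysctl.conf to persist
  zpool add tank log mirror ada3 ada4      # mirrored SLOG on separate devices
  zpool add tank cache ada5                # separate L2ARC device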