Date: Thu, 24 Sep 2015 10:20:04 -0400
From: Paul Kraus <paul@kraus-haus.org>
To: Raimund Sacherer <rs@logitravel.com>, FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: Restructure a ZFS Pool
Message-ID: <0A3AC2BF-0BF6-4CFB-8947-EFA01B58CF93@kraus-haus.org>
In-Reply-To: <168652008.9504625.1443102487228.JavaMail.zimbra@logitravel.com>
References: <480627999.9462316.1443098561442.JavaMail.zimbra@logitravel.com> <9EE24D9C-260A-408A-A7B5-14BACB12DDA9@kraus-haus.org> <168652008.9504625.1443102487228.JavaMail.zimbra@logitravel.com>
On Sep 24, 2015, at 9:48, Raimund Sacherer <rs@logitravel.com> wrote:

> Yes, I understood that it will only help prevent fragmentation in the future. I also read that performance is great when using async ZFS,

That is an overly general statement. I have seen zpools perform badly with async as well as sync writes when not configured to match the workload.

> would it be safe to use async ZFS if I have a battery-backed hardware RAID controller (1024G RAM cache)?
> The server is an HP G8 and I have configured all disks as single-disk mirrors (the only way to get a JBOD on this RAID controller). In the event of a power outage, everything should be held in the RAID controller by the battery, and it should be written to disk as soon as power is restored.

Turning off sync behavior violates POSIX compliance and is not a very good idea. Also remember that async writes are cached in the ARC… so you need power for the entire server, not just the disk caches, until all activity has ceased _and_ all pending Transaction Groups (TXGs) have been committed to non-volatile storage. TXGs are generally committed every 5 seconds, but if you are under heavy write load it may take more time than that.

> ... would that be a safe environment to switch ZFS to async?

No one can make that call but you. You know your environment, you know your workload, and you know the fallout from lost writes _if_ something goes wrong.

> If I use async, is there still the *need* for a SLOG device? I read that running ZFS async and using the SLOG are comparable, because both let the writes be ordered and thus prevent fragmentation? It is not a critical system (e.g. downtime during the day is possible), but if restores need to be done I'd rather have it run as fast as possible.

If you disable sync writes (please do NOT say "use async", as that is determined by the application code), then you are disabling the ZIL (ZFS Intent Log), and the SLOG is a device that holds _just_ the ZIL separate from the data vdevs in the zpool. So, yes, disabling sync writes means that even if there is a SLOG it will never be used.

>> Yes, but unless you can stand losing data in flight (writes that the system
>> says have been committed but have only made it to the SLOG), you really want
>> your SLOG vdev to be a mirror (at least 2 drives).

> Shouldn't this scenario be handled by ZFS (writes to SLOG, power out, power on, SLOG is transferred to data disks)?

Not if the single SLOG device _fails_… In the case of a power failure, once the system comes back up ZFS will replay the TXGs on the SLOG and you will not have lost any writes.

> I thought the only data loss would be writes which are currently in transit TO the SLOG at the time of the power outage?

Once again, if the application requests sync writes, the application is not told that the write is complete _until_ it is committed to non-volatile backing storage, in this case the ZIL / SLOG device(s). So from the application's perspective, no writes are lost, because they had not been acknowledged as committed when power failed. This is one of the use cases where claiming that disabling sync behavior and relying on a UPS / battery-backed cache is just as good as a SLOG device is misleading. The application is asking for a sync write and it is being lied to.
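For reference, the sync behavior being discussed is controlled per dataset via the `sync` property. A minimal sketch, assuming a hypothetical pool/dataset named tank/backup (substitute your own):

    # show the current sync setting (standard is the default)
    zfs get sync tank/backup

    # disable sync writes for this dataset -- this is what "going async"
    # actually amounts to, with all of the caveats described above
    zfs set sync=disabled tank/backup

    # restore the default, POSIX-compliant behavior
    zfs set sync=standard tank/backup

Note that `sync=disabled` applies regardless of what the application asks for, which is exactly the "lying to the application" problem described above.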
> And I read somewhere that with ZFS since V28 (IIRC) if the SLOG dies it turns off the log and you lose the (performance) benefit of the SLOG, but the pools should still be operational?

There are separate versions for zpool and zfs; you are referring to zpool version 28. Log device removal was added in zpool version 19. `zpool upgrade -v` will tell you which versions / features your system supports; `zfs upgrade -v` will tell you the same thing for zfs versions. FreeBSD 10.1 has zfs version 5 and zpool version 28 plus lots of added features. Feature flags were a way to add features to zpools without completely breaking compatibility.

So you can remove a failed SLOG device, and if they are mirrored you still don't lose any data. I'm not sure what happens to a running zpool if a single (non-mirrored) SLOG device fails.

>> In a zpool of this size, especially a RAIDz<N> zpool, you really want a hot
>> spare and a notification mechanism so you can replace a failed drive ASAP.
>> The resilver time (to replace a failed drive) will be limited by the
>> performance of a _single_ drive for _random_ I/O. See this post
>> http://pk1048.com/zfs-resilver-observations/ for one of my resilver
>> operations and the performance of such.

> Thank you for this info, I'll keep it in mind and bookmark your link.

Benchmark your own zpool, if you can. Do a zpool replace on a device and see how long it takes. That is a reasonable first approximation of how long it will take to replace a really failed device. I tend to stick with drives no bigger than 1 TB to keep resilver times reasonable (for me), and I add more vdevs of mirrors as I need capacity.
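As a rough sketch of that benchmark, assuming a hypothetical pool named tank and hypothetical device names da4 (a healthy disk to swap out) and da5 (a spare of at least the same size):

    # swap a healthy disk for the spare and let the pool resilver onto it
    zpool replace tank da4 da5

    # watch the resilver progress and note the elapsed time when it finishes
    zpool status -v tank

The same `zpool status` output is also where a failed or degraded log device would show up.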
--
Paul Kraus
paul@kraus-haus.org