Date: Tue, 15 Dec 2009 13:39:04 +0100
From: Ivan Voras <ivoras@freebsd.org>
To: freebsd-fs@freebsd.org
Cc: freebsd-hackers@freebsd.org
Subject: ZFS, compression, system load, pauses (livelocks?)
Message-ID: <hg8015$999$1@ger.gmane.org>
The context of this post is file servers running FreeBSD 8 and ZFS with compressed file systems on low-end hardware - or actually high-end hardware on VMware ESX 3.5 and 4, which makes it effectively low-end as far as storage is concerned. The servers are standby backup mirrors of production servers, so there are many writes and few reads. Running this setup I notice two things:

1) Load averages get very high, even though the only usage these systems get is file system usage:

last pid:  2270;  load averages: 19.02, 14.58,  9.07    up 0+09:47:03  11:29:04

2) Long pauses, in what look like vfs.zfs.txg.timeout-second intervals, which seemingly block everything, or at least the entire userland. These pauses are sometimes so long that file transfers fail, which must be avoided.

I think these two are connected. Monitoring the system with "top" and "iostat" reveals that the state between the pauses is mostly idle (data is being sent to the server over a gbit network at rates of 15+ MB/s). During the pauses there is heavy IO activity, which shows up both in top - the kernel threads spa_zio_* (ZFS taskqueues) are hogging the CPU - and in iostat, which immediately after the pause reveals several tens of MB written to the drives. Except for the pause itself, this is expected - ZFS is compressing data before writing it down.

The pauses are interesting. Immediately after such a pause the system status is similar to this one:

91 processes:  12 running, 63 sleeping, 16 waiting
CPU:  1.4% user,  0.0% nice, 96.3% system,  0.3% interrupt,  2.0% idle
Mem: 75M Active, 122M Inact, 419M Wired, 85M Buf, 125M Free

(this is the first "top" output after a pause). Looking at the list of processes, it looks like a large number of kernel and userland processes are woken up at once: from the kernel side regularly all the g_* threads, but also unrelated threads like bufdaemon, softdepflush, etc., and from userland - top, syslog, cron, etc. It is as if ZFS livelocks everything else.

The effects of this can be lessened by reducing vfs.zfs.txg.timeout and vfs.zfs.vdev.max_pending, and by using the attached patch, which creates NCPU ZFS worker threads instead of hardcoding them to "8". The patch will probably also help the high-end hardware end of the spectrum, where 16-core users will finally be able to dedicate them all to ZFS :)

With these measures I have reduced the pauses to a second or two every 10 seconds instead of up to tens of seconds every 30 seconds, which is good enough so that transfers don't time out, but it could probably be better. Any ideas on the "pauses" issue?

The taskq-thread patch is below. If nobody objects (pjd? I don't know how much harder it will make it for you to import future ZFS versions) I will commit it soon.
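Before the patch itself, for reference, the tunables part of the workaround looks roughly like this in /boot/loader.conf - just a sketch, and the values are only what I happen to be experimenting with here, not recommendations:

vfs.zfs.txg.timeout="5"          # txg sync interval in seconds (default is 30 on these boxes)
vfs.zfs.vdev.max_pending="10"    # outstanding IOs queued per vdev (default is 35)

Depending on the exact 8.x bits, these may also be adjustable at runtime via sysctl(8); if the sysctl turns out to be read-only, it's loader.conf and a reboot.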
--- /sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c	2009-03-29 01:31:42.000000000 +0100
+++ spa.c	2009-12-15 13:36:05.000000000 +0100
@@ -58,15 +58,16 @@
 #include <sys/callb.h>
 #include <sys/sunddi.h>
 #include <sys/spa_boot.h>
+#include <sys/smp.h>
 
 #include "zfs_prop.h"
 #include "zfs_comutil.h"
 
-int zio_taskq_threads[ZIO_TYPES][ZIO_TASKQ_TYPES] = {
+static int zio_taskq_threads[ZIO_TYPES][ZIO_TASKQ_TYPES] = {
 	/*	ISSUE	INTR	*/
 	{ 1,	1 },	/* ZIO_TYPE_NULL	*/
-	{ 1,	8 },	/* ZIO_TYPE_READ	*/
-	{ 8,	1 },	/* ZIO_TYPE_WRITE	*/
+	{ 1,	-1 },	/* ZIO_TYPE_READ	*/
+	{ -1,	1 },	/* ZIO_TYPE_WRITE	*/
 	{ 1,	1 },	/* ZIO_TYPE_FREE	*/
 	{ 1,	1 },	/* ZIO_TYPE_CLAIM	*/
 	{ 1,	1 },	/* ZIO_TYPE_IOCTL	*/
@@ -498,7 +499,8 @@
 	for (int t = 0; t < ZIO_TYPES; t++) {
 		for (int q = 0; q < ZIO_TASKQ_TYPES; q++) {
 			spa->spa_zio_taskq[t][q] = taskq_create("spa_zio",
-			    zio_taskq_threads[t][q], maxclsyspri, 50,
+			    zio_taskq_threads[t][q] == -1 ? mp_ncpus : zio_taskq_threads[t][q],
+			    maxclsyspri, 50,
 			    INT_MAX, TASKQ_PREPOPULATE);
 		}
 	}
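For what it's worth, a quick way to see what the patch will pick on a given machine - again just a sketch:

# mp_ncpus is what the kernel exports as kern.smp.cpus (hw.ncpu should agree):
sysctl kern.smp.cpus hw.ncpu
# with the patch, the READ/INTR and WRITE/ISSUE taskqs each get that many
# threads instead of the hardcoded 8; "top -SH" should show the spa_zio threads.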