From: Adrian Gschwend <ml-ktk@netlabs.org>
Date: Mon, 10 Mar 2014 18:05:33 +0100
To: freebsd-fs@freebsd.org
Subject: Recurring ZFS performance problems

Hi group,

(I have a lot of pastes in here; see http://pastebin.com/yjQnLryP for this
email in case the mail mangles long lines.)

On a regular basis I run into some very weird ZFS performance issues on my
server. When it happens, file I/O is terribly slow and even a simple ls can
take a long time (worst case up to minutes). Everything that relies on file
I/O is basically dead in this mode, so even starting top or other tools is a
pain. This state can last from minutes to hours and then goes back to normal
after some seemingly random time. A reboot does not necessarily fix it; often
I'm back in exactly the same state right after the reboot.

I do not see any pattern in my monitoring (munin): around the point where
munin itself starts to time out because of the problem, none of the system
graphs show any peak beforehand.

When I run 'top' in this mode I see many processes in one of these states:

  tx->tx
  zfs
  umtxn  (mainly on mysql, which is unkillable in this mode)
  uwait

Setup:

* FreeBSD 9.2-RELEASE-p3 in a KVM (SmartOS/Solaris host, itself running ZFS)
* I'm using mfsbsd to do a ZFS-only system
* No specific ZFS changes (I did play with some, see the last part of this
  email)
* There are 5 jails running
* I run various Apaches (PHP/SVN/Trac etc.), MySQL, an LDAP daemon, a JVM and
  some SIP servers (FreeSWITCH)
* Normal load is around 20-30% (2 cores)
* Swap is currently at 1% usage (4 GB available)
* I have 16 GB of memory available; munin still shows around 1-2 GB as free.
  It may be that the issue happens sooner with less memory, but I cannot
  prove that either.
* Currently no DTrace enabled, so I can't get much further than the standard
  tools shipped with FreeBSD
* zpool status does not report any failures

The issues are not new: they first appeared while the system was still
running on real hardware (FreeBSD 8.x), not inside a KVM. Back then I assumed
I had a hardware problem, but the problem reappeared on the virtualized
install. That install was basically a 1:1 zfs send copy of the old system
(plus some bootloader hacking), so exactly the same software levels. I then
switched to a fresh install on 9.x and had the issue on every single release
there as well. I have not tried 10.x yet.
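One thing that could help narrow this down the next time it happens (only a
sketch, not something already captured above; PID 1234 is a placeholder for
one of the hung mysqld or ls processes, and all of these tools ship with the
base system):

  # kernel stack traces for every thread of a hung process, to see what it
  # is actually blocked on inside ZFS
  procstat -kk 1234

  # system processes and individual threads, including those sitting in
  # "tx->tx" / "zfs" / "umtxn"
  top -SH

  # ARC size and hit/miss counters, to check whether the ARC is being
  # squeezed while the stall is in progress
  sysctl kstat.zfs.misc.arcstats.size \
         kstat.zfs.misc.arcstats.hits \
         kstat.zfs.misc.arcstats.misses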
fstat | wc -l: 7068 (it took forever to finish)

gstat gives me:

dT: 1.010s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10    575      1      0    0.2    574    653   17.8  100.4| vtbd0
    0      0      0      0    0.0      0      0    0.0    0.0| PART/vtbd0/vtbd0
    0      0      0      0    0.0      0      0    0.0    0.0| vtbd0p1
    0      0      0      0    0.0      0      0    0.0    0.0| vtbd0p2
   10    575      1      0    0.3    574    653   18.0  100.4| vtbd0p3
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0/vtbd0
    0      0      0      0    0.0      0      0    0.0    0.0| cd0
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0p1/vtbd0p1
    0      0      0      0    0.0      0      0    0.0    0.0| LABEL/vtbd0p1/vtbd0p1
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/e402ecce-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0p2/vtbd0p2
    0      0      0      0    0.0      0      0    0.0    0.0| LABEL/vtbd0p2/vtbd0p2
    0      0      0      0    0.0      0      0    0.0    0.0| gptid/e4112d88-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/vtbd0p3/vtbd0p3
    0      0      0      0    0.0      0      0    0.0    0.0| SWAP/swap/gptid/e4112d88-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/cd0/cd0
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/gptid/e402ecce-89ca-11e2-a867-3264262b9894/gptid/e402ecce-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| DEV/gptid/e4112d88-89ca-11e2-a867-3264262b9894/gptid/e4112d88-89ca-11e2-a867-3264262b9894
    0      0      0      0    0.0      0      0    0.0    0.0| ZFS::VDEV/zfs::vdev/vtbd0p3

ms/w changes a lot; the highest I've seen right now was around 70.

zpool iostat 2:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         173G  21.5G      3     56  30.9K   535K
tank         173G  21.5G     62    376  49.0K  1.22M
tank         173G  21.5G     46    340  84.3K   565K
tank         173G  21.5G     45    566  74.6K   800K
tank         173G  21.5G     32    222  92.0K   958K
tank         173G  21.5G     63    392   120K  1.10M
tank         173G  21.5G     16    286  14.2K   338K
tank         173G  21.5G     29    313  24.6K   831K
tank         173G  21.5G      0    289      0   445K
tank         173G  21.5G     27    244  32.6K   293K
tank         173G  21.5G     43    385  42.8K   477K
tank         173G  21.5G     31    329  15.7K   710K
tank         173G  21.5G     65    394  46.8K  1.50M
tank         173G  21.5G     80    320   127K   754K
tank         173G  21.5G     30    425   144K  1.09M
tank         173G  21.5G     13    399  25.9K   379K
tank         173G  21.5G     10    194  5.22K   162K
tank         173G  21.5G     18    311  45.5K  1.02M
tank         173G  21.5G     29    202  58.5K   344K
tank         173G  21.5G     32    375   108K   926K

On the host OS (SmartOS), zpool iostat 2 shows me the following (there is one
other FreeBSD box on the host, but it does almost no I/O):

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zones        602G   230G      5    278  44.0K  2.42M
zones        602G   230G      2    154  19.9K  2.18M
zones        602G   230G      0  1.47K      0  10.3M
zones        602G   230G      0    128      0  1.44M
zones        602G   230G      0    270      0  2.61M
zones        602G   230G      0  1.39K      0  10.2M
zones        602G   230G      0    114  7.96K  2.10M
zones        602G   230G      0    979  7.96K  7.84M

When the guest system is not in this state, the writes are lower and I don't
see intervals where reads drop to 0.

I've been googling about this topic forever and I do find people who report
similar issues, but no one I've contacted has found a real explanation for
it. Based on various guides I started adapting the basic config.

cat /boot/loader.conf:

vfs.zfs.zfetch.block_cap=64
# this one was horrible, bootup alone was dogslow
#vfs.zfs.write_limit_override=1048576
#vfs.zfs.txg.timeout="5"
# so far good results?
vfs.zfs.prefetch_disable="1"

At first I thought disabling prefetch had solved the issue, but it looks like
I was too optimistic about that one. However, ls feels *much* faster when the
system is healthy since I disabled prefetch.

I'm really totally lost on this one, so I would appreciate hints on how to
debug it. I'm willing to test whatever it takes to figure out where this
issue comes from.

thanks

Adrian
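P.S. (added for illustration only, not something that has been tested on this
box): since the guest has 16 GB with MySQL, a JVM and several jails competing
with the ARC for memory, one common experiment alongside the tunables above
is to cap the ARC explicitly in /boot/loader.conf. The "8G" below is purely a
placeholder value, not a recommendation:

  # cap the ZFS ARC so the applications keep guaranteed headroom
  vfs.zfs.arc_max="8G"

Then, while a stall is happening, compare the actual ARC size against the cap
and watch per-vdev I/O from inside the guest:

  # current ARC size vs. configured maximum
  sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max

  # one-second, per-vdev view of the pool's I/O
  zpool iostat -v tank 1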