From owner-freebsd-hackers@freebsd.org Wed Oct 17 02:25:49 2018
From: Darek Margas <darek@tramada.com>
Date: Wed, 17 Oct 2018 13:25:31 +1100
Subject: High load and MySQL slow without apparent reason
To: freebsd-hackers@freebsd.org

Hi Everyone,

I'm trying to refresh my old FreeBSD experience by moving a MySQL platform from Linux onto FreeBSD+ZFS. Before I ask for your help I would like to give you some context.

The machine is a Dell server with 2x20 cores, an Intel IXL NIC, 1 TB of RAM and lots of SAS SSD drives. The kernel is slightly modified: some unused stuff removed, the ixl driver replaced with the latest from the Intel website, and NUMA enabled. The whole thing runs a number of MySQL daemons packed in jails (bridged network) with settings optimized for ZFS ARC caching (O_DIRECT, small buffers, etc.). This is 11.2-RELEASE.

When I tested it the first time I ran into trouble with back pressure on the ARC when short of memory, leading the machine to death. I found that disabling ARC compression stopped the silent deaths, but I also decided to make some tunes to keep more memory free for sudden need. I ran some tests, used it for replication slaves, etc.

Here is the thing - how I crashed this machine without understanding what had happened. First, my tunes: I adjusted v_free_target and v_free_min, aiming for 128G and 64G respectively.
However, I overlooked the fact that these are in pages, not in 1k blocks. As a result I set:

- 700G max ARC size
- 512G v_free_target
- 256G v_free_min

Obviously this is nonsense; however, the machine stayed calm until the ARC reached half of memory. Then shit happened. As I had built the machine with no swap at all, I got a number of zombies and problems with reclaiming the console (say, open vi, which works; then exit, and vi stays on the console while becoming a zombie). That was "fixed" by disabling swapping via sysctl. I also noticed 25% of CPU taken by "system" with nothing showing in top except pagedaemon and zfs (in arc_reclaim).

I added 40G of swap and rebooted the machine, but kept the wrong settings. It was again calm until the ARC reached half of memory. This is when I found what I had done and fixed the v_free stuff to be:

- 128G v_free_target
- 64G v_free_min

The machine started managing memory the right way, moving inactive pages to laundry and laundering only when needed. I still observed around 25% of unexplained load from "system" (floating 5-60%), but all seemed OK. At this point I switched one replica to be master and put production queries on it.

Summarizing the above - the machine had issues and had not been rebooted, but seemed OK with memory management while carrying unexplained system load.

Once I switched my SQLs from the Linux master to FreeBSD I noticed slow performance. There is a stored proc called every 15 minutes. On the old machine and all others it takes around 30-40s to complete, and the previous master had a spike to 650k row executions per second (one-minute sample), while the new one only got up to 350k and ran for nearly 3 minutes. I started looking deeper and found:

- Making all MySQL settings the same (where possible, as some follow the platform) brought no improvement
- A MySQL reload did not help
- Stopping all replicas running on the same machine (5 of them) to release resources made it worse (over 5 minutes to complete the call); starting the replicas made it better again by one minute
BTW - the jail was limited to one NUMA zone and half the cores. Not all replicas had the same NUMA and CPU group.

I copied the ZFS content to a test machine which is exactly the same, and kicked off the same MySQL in the same jail with the same settings:

- The test instance ran correctly, with a completion time similar to the old Linux master
- The ARC on the test machine was loaded up to 700G, so I thought it would be good enough to compare, but the machine still had lots of memory

To make it closer I compiled a "memory allocator" which simply allocates and fills memory until killed or the system dies. I ran it on the test machine first:

- No effect until v_free_target was passed
- Once passed, pagedaemon kicked in, memory got wiped and shifted, and swap filled up (paging only anyway)
- Around 20% load appeared from "system", similar to the broken production machine
- Free memory got down to 50G, passing v_free_min
- Killed the allocator
- After 1-2s of freezing, all got back to normal and the load from "system" was gone
- Swap stayed in use for some time afterwards but finally got clean (that was only a 4G swap on the test machine)
- Some time later the machine is still calm and MySQL fast

I repeated the same on the production machine:

- All as above, except:
- After killing the allocator the machine froze for, say, 10-15s
- Memory was released but the load did not change - it neither got much higher while allocating memory nor lower afterwards
- The machine remained slow

Finally I rebooted the whole machine and now it is fast while rebuilding the ARC. I believe it won't hit the same issue soon, as the v_free stuff is now set correctly; however, I need to understand why this MySQL process suffered and whether it was possible to recover it without a reboot. I can imagine it was something running in a loop, or contention on something otherwise unused, or simply another clash in settings triggering something in an unusual way, but I have no idea where to look to investigate it. Well, it's possible that there is a bug too.

Before the reboot I collected various vmstats, tops, ran ktrace on MySQL, and used sysctl to dump settings.
I'm not posting them yet as I don't know what would be useful. Could you please point me in the right direction?

Cheers,
Darek