Date: Tue, 16 Oct 2018 23:31:44 -0400
From: Allan Jude <allanjude@freebsd.org>
To: freebsd-hackers@freebsd.org
Subject: Re: High load and MySQL slow without apparent reason
Message-ID: <248cd85b-f36e-58ea-873d-8d89846f1c93@freebsd.org>
In-Reply-To: <CAG0rGZecYsycwuBzhRBngnBc7TG5Y5913VmdLPPhCbodZPKu8Q@mail.gmail.com>
References: <CAG0rGZecYsycwuBzhRBngnBc7TG5Y5913VmdLPPhCbodZPKu8Q@mail.gmail.com>
On 2018-10-16 22:25, Darek Margas wrote:
> Hi Everyone,
>
> I'm trying to refresh my old FreeBSD experience by moving MySQL platform
> from Linux onto FreeBSD+ZFS.
>
> Before I ask for your help I would like to give you some context.
>
> The machine is Dell server 2x20 cores, Intel IXL NIC, 1TB of RAM and lots
> of SAS SSD drives.
> The kernel is slightly modified by removing some unused stuff, replacing
> ixl driver with latest from Intel website and enabling NUMA.
> The whole thing runs number of MySQL daemons packed in jails (bridged
> network) with settings optimized for ZFS ARC caching (O_DIRECT, small
> buffers, etc).
>
> This is 11.2-RELEASE.
>
> When I tested it first time I found troubles with back pressure on ARC
> whilst short in memory leading machine to death. I also found that
> disabling ARC compression solved silent death but decided to make some
> tunes to keep more memory free for sudden need.
>
> Ran some tests, used it for replication slaves, etc.
>
> Here is the thing - how I crashed this machine without understanding what
> has happened.
>
> First my tunes. I adjusted v_free_target and v_free_min aiming to 128G and
> 64G respectively. However, I overlooked fact that this is in pages not in
> 1k blocks.
> As result I set:
>
> - 700G max ARC size
> - 512G v_free_target
> - 256G v_free_min

You likely want to tune 'vfs.zfs.arc_free_target' to a value very close
to v_free_target, or at least v_free_min, so that ZFS gives back memory at
that level of memory shortage as well.

>
> Obviously this is a nonsense, however, the machine worked calm until ARC
> got half of memory. Then shit happened. As I made machine with no swap at
> all I have got number of zombies and problems with reclaiming console (say,
> open VI which works, then exit and VI stays on console while became zombie).
> That was "fixed" by disabling swapping via sysctl. I also noticed 25% of
> CPU taken by "system" with nothing popping in top except pagedaemon and zfs
> (on arc_reclaim).
>
> I have added 40G of swap, rebooted machine but kept wrong settings.
>
> It was again calm until ARC got half of memory. This is when I found what I
> did and fixed v_free stuff to be
>
> - 128G v_free_target
> - 64G v_free_min
>
> The machine started managing memory the right way, wiping inactive to
> laundry and laundering only when needed. I still observed 25% of
> unexplained load from "system" (floating 5-60%) but all seemed OK.
>
> At this point I switched one replica to be master and put production
> queries on it.
>
> Summarizing the above - the machine had issues and has not been rebooted
> but seemed OK with memory management while having unexplained system load.
>
> Once I switched my SQLs from Linux master to FreeBSD I noticed slow
> performance. There is stored proc called every 15 minutes. On old machine
> and all others it takes around 30-40s to complete and previous master had
> spike in ROW executions to 650kps (one minute sample) while new one got it
> up to 350kps and run for nearly 3 minutes.
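To make the unit mistake and the arc_free_target suggestion concrete, here is a small Python sketch. It assumes a 4 KiB page size (check `sysctl hw.pagesize` on the actual machine) and that vm.v_free_target, vm.v_free_min and vfs.zfs.arc_free_target all count pages; the helper names are hypothetical. It reproduces the 4x inflation described in the quoted text (128G and 64G turning into 512G and 256G) and prints sysctl.conf-style lines with arc_free_target set to the same page count as v_free_target.

```python
# Sketch only. Assumptions (not from the thread): 4 KiB pages, and
# vm.v_free_target, vm.v_free_min and vfs.zfs.arc_free_target are all
# expressed in pages. Helper names are hypothetical.
from typing import List

PAGE_SIZE = 4096  # assumed amd64 page size, in bytes
GIB = 1024 ** 3


def gib_to_pages(gib: int, page_size: int = PAGE_SIZE) -> int:
    """Page count corresponding to `gib` GiB of memory."""
    return gib * GIB // page_size


def misread_as_pages(gib: int, page_size: int = PAGE_SIZE) -> int:
    """GiB actually requested when a count of 1 KiB blocks is written
    to a sysctl that counts pages."""
    kib_blocks = gib * GIB // 1024
    return kib_blocks * page_size // GIB


def sysctl_lines(free_target_gib: int, free_min_gib: int) -> List[str]:
    """sysctl.conf-style lines; arc_free_target mirrors v_free_target so
    ARC starts giving memory back at the free-page level the pagedaemon
    is trying to maintain."""
    target = gib_to_pages(free_target_gib)
    return [
        f"vm.v_free_target={target}",
        f"vm.v_free_min={gib_to_pages(free_min_gib)}",
        f"vfs.zfs.arc_free_target={target}",
    ]


if __name__ == "__main__":
    print(misread_as_pages(128), misread_as_pages(64))  # 512 256
    print("\n".join(sysctl_lines(128, 64)))
```

Setting arc_free_target exactly equal to v_free_target is only one reading of the advice; any value between v_free_min and v_free_target would also fit it.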
>
> I started looking deeper and found:
> - Made all MySQL settings the same (when possible as some follow platform)
> with no improvement
> - MySQL reload did not help
> - Stopping all replicas running around on the same machine (5 of them) to
> release resources made it worse (over 5 minutes to complete call). Starting
> replicas made it better again by one minute.
>
> BTW - jail was limited to one NUMA zone and half cores. Not all replicas
> had the same NUMA and CPU group.
>
> I copied ZFS content to test machine which is exactly the same and kicked
> the same MySQL in same jail and with same settings.
> - Test instance ran correctly within similar completion time to old Linux
> master
> - ARC on test machine was loaded up to 700G so I thought it would be good
> enough to compare but machine still had lots of memory
>
> To make it closer I compiled "memory allocator" which simply allocates and
> fills memory until killed or system dies.
>
> Run it on test machine first:
> - No effect until v_mem_target passed
> - Once passed pagedaemon kicked in, memory got wiped and shifted, swap got
> full (paging only anyway)
> - Load around 20% appeared from system, similar to broken production machine
> - Got down to 50G passing v_free_min
> - Killed allocator
> - After 1-2s freezing all got back to normal, load from system was gone.
> - Swap was in use for some time after but finally got clean (that was only
> 4G swap on test machine)
> - After some time machine is still calm and MySQL fast
>
> Repeated the same on production machine:
> - All as above, except:
> - after killing allocator machine got frozen for, say, 10-15s
> - memory was released but load did not change - neither got much higher
> while allocating memory nor lower after.
> - Machine remained slow
>
> Finally I rebooted whole machine and now it is fast while building ARC. I
> believe it won't have the same issue soon as v_free stuff is set correctly,
> however, I need to understand why this MySQL process suffered and whether
> it was possible to recover it without reboot. I can imagine it was
> something running in a loop or contention on something otherwise unused or
> simply another clash in settings triggering something in unusual way but
> have no idea where to look to investigate it. Well, it's possible that
> there is a bug too.
>
> Before reboot I collected various vmstats, tops, ran ktrace on MySQL and
> sysctl to dump settings. Not posting as don't know what would be useful.
>
> Could you please point me in right direction?
>
> Cheers,
> Darek

-- 
Allan Jude