From owner-freebsd-stable@FreeBSD.ORG Tue Jul 6 23:56:03 2004 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EC0B116A4CE for ; Tue, 6 Jul 2004 23:56:02 +0000 (GMT) Received: from util.inch.com (shellutil.inch.com [216.223.208.53]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6522E43D1F for ; Tue, 6 Jul 2004 23:56:02 +0000 (GMT) (envelope-from spork@inch.com) Received: from shell.inch.com (www.inch.com [216.223.192.20]) i66NuStG028567 for ; Tue, 6 Jul 2004 19:56:28 -0400 (EDT) (envelope-from spork@inch.com) Received: from shell.inch.com (localhost [127.0.0.1]) by shell.inch.com (8.12.8p2/8.12.8) with ESMTP id i66Nu1JS003229 for ; Tue, 6 Jul 2004 19:56:01 -0400 (EDT) (envelope-from spork@inch.com) Received: from localhost (spork@localhost)i66Nu1ld003226 for ; Tue, 6 Jul 2004 19:56:01 -0400 (EDT) X-Authentication-Warning: shell.inch.com: spork owned process doing -bs Date: Tue, 6 Jul 2004 19:56:01 -0400 (EDT) From: Charles Sprickman To: stable@freebsd.org Message-ID: <20040706193151.K6269@shell.inch.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: mysql slowdown under load X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Jul 2004 23:56:03 -0000 Hi, I've been trying to track down this problem for some time now. Since I have a temporary fix in place, I can breathe a bit and hope someone here has some answers (questions are good too!). I have a dual Athlon 1600MP box running 4.8-p3. It's a mail server running qmail with vpopmail on top of that, backed by mysql for storing user info (user/pass/homedir, etc.). Webmail is currently served off of the box as well (waiting for a vpopmail upgrade to peel this service off the box). In the past week we've had complaints from users about having to re-enter their POP passwords, some bounces of legit mail (chkuser patch on qmail-smtpd gives a 550 if it can't do a lookup, blech), and password failures on webmail. After some digging, I was able to figure out that this all came down to mysql not returning an answer fast enough. The box generally runs a load of about .70, peaks at 3-4 during heavy spam runs. Each mail delivery can theoretically generate about 6 sql queries (don't ask - I wouldn't likely do vpopmail again if I had a choice). There are about 3,000 active mail users on here. Even at high loads, the box is not swapping at all, and interactive response such as webmail and normal IMAP sessions is still excellent. But the higher the load goes, the more problems mysql seems to have. The slow query log grows, even simple db "pings" can take up to 45 seconds to complete. Mysql is built from ports, so far I've had these versions running: 4.0.18 w/LinuxThreads 4.0.20 w/LinuxThreads 4.0.20 w/native threads There was no difference going from .18 to .20, but going from LinuxThreads to system threads did bring a bit of relief. My vague hypothesis here (based on past work on larger scale qmail boxes) is that at some point the kernel just gets a bit "lost" as to what processes get prioritized; qmail with it's "hand the mail from process to process" architecture causes lots of processes to be spawned then destroyed. It seems to me that the system gets so busy with spawning all of this that mysql languishes. Looking at iostat, vmstat and systat, it seems like the number of syscalls is very high, as is the "context switches" number. However memory is fine (no paging in/out to disk), cpu is fine, disk i/o is fairly low (underlying storage is 4 disks on an adaptec raid controller, raid 1+0), but mysql continues to "stall". Mysqladmin and some mrtg graphing shows that I'm doing about 15 queries/second peak (which seems pretty low compared to past work I've done with mysql). I saw a previous thread about this problem that never really came to any good conclusions; the tester cron'd a script that just connected to mysql, ran a simple query and recorded whether the connection failed or not and how long it took. Even under very low load connects would timeout or take a long time to complete. My band-aid is to replicate the db to another box and use that for vpopmail for all non-update/insert queries. So far, so good, but my curiousity has the better of me on this; it seems like I should be able to run a local mysql server with such a low Q/S rate without issue. This could also be fixed in vpopmail; the design there is not very robust in the face of db failures. Here's a few stats, perhaps something jumps out at someone. And if anyone can explain exactly what the high level of syscalls indicates (and how much is too much) and what "context switches" are and how they figure in, that would be great. iostat: tty da0 md0 cpu tin tout KB/t tps MB/s KB/t tps MB/s us ni sy in id -0 -86 0.00 0 0.00 0.00 0 0.00 -91 -3-108 -5308 0 30 12.00 50 0.58 0.00 0 0.00 2 0 6 0 92 0 30 10.64 90 0.94 0.00 0 0.00 5 0 6 0 89 0 30 8.44 45 0.37 0.00 0 0.00 3 0 7 0 90 0 30 9.67 60 0.57 0.00 0 0.00 3 0 7 0 90 0 30 11.68 37 0.42 0.00 0 0.00 1 0 3 0 95 0 30 0.00 0 0.00 0.00 0 0.00 3 0 2 0 95 0 30 10.41 37 0.37 0.00 0 0.00 4 0 8 0 87 0 30 11.78 9 0.10 0.00 0 0.00 0 0 3 0 97 0 30 5.00 1 0.00 0.00 0 0.00 1 0 1 0 98 0 30 8.23 461 3.71 0.00 0 0.00 1 0 5 0 94 vmstat (sorry about the wrap - "sy" and "cs" concern me here): procs memory page disks faults cpu r b w avm fre flt re pi po fr sr da0 md0 in sy cs us sy id 1 12 0 800380 43788 135 0 0 0 206 21 0 0 159 72 187 -94 -114 308 1 12 0 795304 45700 1656 0 0 0 1919 0 75 0 509 8289 533 3 8 89 0 11 0 793812 46644 610 0 0 0 776 0 18 0 411 2718 211 1 3 96 0 11 0 794232 46264 1668 0 0 0 1335 0 174 0 463 5620 647 3 10 87 0 11 0 793772 46548 793 0 0 0 700 0 15 0 397 3265 301 2 4 95 2 12 0 796476 46344 970 0 0 0 869 0 20 0 393 3871 353 1 6 93 0 12 0 796900 45560 1390 0 0 0 935 0 16 0 381 9099 577 22 7 71 2 15 0 799664 43432 2392 0 0 0 1619 0 32 0 469 9379 704 2 10 87 2 14 0 803256 42372 3131 0 0 0 2794 0 73 0 579 13544 1495 4 19 77 1 14 0 803124 42348 2299 0 0 0 2043 0 65 0 541 10809 1173 4 13 83 1 15 0 803348 41624 2614 0 0 0 2385 0 24 0 448 9002 549 4 9 87 0 15 0 801100 42108 1099 0 0 0 973 0 27 0 470 6665 554 5 8 87 0 17 0 801320 42100 1219 0 0 0 983 0 37 0 442 5557 533 1 6 92 4 16 0 809912 38136 3095 0 0 0 2126 0 89 0 580 13160 1319 5 16 79 2 16 0 810028 59412 3107 0 0 0 3472 5560 109 0 521 11596 799 3 16 81 systat -v: Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER Tot Share Tot Share Free in out in out Act 424440 7376 793348 16896 83652 count All 998604 11808 2012876 28536 pages 715 zfod Interrupts Proc:r p d s w Csw Trp Sys Int Sof Flt 478 cow 486 tota 12 64 757 1792 7409 486 319 1702 185992 wire ata0 irq14 509556 act 48 mux irq2 7.5%Sys 0.5%Intr 7.9%User 0.0%Nice 84.2%Idl 248948 inact ahc1 irq5 | | | | | | | | | | 54108 cache 210 xl0 irq9 ====>>>> 29544 free fdc0 irq6 daefr atkbd0 irqNamei Name-cache Dir-cache 1049 prcfr sio0 irq4 Calls hits % hits % react 100 clk irq0 12376 11927 96 5 0 pdwak 128 rtc irq8 pdpgs Disks da0 acd0 fd0 pass0 md0 intrn KB/t 10.33 0.00 0.00 0.00 0.00 114480 buf tps 48 0 0 0 0 666 dirtybuf MB/s 0.48 0.00 0.00 0.00 0.00 70423 desiredvnodes % busy 17 0 0 0 0 81095 numvnodes 63089 freevnodes Thanks, Charles -- Charles Sprickman spork@inch.com