From owner-freebsd-threads@FreeBSD.ORG Tue Feb 17 14:52:29 2004
Message-ID: <56666.68.106.19.246.1077058348.squirrel@mail.asn.net>
Date: Tue, 17 Feb 2004 15:52:28 -0700 (MST)
From: "Kris Gale"
To: freebsd-threads@freebsd.org
Subject: More on MySQL -- Fatal trap 12

Hey Everyone,

I've been trying to create a simple program to simulate the load my
production environment puts on MySQL. I'm not sure if I'm creating
exactly the same problems as I was seeing in production (and that I've
described on this list), but I have found some pretty interesting
things.

What I seem to be seeing is a bogging down of MySQL when new threads
are being created in bursts. This causes MySQL to temporarily become
unresponsive, and will sometimes crash the whole system.

Here's what my test program is doing:

- Fork X number of child processes, each opening Y number of
  connections to the database.
- Each child process loops through the Y connections it has open,
  executing one select statement on each, then starting over from the
  first.
- If a particular database connection drops, the child enters a loop,
  attempting to reconnect forever.

When using 45 child processes and 20 connections for each, everything
is fine (900 threads). If I bump it up to 90 children and 20
connections, I start to see problems. The database is unable to serve
the incoming connections fast enough, and existing connections become
slow or entirely unresponsive.

However, if I leave it alone, eventually things "catch up."* That is,
as the database server slowly manages to create new threads, all of the
incoming connect requests eventually succeed (remember, they're
looping). Once everything is reconnected, I see 1800 threads in MySQL,
and the same query/second rate that I saw with 900 threads.

* Okay, not always. About half of the time, once MySQL falls behind the
  incoming connections and connect attempts start to fail, the system
  will crash with a "fatal trap 12: page fault while in kernel mode".

In the X=90, Y=20 scenario (1800 threads), if the test is allowed to
continue until everything catches up (about 5-10 minutes with KSE), I
can stop and start the test, triggering the burst of connection
attempts again, and I see only a handful of connect errors. However, if
I stop and start MySQL, I'll see the 10 minutes of connect errors
again. This seems to imply that somehow these threads are being cached,
or something is happening that allows us to skip whatever bottleneck
was causing things to bog down.

Does this look like a fixable problem with KSE to anyone on this list?
Let me know if you'd like a copy of the Perl script I've written to try
out all of these things.

Kris Gale
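
For anyone who wants to reproduce the load pattern without the script,
the test procedure described above could be sketched roughly as
follows. This is a Python illustration, not the Perl script mentioned
in the message. The names `ConnectionLost`, `connect_forever`,
`run_child`, and `spawn_children` are hypothetical, and so is the
`rounds` parameter (the real test loops forever). The `connect`
callable is passed in by the caller because no particular MySQL driver
is assumed; it only needs to return an object with an `execute(query)`
method.

```python
import os
import time


class ConnectionLost(Exception):
    """Hypothetical error raised when a connect or query fails."""


def connect_forever(connect, retry_delay=0.1):
    """Call connect() until it succeeds, retrying indefinitely."""
    while True:
        try:
            return connect()
        except ConnectionLost:
            time.sleep(retry_delay)


def run_child(connect, num_connections, query, rounds, retry_delay=0.1):
    """One child's work: open Y connections, then cycle through them.

    Returns the number of queries that completed. `rounds` bounds the
    loop so the sketch terminates; the described test never stops.
    """
    conns = [connect_forever(connect, retry_delay)
             for _ in range(num_connections)]
    completed = 0
    for _ in range(rounds):
        for i in range(num_connections):
            try:
                conns[i].execute(query)  # one select per connection
                completed += 1
            except ConnectionLost:
                # Dropped connection: block here until it reconnects.
                conns[i] = connect_forever(connect, retry_delay)
    return completed


def spawn_children(num_children, child_fn):
    """Fork X children (Unix only), run child_fn in each, wait for all."""
    pids = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            try:
                child_fn()
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    return len(pids)
```

With X=90 and Y=20, a call such as
`spawn_children(90, lambda: run_child(my_connect, 20, "SELECT 1", 1000))`
would produce the burst of 1800 connection attempts, where `my_connect`
stands for whatever driver-specific connect wrapper is used.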