From owner-freebsd-bugs Mon Nov 26 8:30: 7 2001
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 2AD5537B419
	for ; Mon, 26 Nov 2001 08:30:01 -0800 (PST)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.11.6/8.11.6) id fAQGU1p78085;
	Mon, 26 Nov 2001 08:30:01 -0800 (PST)
	(envelope-from gnats)
Received: from victor.teaser.fr (victor.teaser.fr [213.91.2.241])
	by hub.freebsd.org (Postfix) with ESMTP id 92C9937B419
	for ; Mon, 26 Nov 2001 08:20:52 -0800 (PST)
Received: by victor.teaser.fr (Postfix, from userid 1000)
	id 4459032601; Mon, 26 Nov 2001 17:20:51 +0100 (CET)
Message-Id: <20011126162051.4459032601@victor.teaser.fr>
Date: Mon, 26 Nov 2001 17:20:51 +0100 (CET)
From: Laurent Wacrenier
Reply-To: Laurent Wacrenier
To: FreeBSD-gnats-submit@freebsd.org
X-Send-Pr-Version: 3.113
Subject: bin/32295: pthread dont dequeue signals
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

>Number:         32295
>Category:       bin
>Synopsis:       pthread dont dequeue signals
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Nov 26 08:30:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Laurent Wacrenier
>Release:        FreeBSD 5.0-CURRENT i386
>Organization:
France Teaser
>Environment:
FreeBSD mysql.firstcampus.fr 4.4-STABLE FreeBSD 4.4-STABLE #0: Wed Nov 7 12:07:39 CET 2001 jcmichot@mysql.firstcampus.fr:/usr/src/sys/compile/SQLCAMPUS i386
FreeBSD math.teaser.net 4.2-STABLE FreeBSD 4.2-STABLE #2: Wed Jan 17 20:19:53 CET 2001 lwa@math.teaser.net:/usr/src/sys/compile/MATH i386
>Description:
Under heavy load, our MySQL servers sometimes take about 100% CPU but
remain functional. I've traced the process.
It's looping on the following system calls:

% strace -p 261
gettimeofday({1006788477, 519930}, NULL) = 0
poll([{fd=3, events=POLLRDNORM, revents=POLLRDNORM}, {fd=1715, events=POLLRDNORM}, {fd=1633, events=POLLRDNORM}, {fd=1748, events=POLLRDNORM}, {fd=145, events=POLLRDNORM}, {fd=988, events=POLLRDNORM}, {fd=1734, events=POLLRDNORM}, {fd=313, events=POLLRDNORM}, (...) , 292, 7838) = 1

File descriptor 3 is reported as readable, but nothing is ever done with
it. According to lsof:

mysqld 261 mysql 3u PIPE 0xd793ff20 16384 ->0xd793fe80, cnt=10798, in=10798

This pipe is the one the threads library uses to handle signals,
_thread_kern_pipe[0]. It is connected to _thread_kern_pipe[1] == 4.
The number of queued bytes increases every few seconds; I have never
seen it decrease. The server has been in this state for 2 days.

I've ktraced the process between two increases of the pipe queue:

261 mysqld CALL poll(0x81eb000,0x154,0x2518)
261 mysqld PSIG SIGALRM caught handler=0x282a92fc mask=0x0 code=0x0
261 mysqld RET poll 1
261 mysqld CALL write(0x4,0x81e1e5f,0x1)
261 mysqld GIO fd 4 wrote 1 byte "\^N"
261 mysqld RET write 1
261 mysqld CALL sigreturn(0x81e1e7c)

The heavily loaded servers are in production use, so I can't run them
under a debugger to gather more information.
>How-To-Repeat:
Not easy to repeat. Make a lot of queries to a MySQL server and wait
some weeks until it happens.
>Fix:
I'm not fluent with libc_r, but I suspect a bug in thread_kern_poll()
in libc_r/uthread/uthread_kern.c or in _thread_sig_handler() in
uthread_sig.c.

In uthread_kern.c, _thread_kern_pipe[0] is added to the poll set, but
signals are dequeued only when _sigq_check_reqd != 0. If no new signal
occurs, it can loop forever.

_sigq_check_reqd is set to 1 in _thread_sig_handler() only when the
signal is not blocked, but a byte is written to the pipe in every case.

There may also be a race condition.
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
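
The suspicion stated under >Fix: can be illustrated in isolation. The
fragment below is only a sketch and is not libc_r code: wake_pipe,
check_reqd, fake_handler(), pass_gated() and pass_always() are
hypothetical names standing in for _thread_kern_pipe, _sigq_check_reqd,
_thread_sig_handler() and the poll pass in uthread_kern.c. The "handler"
is an ordinary function rather than a real signal handler so that the
behaviour is deterministic, and POLLIN is used instead of POLLRDNORM for
portability. It shows that if a byte is written for every signal but the
pipe is drained only when the flag is set, the read end can stay readable
indefinitely.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical stand-ins; these are NOT the libc_r identifiers. */
static int wake_pipe[2] = { -1, -1 };      /* [0] read end, [1] write end */
static volatile sig_atomic_t check_reqd;   /* role of _sigq_check_reqd */

/* Create the pipe and make both ends non-blocking. Returns 0 on success. */
static int wake_pipe_init(void)
{
    if (pipe(wake_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(wake_pipe[i], F_GETFL);
        if (fl == -1 || fcntl(wake_pipe[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }
    return 0;
}

/*
 * Simulated handler: a byte is written in every case, but the flag is
 * set only when the signal is deliverable (not blocked).
 */
static void fake_handler(int deliverable)
{
    char c = 0;
    if (write(wake_pipe[1], &c, 1) < 0) {
        /* Pipe full: a wakeup is already pending, nothing more to do. */
    }
    if (deliverable)
        check_reqd = 1;
}

/* Returns 1 if a zero-timeout poll() reports the read end as readable. */
static int pipe_readable(void)
{
    struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN };
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}

/* Read until the non-blocking pipe is empty; returns the bytes consumed. */
static int drain_pipe(void)
{
    char buf[128];
    int total = 0;
    ssize_t n;

    assert(wake_pipe[0] >= 0);
    while ((n = read(wake_pipe[0], buf, sizeof buf)) > 0)
        total += (int)n;
    return total;
}

/* Suspected behaviour: drain only when the flag says so. */
static int pass_gated(void)
{
    if (!check_reqd)
        return 0;
    check_reqd = 0;
    return drain_pipe();
}

/* Alternative: drain whenever poll() reports the pipe as readable. */
static int pass_always(void)
{
    check_reqd = 0;
    return pipe_readable() ? drain_pipe() : 0;
}
```

After wake_pipe_init(), calling fake_handler(0) followed by pass_gated()
leaves the pipe readable, so a poll() on it would return at once on every
pass; pass_always() empties it. Whether libc_r actually reaches this
state, or whether a race is involved, is not established by this sketch.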