From owner-freebsd-stable@FreeBSD.ORG  Tue Nov  4 12:24:32 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E799E106568C
	for <freebsd-stable@freebsd.org>; Tue,  4 Nov 2008 12:24:32 +0000 (UTC)
	(envelope-from kjedruczyk@ramfasto.com)
Received: from out5.smtp.messagingengine.com (out5.smtp.messagingengine.com
	[66.111.4.29]) by mx1.freebsd.org (Postfix) with ESMTP id AE76D8FC40
	for <freebsd-stable@freebsd.org>; Tue,  4 Nov 2008 12:24:32 +0000 (UTC)
	(envelope-from kjedruczyk@ramfasto.com)
Received: from compute1.internal (compute1.internal [10.202.2.41])
	by out1.messagingengine.com (Postfix) with ESMTP id E29341A0274
	for <freebsd-stable@freebsd.org>; Tue,  4 Nov 2008 07:10:50 -0500 (EST)
Received: from heartbeat1.messagingengine.com ([10.202.2.160])
	by compute1.internal (MEProxy); Tue, 04 Nov 2008 07:10:50 -0500
X-Sasl-enc: oeIA8TaM6Os5nfJ4qs+vOfMpyeU/iBZdN7qOuMMGIc8L 1225800650
Received: from buka.ramfasto.com (dyb186.internetdsl.tpnet.pl [83.14.53.186])
	by mail.messagingengine.com (Postfix) with ESMTPA id 26BCD13D2E;
	Tue,  4 Nov 2008 07:10:49 -0500 (EST)
Message-ID: <49103BC0.3070605@ramfasto.com>
Date: Tue, 04 Nov 2008 13:10:40 +0100
From: =?UTF-8?B?S3J6eXN6dG9mIErEmWRydWN6eWs=?= <kjedruczyk@ramfasto.com>
User-Agent: Thunderbird 2.0.0.14 (X11/20080707)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: PostgreSQL stats collector eats all CPU time
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 04 Nov 2008 12:24:33 -0000

Recently PostgreSQL on our database server started misbehaving: after
running for some time, the stats collector process uses 100% of CPU
time, exactly as someone reported here:
http://groups.google.com/group/pgsql.general/browse_thread/thread/6dfea591d243e987

No solution is provided there, although a kernel/libc bug is suggested.

I'm not sure how relevant it is, but the problem first appeared about a
day or two after the server was upgraded with an additional processor: it
is now 2x dual-core Opteron with 8GB of RAM. We did not see this problem
back when it was just one dual-core Opteron with 4GB of RAM. It is the
amd64 version of FreeBSD, of course.

As the person who previously reported the problem on the PostgreSQL
mailing list showed, the stats collector busy-loops in an interrupted
poll call. kdump contains output like this:
    878 postgres 0.009643 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009671 RET   poll -1 errno 4 Interrupted system call
    878 postgres 0.009675 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009687 RET   poll -1 errno 4 Interrupted system call
    878 postgres 0.009691 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009700 RET   poll -1 errno 4 Interrupted system call
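
To put numbers on the busy loop, here is a minimal sketch that summarizes
kdump output of the shape shown above. It assumes the third column is a
timestamp in seconds and that "errno 4" is EINTR, as the trace itself
reports; the name summarize_poll and the SAMPLE text are only
illustrative, not part of any existing tool.

```python
import re

# One kdump line as shown above: <pid> <command> <timestamp> CALL|RET <rest>
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+\.\d+)\s+(CALL|RET)\s+(.*)$")
# poll(fds, nfds, timeout) with all three arguments printed in hex
POLL_RE = re.compile(r"^poll\((0x[0-9a-f]+),(0x[0-9a-f]+),(0x[0-9a-f]+)\)$")


def summarize_poll(lines):
    """Count poll() calls and EINTR returns, and time each call (hypothetical helper)."""
    calls = 0
    eintr = 0
    durations = []
    timeout_ms = None
    pending = {}  # pid -> timestamp of its outstanding poll CALL
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a CALL/RET record
        pid, ts = m.group(1), float(m.group(3))
        kind, rest = m.group(4), m.group(5).strip()
        if kind == "CALL":
            c = POLL_RE.match(rest)
            if c:
                calls += 1
                timeout_ms = int(c.group(3), 16)  # third argument is the timeout in ms
                pending[pid] = ts
            else:
                pending.pop(pid, None)
        elif rest.startswith("poll ") and pid in pending:
            durations.append(ts - pending.pop(pid))
            if "errno 4" in rest:  # EINTR, "Interrupted system call"
                eintr += 1
    return {
        "calls": calls,
        "eintr": eintr,
        "timeout_ms": timeout_ms,
        "max_duration_s": max(durations) if durations else None,
    }


SAMPLE = """\
    878 postgres 0.009643 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009671 RET   poll -1 errno 4 Interrupted system call
    878 postgres 0.009675 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009687 RET   poll -1 errno 4 Interrupted system call
    878 postgres 0.009691 CALL  poll(0x7fffffffd4e0,0x1,0x7d0)
    878 postgres 0.009700 RET   poll -1 errno 4 Interrupted system call
"""

summary = summarize_poll(SAMPLE.splitlines())
print(summary)
```

On the excerpt above, every poll call asks for a 2000 ms timeout (0x7d0)
on a single descriptor, yet each one returns with EINTR within a few tens
of microseconds, which matches the 100% CPU usage.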

I also grabbed a core dump of the postmaster process, and the backtrace
seems a little weird to me:

#0  0x00000008012186cc in poll () from /lib/libc.so.7
[New Thread 0x801601120 (LWP 100209)]
[New LWP 54785]
(gdb) bt
#0  0x00000008012186cc in poll () from /lib/libc.so.7
#1  0x000000080107c85e in poll () from /lib/libthr.so.3
#2  0x0000000000578bd0 in pgstat_start ()
#3  0x000000000057d2b5 in PostmasterMain ()
#4  <signal handler called>
#5  0x0000000801268cdc in select () from /lib/libc.so.7
#6  0x000000080107c574 in select () from /lib/libthr.so.3
#7  0x000000000057aaa3 in ClosePostmasterPorts ()
#8  0x000000000057be9e in PostmasterMain ()
#9  0x00000000005358fe in main ()

If I'm reading it right, the constantly interrupted poll function is
being called from the signal handler?

Any suggestions on what else to do to identify the problem? The situation
appears to be reproducible: after a server restart it happened again
within one day.

-- 
Best regards,
   Krzysztof Jędruczyk