Date: Tue, 2 Apr 2013 17:35:35 -0500
From: David Noel
Reply-To: David.I.Noel@gmail.com
To: PGSQL Mailing List, freebsd-database@freebsd.org
Subject: PostgreSQL: CPU utilization creeping to 100%

I'm running into a strange issue whereby my postgres processes slowly creep to 100% CPU utilization. I'm running postgresql-server-9.2.3 on FreeBSD 8.3-RELEASE-p6, using the postgresql-9.2-1002.jdbc4 driver. I'm not sure what information here is relevant, so I'll give everything I can as concisely as I can.

The hardware is a Dell PowerEdge 1420 with dual 3.2GHz Xeon Noconas and 16GB of RAM. The disks are four Kingston HyperX SATA3 drives attached to a HighPoint RocketRAID 2721 controller, running ZFS in a RAID10 configuration.

There are two databases: one is a queue, and the other contains the tables that store the crawled data. The application is a web crawler. It pulls URLs from the queue and marks them active in a single transaction. It then feeds the URLs to the crawler threads, which crawl the URL, populate the tables with data, and signal the main thread to update the queue database, marking the item inactive and rescheduling it with a new "NextCrawlDate".

The processes that hang are the postgres processes that interact with the queue database. 'select * from pg_stat_activity' shows that the queries are not waiting and are in the idle state.

Essentially, the query in question is responsible for returning one URL from the union of the list of URLs whose crawl session has timed out and the list of URLs next in line to be crawled according to the schedule (NextCrawlDate). The query is:

  select * from
    ((select * from "crawlq" where "Active" = 'true' AND "TimeoutDate" <= now())
     UNION
     (select * from "crawlq" where "Active" = 'false')) as RS
  order by "NextCrawlDate" asc limit 1

Beyond this I don't know what useful debugging information to include.
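In case it helps, the queue updates themselves look roughly like the following. This is a paraphrase of what the JDBC code does, not the literal statements: the "URL" column name, the "?" (JDBC-style) placeholders, and the one-hour timeout are illustrative.

  -- pull the next URL (the select above) and mark it active,
  -- all in one transaction
  BEGIN;
  -- select ... order by "NextCrawlDate" asc limit 1  (as shown above)
  UPDATE "crawlq"
     SET "Active" = 'true',
         "TimeoutDate" = now() + interval '1 hour'
   WHERE "URL" = ?;
  COMMIT;

  -- when a crawler thread finishes, the main thread releases the URL
  -- and reschedules it
  BEGIN;
  UPDATE "crawlq"
     SET "Active" = 'false',
         "NextCrawlDate" = ?
   WHERE "URL" = ?;
  COMMIT;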
I'll take a guess and start with some vmstat output. Under normal conditions (with the crawler running) vmstat shows the following:

 procs    memory    page                  disks   faults      cpu
 r b w  avm   fre   flt re pi po   fr  sr da0 da1  in    sy    cs us sy id
 2 0 0 119G 8450M  1143 0 1 0   900 0   0   0 128   6700  8632 32  4 65
 2 0 0 119G 8444M  1937 0 0 0   100 0   4   4 280 112206  7683 36  5 59
 1 0 0 119G 8443M   427 0 0 0  1377 0  90  90 222 115889  9020 35  7 58
 1 0 0 119G 8442M  1798 0 0 0    18 0   1   1 153 104954  7071 30  4 66
 3 0 0 119G 8443M   528 0 0 0   681 0  10  10 293 125170 14523 40 15 46
 1 0 0 119G 8432M 15227 0 0 4  2850 0   3   3 205  83830  6663 55 12 33
 8 0 0 119G 8433M  3306 0 0 0   445 0   2   2 249 113792  7068 34  5 61
10 0 0 118G 8461M  1190 0 0 0  9909 0  72  73 451 120263 10036 56  9 35
14 0 0 118G 8456M  5887 0 0 0  1202 0   2   2 272 130795  9506 44 12 44
 9 0 0 118G 8444M  7750 0 0 0  1070 0   9   9 298  87643  9584 80 13  7
 3 0 0 118G 8442M  1335 0 0 0   648 0   5   5 189 143691  9234 36  6 58
 1 0 0 118G 8442M   689 0 1 1   472 0   2   2 206 153868  8635 32  7 61
 1 0 0 118G 8441M   203 0 0 0  1124 0  75  75 191 142598  8909 31 10 60
 2 0 0 118G 8440M  9508 0 0 0   684 0   8   8 231 132785 10247 47 13 41
 4 0 0 118G 8456M  4046 0 0 0  5469 0  11  11 299 143119 12475 54 22 24
 4 0 0 117G 8490M  1076 0 0 0  9858 0  16  16 291 140701 14849 58 25 17
 1 0 0 116G 8524M   344 0 0 0  8936 0   4   4 234 149103 12137 45 15 40
 2 0 0 114G 8586M   715 0 0 5 17719 0  73  75 322 151002 11430 34 10 56
 5 0 0 112G 8648M  2773 0 0 0 16997 0   6   6 225 118339  8700 30 10 61
 1 0 0 110G 8705M  4429 0 0 0 15763 0   7   7 423 139590 10354 40 11 49
 1 0 0 108G 8760M  1443 0 0 0 14519 0   7   7 405 139806 10214 37  5 58
 1 0 0 104G 8863M   333 0 0 0 26537 0   5   5 284 107770  9947 34  6 60
 1 0 0 104G 8859M  1331 0 0 0  1700 0 114 114 464 103248 12113 40  9 51
 1 0 0 104G 8854M  1708 0 0 0   272 0   6   6 279  99817  9470 40  5 55
 9 0 0 104G 8850M  3653 0 0 0  4809 0  28  28 346 160041 54071 42 32 26
12 3 0 105G 8845M 20576 0 0 0 18344 0   7   7 383  95019 32533 46 53  1
20 0 0 114G 8721M 46913 0 0 0  2941 0  11  11 461  77480  9794 72 28  0
12 1 0 110G 8759M 25109 0 0 0 35881 0  70  70 413  72631 10161 76 24  0
 2 0 0 110G 8716M 12993 0 1 1   265 0   8   8 292  83085 10073 61 30  9
 3 0 0 110G 8716M  2144 0 0 0    45 0   3   3 183 100994  7410 39 20 41

...and when postgres goes bonkers:

 procs    memory    page                  disks   faults      cpu
 r b w  avm   fre   flt re pi po   fr  sr da0 da1  in    sy    cs us sy id
 2 0 0  98G   10G  1091 0 1 0   849 0   0   0 114   2641  8582 30  4 66
 2 0 0  98G   10G    20 0 0 0     0 0   0   0 197  20500 10454 46  2 53
 2 0 0  98G   10G    59 0 0 0     0 0   0   0 284  23715 11180 46  3 51
 3 0 0  98G   10G    17 0 0 0   652 0  69  70 288  21968 11571 46  4 50
 2 0 0  98G   10G    56 0 0 0     0 0   0   0 242  22120 10251 47  2 50
 2 0 0  98G   10G    16 0 0 0     0 0   0   0 207  20750  9982 48  1 51
 4 0 0  98G   10G    61 0 0 0    97 0   0   0 205  21024 10395 49  2 50
 2 0 0  98G   10G    24 0 0 0     0 0   0   0 215  21167 10378 48  1 50
 4 0 0  98G   10G    57 0 0 0   346 0  46  46 272  22766 11314 47  3 50
 2 0 0  98G   10G    19 0 0 0     0 0   0   0 205  20594 10340 48  1 50
 2 0 0  98G   10G    53 0 0 0     0 0   0   0 250  22282 10526 48  2 50
 2 0 0  98G   10G    27 0 0 0     0 0   0   0 260  27824 12090 51  4 44
 2 0 0  98G   10G   543 0 0 0   429 0   0   0 267  28757 11817 46  4 50
 2 0 0  98G   10G    23 0 0 0   260 0  44  44 278  33147 12624 51  5 43
 2 0 0  98G   10G    55 0 0 0     0 0   0   0 243  26885 11081 47  3 50
 2 0 0  98G   10G    16 0 0 0     0 0   0   0 207  20854 10348 48  1 51
 2 0 0  98G   10G    55 0 0 0     9 0   1   1 222  22714 10766 48  1 51
 2 0 0  98G   10G    23 0 0 0     0 0   0   0 200  20919 10392 47  2 51
 2 0 0  98G   10G    55 0 0 0   517 0  88  87 247  20874 11032 46  3 51
 4 0 0  98G   10G    20 0 0 0     0 0   0   0 253  30263 11705 50  3 47
 2 0 0  98G   10G  1133 0 0 0     4 0   0   0 212  26791 11186 51  2 47
 2 0 0  98G   10G    19 0 0 0     4 0   0   0 202  22043 10752 48  1 51
 2 0 0  98G   10G    59 0 0 0     0 0   0   0 208  21416 10506 48  1 51
 4 0 0  98G   10G    18 0 0 0     4 0   0   0 214  22207 10598 48  2 50
 1 0 0  98G   10G    57 0 0 0     9 0   0   0 203  21102 10245 50  1 49
 2 0 0  98G   10G    25 0 0 0     2 0   0   0 206  21531 10225 47  3 50
 2 0 0  98G   10G    53 0 0 0     0 0   0   0 217  21083 10519 46  1 53
 2 0 0  98G   10G    24 0 0 0    15 0   1   1 199  22009 10620 49  3 48
 2 0 0  98G   10G    55 0 0 0   138 0  44  44 233  21647 10862 48  2 50
 1 0 0  98G   10G    55 0 0 0     6 0   0   0 203  23002 10653 49  2 49

/etc/rc.conf:

  postgresql_enable="YES"
  postgresql_data="/zdb/pgsql/data"
  postgresql_flags="-s -m smart"

/boot/loader.conf:

  kern.ipc.semmns="1024"
  kern.ipc.semmni="128"
  kern.ipc.shmall="1048576"
  kern.ipc.shmseg="2048"
  kern.ipc.shmmax="2147483647"
  kern.ipc.shmmni="2048"
  kern.maxusers="1024"
  kern.maxswzone="335544320"

postgresql.conf, all standard/default except for:

  max_connections = 256

Any thoughts? What other information can I provide?

Regards,

-David