From owner-freebsd-hackers  Mon May  6  9:55:38 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50])
	by hub.freebsd.org (Postfix) with ESMTP id D5DC337B400
	for <freebsd-hackers@freebsd.org>; Mon,  6 May 2002 09:55:29 -0700 (PDT)
Received: from pool0013.cvx22-bradley.dialup.earthlink.net ([209.179.198.13] helo=mindspring.com)
	by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #2)
	id 174llo-0001vi-00; Mon, 06 May 2002 09:55:28 -0700
Message-ID: <3CD6B563.ECF6A475@mindspring.com>
Date: Mon, 06 May 2002 09:54:59 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Patrick Thomas <root@utility.clubscholarship.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: what causes a userland to stop, but allows kernel to continue?
References: <20020506080159.K86733-100000@utility.clubscholarship.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk

Patrick Thomas wrote:
> > No denied requests.  It's not mbufs.  It must be something else.
> 
> How do you feel about this:

[ ... ]

You have 24M in vnodes, which is surprising for a machine whose
job is supposedly postgres.  You have another 17M in PV ENTRY
values, which is for page mapping.  You have 81M in swap metadata;
12M in VM OBJECTS.

You don't tell us when you took this sample, relative to the crash
time... right after the start?  Right before the crash?

Do you restart postgres?  Does it fork for each client connection?

Also, not all memory is accounted to zones, which is why I suggested
"vmstat -m", *NOT* "vmstat -z".


> anything interesting ?


You claim really small numbers for the shared memory segments,
but then in another message, you say you are running multiple
instances of postgres in jails.  We don't have totals on these
numbers.

You set the physmap tunable that Alfred said would help *unless
you run out of memory* ...and are maybe hitting that wall.

You aren't telling us the output of "ps -gaxl" at the time of
the crash (which is only interesting for the top VSZ/RSS numbers,
the WCHAN's, the STAT, and the commands for the large VSZ/RSS).
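
As a rough illustration of pulling out just those rows, here is a
minimal Python sketch.  It assumes the "ps -gaxl" output starts with
a header line naming the columns (VSZ, RSS, WCHAN, STAT, COMMAND)
and that only the last column contains spaces.  The function name
top_by_column is made up for this example.

```python
def top_by_column(ps_output, column="VSZ", count=5):
    """Return the `count` rows with the largest value in `column`.

    Assumes the first non-blank line is a header naming the columns and
    that only the final column (the command) may contain spaces.
    """
    lines = [line for line in ps_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Cap the number of splits so the command keeps its arguments.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # row does not match the header layout; skip it
        row = dict(zip(header, fields))
        if row.get(column, "").isdigit():
            rows.append(row)
    rows.sort(key=lambda row: int(row[column]), reverse=True)
    return rows[:count]
```

Calling it once with column="VSZ" and once with column="RSS", then
printing WCHAN, STAT and COMMAND for each returned row, gives the
part of the listing that matters.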

This really isn't going to be interesting or useful data until
you can show us trends.  The way to show us trends is to capture
the information at fixed intervals (e.g. with a cron job), so
that it's there from start to lockup.  You should calculate the
lockup interval, and pick an update interval based on that.
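
One way to do the capture is sketched below in Python.  This is only
a sketch under assumptions: the helper names pick_interval and
snapshot, the target of roughly 200 samples per start-to-lockup
period, and the log layout are all invented for illustration.  The
commands you would pass in are the ones already mentioned in this
thread ("vmstat -m" and "ps -gaxl").

```python
import subprocess
import time


def pick_interval(lockup_interval_s, samples=200, minimum_s=60):
    """Spread roughly `samples` snapshots over one start-to-lockup period.

    The default of 200 samples is an arbitrary illustrative choice.
    """
    return max(minimum_s, lockup_interval_s // samples)


def snapshot(commands, log_path, now=None):
    """Append one timestamped block of output per command to log_path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(log_path, "a") as log:
        for argv in commands:
            log.write("=== %s  %s\n" % (stamp, " ".join(argv)))
            try:
                result = subprocess.run(
                    argv,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                )
                output = result.stdout
            except OSError as err:
                # Record the failure instead of losing the sample slot.
                output = "(could not run: %s)\n" % err
            log.write(output)
            if not output.endswith("\n"):
                log.write("\n")
```

For example, if the machine locked up about every three days
(259200 seconds, a hypothetical figure), pick_interval(259200)
returns 1296, so a cron entry firing every 20 minutes or so and
calling snapshot([["vmstat", "-m"], ["ps", "-gaxl"]], log_path)
once per run would cover the whole period.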

I'm personally not going to look at that amount of data unless
you use gnuplot or Excel or some other tool to graph it, so
that we can see time on one axis and resource consumption on
the other.  So don't post it directly to the list.
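
For the graphing step, a minimal Python sketch follows.  It assumes
you have already reduced each capture to a (time, value) pair, for
instance seconds since the epoch and one resource total.  The
function name write_gnuplot_data and the file layout are invented
for this example.

```python
def write_gnuplot_data(samples, path):
    """Write (epoch_seconds, value) pairs as two columns, oldest first.

    gnuplot skips lines starting with '#', so the header is harmless.
    """
    with open(path, "w") as out:
        out.write("# time_s value\n")
        for when, value in sorted(samples):
            out.write("%d %d\n" % (when, value))
```

A file written this way (say, "trend.dat") can be plotted in gnuplot
with: plot "trend.dat" using 1:2 with lines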

-- Terry
