From: Terry Lambert
Date: Mon, 06 May 2002 09:54:59 -0700
To: Patrick Thomas
Cc: freebsd-hackers@freebsd.org
Subject: Re: what causes a userland to stop, but allows kernel to continue?
Message-ID: <3CD6B563.ECF6A475@mindspring.com>
References: <20020506080159.K86733-100000@utility.clubscholarship.com>

Patrick Thomas wrote:
> > No denied requests.  It's not mbufs.  It must be something else.
>
> How do you feel about this:

[ ... ]

You have 24M in vnodes, which is surprising for a machine whose
job is supposedly postgres.

You have another 17M in PV ENTRY values, which is for page mapping.

You have 81M in swap metadata; 12M in VM OBJECTS.

You don't tell us when you took this sample, relative to the
crash time... right after the start?  Right before the crash?

Do you restart postgres?  Does it fork for each client connection?

Also, not all memory is accounted to zones, which is why I
suggested "vmstat -m", *NOT* "vmstat -z".

> anything interesting ?

You claim really small numbers for the shared memory segments,
but then in another message, you say you are running multiple
instances of postgres in jails.  We don't have totals on these
numbers.
You set the physmap tunable that Alfred said would help *unless
you run out of memory* ...and are maybe hitting that wall.

You aren't telling us the output of "ps -gaxl" at the time of
the crash (which is only interesting for the top VSZ/RSS numbers,
the WCHANs, the STAT, and the commands for the large VSZ/RSS).

This really isn't going to be interesting or useful data until
you can show us trends.  The way to show us trends is to capture
the information at fixed intervals (e.g. with a cron job), so
that it's there from start to lockup.  You should calculate the
lockup interval, and pick an update interval based on that.

I'm personally not going to look at that amount of data unless
you use gnuplot or Excel or some other tool to graph it, so that
we can see time on one axis and resource consumption on the
other.  So don't post it directly to the list.

-- Terry
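
The fixed-interval capture suggested above could be sketched roughly
as follows, assuming Python is available on the machine.  Only the
"vmstat -m", "vmstat -z" and "ps -gaxl" invocations come from the
message; pick_interval, snapshot, the 200-sample target, the three-day
lockup period and the log path are all hypothetical.  The sketch loops
in-process instead of using cron, which is a substitution made here for
brevity; a cron entry that appends one snapshot() per run would serve
the same purpose.

```python
import os
import subprocess
import time
from datetime import datetime, timezone

# Commands named in the message; everything else here is an assumption.
COMMANDS = [["vmstat", "-m"], ["vmstat", "-z"], ["ps", "-gaxl"]]


def pick_interval(lockup_seconds, samples=200, floor=60):
    """Spread roughly `samples` snapshots over one start-to-lockup period.

    `floor` keeps the interval from dropping below one minute.
    """
    return max(floor, int(lockup_seconds // samples))


def snapshot(commands=COMMANDS, run=subprocess.run):
    """Return one timestamped text record holding each command's output."""
    stamp = datetime.now(timezone.utc).isoformat()
    parts = [f"=== {stamp}"]
    for cmd in commands:
        try:
            out = run(cmd, capture_output=True, text=True, timeout=30).stdout
        except (OSError, subprocess.SubprocessError) as exc:
            out = f"(failed: {exc})\n"
        parts.append(f"--- {' '.join(cmd)}\n{out}")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical: the machine locks up about every three days.
    interval = pick_interval(lockup_seconds=3 * 24 * 3600)
    with open("/var/tmp/resource-trend.log", "a") as log:
        while True:
            log.write(snapshot() + "\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so the last sample survives a lockup
            time.sleep(interval)
```

The "===" timestamp lines give a time axis to plot against once the
per-sample numbers are extracted, so the log can be graphed rather than
read raw.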