From: "adp"
Delivered-To: freebsd-questions@freebsd.org
Date: Wed, 24 Mar 2004 15:48:24 -0600
Subject: FreeBSD 4.9 goes boom!

Problem: the FreeBSD 4.9 load average quickly climbs to very high levels, such as 300. The system becomes unusable and, HOPEFULLY, reboots on its own. In general, though, we have to call a tech to reboot it by hitting the power switch.

Here is the setup: I have a FreeBSD 4.9 server on a P4 with 256MB of RAM. We have an IDE drive.
We were using HiTech RAID-1, but it was flaky, so now I'm just using a single drive on regular IDE.

CPU: Intel(R) Pentium(R) 4 CPU 1500MHz (1494.47-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf07  Stepping = 7
  Features=0x3febf9ff
real memory  = 268369920 (262080K bytes)
avail memory = 257400832 (251368K bytes)
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
atapci0: port 0xf000-0xf00f at device 31.1 on pci0
ad0: 38166MB [77545/16/63] at ata0-master UDMA33

On this server I have several jails:

jail 1:     running apache and serving about 6 hits/s on average.
jails 2-7:  running apache, generally with just one child, for SSL (several SSL sites, several jails; I'm moving to a single SSL jail and using natd later).
jail 8:     an ssh jail for people to manage the sites.

During normal loads we are okay on memory. (I am adding more.) At all times we have about 1GB of paging disk free.

Normally, my 5 and 15 min loads are around 0.5. (I can watch column r in vmstat and see we usually have 0 or 1 processes waiting.)
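A minimal sketch of how the load check described above could be automated, in Python. This is not something running on this server: the function names load_is_high and check_now and the threshold of 5.0 are assumptions picked for illustration. os.getloadavg() is the standard library call that returns the 1, 5 and 15 minute load averages.

```python
import os


def load_is_high(loadavg, threshold=5.0):
    """Return True when the 1-minute load average exceeds the threshold.

    loadavg is a (1 min, 5 min, 15 min) tuple, as os.getloadavg() returns.
    The default threshold is an assumption, not a tuned value.
    """
    one_min = loadavg[0]
    return one_min > threshold


def check_now():
    """Apply the same check to the running system's current load."""
    return load_is_high(os.getloadavg())
```

Run from cron every minute or so, something like this could log a warning (or restart apache) before the box becomes unreachable.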
This is normal:

last pid:  7924;  load averages:  0.11,  0.25,  0.49   up 0+00:39:40  15:30:01
345 processes: 2 running, 342 sleeping, 1 zombie
Mem: 137M Active, 27M Inact, 52M Wired, 2284K Cache, 35M Buf, 30M Free
Swap: 2048M Total, 31M Used, 2017M Free, 1% Inuse

  PID USERNAME PRI NICE    SIZE     RES STATE    TIME    WCPU     CPU COMMAND
 7914 root      30    0   2264K   1320K RUN      0:00  31.00%   1.51% top
 7883 root       2    0   6600K   6016K sbwait   0:00  13.84%   1.32% perl
 6660 nobody     2    0  17940K  12676K sbwait   0:01   1.07%   1.07% httpd
 7930 root      29    0   1852K    924K RUN      0:00  17.00%   0.83% top
  763 nobody    18    0  15004K   7144K lockf    0:02   0.15%   0.15% httpd
 7828 nobody     2    0  17732K  12424K accept   0:00   0.37%   0.15% httpd
 4586 nobody     2    0  17944K  12604K sbwait   0:01   0.10%   0.10% httpd
 7868 nobody     2    0  16376K  10944K accept   0:00   1.03%   0.10% httpd
 7910 root      -6    0   1968K   1356K piperd   0:00   2.00%   0.10% perl
 1461 nobody    18    0  14628K   6780K lockf    0:02   0.05%   0.05% httpd
 2812 nobody    18    0  14368K   6620K lockf    0:02   0.05%   0.05% httpd
 4575 nobody     2    0  17768K  12480K accept   0:01   0.05%   0.05% httpd
 4593 nobody     2    0  18080K  12780K sbwait   0:05   0.00%   0.00% httpd
 4422 root       2    0  16100K  10264K select   0:03   0.00%   0.00% httpd
 4595 nobody     2    0  17984K  12728K sbwait   0:03   0.00%   0.00% httpd
  764 nobody    18    0  14992K   7300K lockf    0:02   0.00%   0.00% httpd
 4560 nobody     2    0  17944K  12684K sbwait   0:02   0.00%   0.00% httpd
 4561 nobody     2    0  17944K  12672K sbwait   0:02   0.00%   0.00% httpd

But when the system crashes, the load just skyrockets:

last pid: 88248;  load averages: 238.98, 197.07, 127.85   up 2+17:12:36  14:45:38
709 processes: 257 running, 421 sleeping, 31 zombie
Mem: 143M Active, 21M Inact, 75M Wired, 7908K Cache, 35M Buf, 1844K Free
Swap: 2048M Total, 488M Used, 1560M Free, 23% Inuse

  PID USERNAME PRI NICE    SIZE     RES STATE    TIME    WCPU     CPU COMMAND
88185 root       2    0   6504K   5736K connec   0:00   1.47%   0.93% perl
25298 nobody   -18    0  13700K   1596K vmpfw    0:13   0.59%   0.39% httpd
57349 nobody   -18    0  14788K   1588K spread   0:10   0.57%   0.39% httpd
18115 nobody   -18    0  14224K   1604K vmpfw    0:21   0.39%   0.24% httpd
39876 root       2    0   2716K      0K RUN     10:12   0.00%   0.00%
84557 nobody     2    0  22600K      0K RUN      9:54   0.00%   0.00%
84567 nobody     2    0  22360K      0K sbwait   9:47   0.00%   0.00%
84568 nobody     2    0  22564K      0K RUN      9:47   0.00%   0.00%
84564 nobody     2    0  22680K      0K sbwait   9:41   0.00%   0.00%
84556 nobody   -22    0  21092K    580K swread   9:39   0.00%   0.00% httpd
84554 nobody     2    0  22592K      0K RUN      9:32   0.00%   0.00%
84555 nobody     2    0  22608K      0K RUN      9:31   0.00%   0.00%
84558 nobody     2    0  22580K      0K RUN      9:22   0.00%   0.00%
84563 nobody     2    0  22692K      0K RUN      9:07   0.00%   0.00%
84560 nobody     2    0  22580K      0K RUN      8:56   0.00%   0.00%
84398 root       2    0  21052K   1604K select   4:14   0.00%   0.00% httpd
   94 root       2    0    360K      0K nfsd     3:03   0.00%   0.00%
 3730 nobody    18    0  14888K      0K lockf    1:23   0.00%   0.00%

Since I have 75M wired I have SOME memory available to my system.

I am using bsdsar. Our system crashed around 2:45 PM today:

Time   ad0  ad1  ad2  ad3  da0  da1  da2  da3  da4  da5  da6
13:40    0
14:00   33
14:20  146
15:00   40

Time   % User  % Sys  % Nice  % Intrpt  % Idle
13:40       1      2       0         2      96
14:00      11      2       0         0      87
14:20       0     12       0         0      88
15:00      10      6       0         0      84

Time   Free Mem  Active Mem  Inactive Mem  Total Swap  Used Swap  Free Swap
13:40       11M        129M           33M    2097024k    162608k   1934416k
14:00     5936K        149M           14M    2097024k    159464k   1937560k
14:20      904K        144M           24M    2097024k    303504k   1793520k
15:00      656K        163M           19M    2097024k      9544k   2087480k

I looked in /var/log/messages and saw nothing relevant, although I do have a lot of these:

Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc651b57c
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc650a524
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc651b57c

They seem to come in spurts once or twice an hour.

Any ideas?
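To compare the two top snapshots numerically, the summary lines can be pulled apart with a few lines of Python. This is only a sketch: the names parse_summary and swap_percent are made up for illustration, and treating top's K/M/G suffixes as binary units (1M = 1024K) is an assumption. The input strings are the "Mem:" and "Swap:" lines from the two top outputs in this message.

```python
import re

# Size suffixes printed by top, converted to kilobytes (binary units assumed).
UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def parse_summary(line):
    """Return {label: kilobytes} for a top 'Mem:' or 'Swap:' summary line."""
    fields = {}
    # Matches pairs such as "143M Active" or "7908K Cache"; "23% Inuse" is skipped.
    for amount, unit, label in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[label] = int(amount) * UNITS_KB[unit]
    return fields


def swap_percent(swap):
    """Whole-number percentage of swap in use, rounded down."""
    return 100 * swap["Used"] // swap["Total"]


normal_mem = parse_summary(
    "Mem: 137M Active, 27M Inact, 52M Wired, 2284K Cache, 35M Buf, 30M Free")
crash_mem = parse_summary(
    "Mem: 143M Active, 21M Inact, 75M Wired, 7908K Cache, 35M Buf, 1844K Free")
crash_swap = parse_summary(
    "Swap: 2048M Total, 488M Used, 1560M Free, 23% Inuse")

print(normal_mem["Free"], crash_mem["Free"])  # 30720 1844
print(swap_percent(crash_swap))               # 23
```

Free memory drops from about 30M to under 2M between the two snapshots, while swap in use goes from 31M to 488M.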