From owner-freebsd-hackers Tue Sep 14 16:41:57 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from bomber.avantgo.com (ws1.avantgo.com [207.214.200.194])
    by hub.freebsd.org (Postfix) with ESMTP id A187F14D51
    for ; Tue, 14 Sep 1999 16:41:50 -0700 (PDT)
    (envelope-from sa-list@avantgo.com)
Received: from avantgo.com ([10.0.128.109])
    by bomber.avantgo.com (Netscape Messaging Server 3.5) with ESMTP id 306
    for ; Tue, 14 Sep 1999 16:36:59 -0700
Message-ID: <37DEDD59.6FFDFEAD@avantgo.com>
Date: Tue, 14 Sep 1999 16:42:17 -0700
From: Stevan Arychuk
Reply-To: sa-list@avantgo.com
Organization: AvantGo Inc.
X-Mailer: Mozilla 4.6 [en] (X11; I; FreeBSD 3.2-RELEASE i386)
X-Accept-Language: en
MIME-Version: 1.0
To: hackers@freebsd.org
Subject: help with flaky reboot on 3.1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Greetings,

We are running 3.1-RELEASE with a kernel pulled on May 1, 1999 from the
RELENG_3 branch (we used this to take advantage of the KVA modifications
that were rolled in after the release). The machines are dual PII 450s
(N440BX) with 512MB RAM. We are also using the built-in ethernet and SCSI
controllers.

Our kernel configuration is fairly standard, with the following
exceptions:

    maxusers 512
    options NMBCLUSTERS=33280
    options SMP
    options APIC_IO
    options "VM_KMEM_SIZE=(128*1024*1024)"
    options "VM_KMEM_SIZE_MAX=(128*1024*1024)"

Here are the symptoms we are seeing:

One machine running a caching squid reverse proxy would spontaneously
reboot with no error messages every week or so. This machine had only a
single CPU. We were seeing an excessive number of sockets in the CLOSING
state, via netstat. The reboots seemed to be correlated with having many
such sockets. Suspecting a bad TCP stack somewhere on the Internet, we did

    sysctl -w net.inet.tcp.always_keepalive=1

This fixed the problem of many CLOSING sockets, but did not fix the
reboots.
Other machines running custom software (dual CPU) would also
spontaneously reboot, again with no error messages. The reboots are
happening with increasing frequency, almost to the point of a couple of
times a day. Sometimes a machine would reboot a couple of times a day,
then be OK for another week or so. Our software exercises the disk, CPU
and network quite a bit, but not excessively.

The only machines that are having problems are production machines
directly connected to the Internet. We've had the same machines running
internally with longer uptimes and heavier volumes.

Any suggestions/ideas?

Sorry about the super-post, I thought detail was important.

- Stevan Arychuk

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message