From owner-freebsd-stable@FreeBSD.ORG Mon Mar 1 08:01:39 2004
Message-ID: <40435E69.9000301@ispro.net.tr>
Date: Mon, 01 Mar 2004 18:01:45 +0200
From: Evren Yurtesen
To: "David A. Koran"
cc: freebsd-stable@freebsd.org
Subject: Re: Same Panic 12 on differnet servers
References: <40434197.8060100@solo.net> <40434F23.7070608@ispro.net.tr> <40435588.6010604@solo.net>
In-Reply-To: <40435588.6010604@solo.net>
List-Id: Production branch of FreeBSD source code

David A. Koran wrote:

> however, I'm remote and can't pick up the console messages (if somebody's
> got a cool trick for that, I'd appreciate it).
> I'm usually recording

Well, I don't know of any way other than connecting another machine to
the console port over a serial cable and recording everything. In a
panic, the machine most probably can't write anything to the logs or to
disk at all.

> I cvsup about once or twice a day. I build world about once a week and
> upgrade ports daily. I'm going through a portsupgrade right now and will

But you said the machine had been up for 80+ days before it crashed? How
could you upgrade the system and kernel once a week, then? Perhaps the
last sources that worked for you are at least 80 days old, and your
problem might be related to something that changed in the FreeBSD
sources in between?

> The hardware is fine and has been working without a hitch. And, for the
> case that I'm not sure EXACTLY when the last stable build occurred (I can
> look at my saved daily logs for repeated reboots), I'm not going to have
> much to go on right now. I was more or less soliciting any me-toos to
> see if we can pin-point the issue. I'll post back on the progress of
> finding out when this occurred (or started to at least).

Well, I once had a problem with a machine that had been working for more
than a month without any trouble. Then it started rebooting after I made
world. I thought it was a software problem, but later I realized that
two of the memory modules were faulty. I would guess they just went bad
at about the same time I cvsupped to newer sources. We shouldn't exclude
the possibility of a hardware failure that causes a program to
malfunction.

When you are near the hardware, you should perhaps run a memory test and
some hard drive tests. www.memtest86.com has a nice utility, and I guess
your drives support S.M.A.R.T. testing. It's a shame that the 4.x
versions of FreeBSD can't work with smartmontools on ATA drives;
otherwise you could run the test on the fly. I am sure there is a
utility that can run SMART tests from a bootable floppy, though. I have
never used one.
Hmm, let's see ;) *googling* Well, IBM/Hitachi seems to have such
software. I don't know if it will work on your WD drives, but it's worth
a shot.
http://www.hgst.com/hdd/support/download.htm

Even if your drives are not the problem, looking at the drive status
doesn't hurt. I recently realized that a few of my drives are going to
fail pretty soon. It is nice to know before that really happens. You
should run the extended (long) SMART test on the drive. It is the most
thorough of the self-tests and the one most likely to catch a developing
failure.

>> Which process is using the cpu so much before crashing?
>
> This is post crash diagnostics, so, I'm not process monitoring yet.
>
>>> balanced combination of web and mail server on it. The load used to
>>> (and with some tuning) stays below 1.00 load, but I've seen it get to
>>> above 3.00 and start crashing.

Well, I just thought you would know, because you said the load gets up
to 3.00 before crashing... So you didn't check what was using so much
CPU?

> I have a ton of apps on the machine (it's a loaded webserver and mail
> server, most of the load comes from SPAM and Virus scanning of incoming
> e-mail right now).. so pin-pointing the offending app right now will
> probably take more work.

Well, the problem might also be your spam/virus scanning software.
Nowadays there are so many mail worms that when they start attacking,
you can receive hundreds of emails at once. That might cause you to run
out of memory and use a lot of processor power and swap space as well!
Access to the machine would get really slow, and eventually it might
crash or reboot... This is just another possibility.

> Just this one (my backup test box [read: laptop] is out for hardware
> maintenance... FreeBSD 5.x kept dying on it... urf!)

Well, the subject said 'on different servers', so I thought you had
multiple servers with the same issue.

Evren
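
P.S. Since the question of what was eating the CPU before the crash is
still open, here is a rough sketch of one way to leave a trail on disk.
It is only an illustration under some assumptions: that Python is
installed on the box, that `ps -axo pid,pcpu,pmem,comm` prints a header
line followed by one process per line, and that the log path and the
function names (`top_cpu`, `record_snapshot`) are made up for this
example.

```python
import os
import subprocess
import time

PS_CMD = ["ps", "-axo", "pid,pcpu,pmem,comm"]  # assumed column set
LOG_PATH = "/var/tmp/top-cpu.log"              # hypothetical location


def top_cpu(ps_output, n=5):
    """Return up to n (pcpu, pmem, pid, command) tuples, busiest first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, pcpu, pmem, comm = parts
        try:
            rows.append((float(pcpu), float(pmem), int(pid), comm.strip()))
        except ValueError:
            continue  # skip anything that does not parse as numbers
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]


def record_snapshot(path=LOG_PATH, n=5):
    """Append one timestamped snapshot and push it to disk right away."""
    out = subprocess.run(PS_CMD, capture_output=True, text=True,
                         check=True).stdout
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as log:
        for pcpu, pmem, pid, comm in top_cpu(out, n):
            log.write(f"{stamp} pid={pid} cpu={pcpu:.1f}% "
                      f"mem={pmem:.1f}% {comm}\n")
        log.flush()
        os.fsync(log.fileno())  # a panic may come before the next sync
```

Calling `record_snapshot()` once a minute (from cron, or from a loop
with `time.sleep(60)`) should leave the last few snapshots in the log
after a reboot, although a panic can still lose whatever had not reached
the disk yet.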