From owner-freebsd-stable@FreeBSD.ORG Mon Mar 1 07:24:47 2004
Message-ID: <40435588.6010604@solo.net>
Date: Mon, 01 Mar 2004 10:23:52 -0500
From: "David A. Koran"
To: Evren Yurtesen
cc: freebsd-stable@freebsd.org
Subject: Re: Same Panic 12 on differnet servers
References: <40434197.8060100@solo.net> <40434F23.7070608@ispro.net.tr>
In-Reply-To: <40434F23.7070608@ispro.net.tr>
List-Id: Production branch of FreeBSD source code

See comments wherein...
(I should note that I'm glad somebody is willing to work with me on this; I'll do my best to comply with the requests for information.)

Evren Yurtesen wrote:

> Hi,
>
> What does the panic message say?

Unfortunately, I can only describe the results, not the panic type itself. The events described in the original thread mirror what I see; however, I'm remote and can't pick up the console messages (if somebody's got a cool trick for that, I'd appreciate it). Usually all I can record is that the box has rebooted after a panic. Originally I was having more issues with processes not clearing buffers (I can find the thread that proposed the fix, but I remember it related to threaded servers like Apache and MySQL).

>
>
> Did you do cvsup and world before the crashes started? or after?

I cvsup about once or twice a day. I build world about once a week and upgrade ports daily. I'm going through a portsupgrade right now and will build world again this afternoon. I don't have an exact date, only anecdotal evidence of when this behaviour appeared, but I believe it started after mid-January and has continued until now.

>
> When was the last cvsup you made which worked stable? 80 days ago?

See above. I think I may be able to go back to a January 15th tree and see. I think the last "panic" build I had (meaning I just cvsuped for major fixes and built without watching the build process) was shortly after the shmat (http://lists.freebsd.org/pipermail/freebsd-security-notifications/2004-February/000022.html) alert. I will check my kernel config to see whether we may have a Sys V issue.

>
> You know, you can do some testing and go back to the same day you made
> cvsup last time when it was stable and see if the problem persists. If
> you have multiple machines then you can set them with 10 day
> difference and see which ones will crash and which wont.
> Then close the gap and find the day when the code which is causing
> this has been committed and eventually find the reason. There is no
> easy way to tell what is your problem with the information you have
> sent. Well this method would work if the problem is a software bug.
> Did you consider that there might be some hardware problems which
> showed themselves after a reboot after 80 days? However improbable,
> it is a possibility.

The hardware is fine and has been working without a hitch. And since I'm not sure EXACTLY when the last stable build occurred (I can look at my saved daily logs for repeated reboots), I don't have much to go on right now. I was more or less soliciting any me-toos to see if we can pinpoint the issue. I'll post back on my progress in finding out when this occurred (or at least started).

>
>
> Which process is using the cpu so much before crashing?

This is post-crash diagnostics, so I'm not monitoring processes yet.

>
>
> Did you recompile and updated other binaries in your system which
> doesnt come with the default freebsd distribution?

I have a ton of apps on the machine (it's a loaded web server and mail server; most of the load right now comes from spam and virus scanning of incoming e-mail), so pinpointing the offending app will probably take more work.

>
>
> How many different servers are you getting this panic on? do they have
> the same hardware?

Just this one (my backup test box [read: laptop] is out for hardware maintenance... FreeBSD 5.x kept dying on it... urf!)

>
>
> Evren
>
> David A. Koran wrote:
>
>> I'm getting the same type of errors for a box that's been keeping
>> current on 4.9 (and the 4.x tree) for the past two years. The box had
>> been relatively stable up and until late December, when in January and
>> all through February the box has been rebooting on a regular basis.
>>
>> This is a dual-proc box with 256 MB of ram.
>> I'm running a pretty balanced combination of web and mail server on
>> it. The load used to (and with some tuning) stays below 1.00 load,
>> but I've seen it get to above 3.00 and start crashing. I had it at
>> 80.00+ before without it dying before, so I'm betting there's some
>> code instability.
>>
>> I'd be willing to work with any developer on the list to test code to
>> get this condition mentioned here in the thread solved.
>>
>> (On a side note, to inspire some quicker work, we host Howard Shore's
>> website, the one who won the music Oscars last night on Lord of the
>> Rings, and I would be grateful for any help to keep the site stable)
>>
>> P.S. - the unmounted filesystem error below is after one of the crash
>> reboots.
>>
>>
>> mail# uname -a
>> FreeBSD mail.solo.net 4.9-STABLE FreeBSD 4.9-STABLE #20: Sat Feb 21
>> 12:03:07 EST 2004 root@mail.solo.net:/usr/obj/usr/src/sys/SOLONET
>> i386
>> mail# dmesg
>> Copyright (c) 1992-2003 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,
>> 1994
>> The Regents of the University of California. All rights
>> reserved.
>> FreeBSD 4.9-STABLE #20: Sat Feb 21 12:03:07 EST 2004
>> root@mail.solo.net:/usr/obj/usr/src/sys/SOLONET
>> Timecounter "i8254" frequency 1193182 Hz
>> CPU: Pentium II/Pentium II Xeon/Celeron (350.80-MHz 686-class CPU)
>> Origin = "GenuineIntel" Id = 0x652 Stepping = 2
>> Features=0x183fbff
>>
>> real memory = 268435456 (262144K bytes)
>> avail memory = 257265664 (251236K bytes)
>> Programming 24 pins in IOAPIC #0
>> IOAPIC #0 intpin 2 -> irq 0
>> IOAPIC #0 intpin 16 -> irq 11
>> IOAPIC #0 intpin 18 -> irq 9
>> FreeBSD/SMP: Multiprocessor motherboard: 2 CPUs
>> cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
>> cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000
>> io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
>> Preloaded elf kernel "kernel" at 0xc03bc000.
>> Pentium Pro MTRR support enabled
>> md0: Malloc disk
>> npx0: on motherboard
>> npx0: INT 16 interface
>> pcib0: on motherboard
>> pci0: on pcib0
>> pcib1: at device 1.0 on
>> pci0
>> pci1: on pcib1
>> pci1: at 0.0 irq 11
>> isab0: at device 7.0 on pci0
>> isa0: on isab0
>> atapci0: port 0xffa0-0xffaf at device
>> 7.1 on pci0
>> ata0: at 0x1f0 irq 14 on atapci0
>> ata1: at 0x170 irq 15 on atapci0
>> uhci0: at device 7.2 on pci0
>> uhci0: Could not map ports
>> device_probe_and_attach: uhci0 attach returned 6
>> Timecounter "PIIX" frequency 3579545 Hz
>> chip1: port 0x440-0x44f at
>> device 7.3 on pci0
>> pcib2: at device 16.0 on pci0
>> pci2: on pcib2
>> vx0: <3COM 3C590 Etherlink III PCI> port 0xdf80-0xdf9f irq 9 at device
>> 6.0 on pci2
>> utp[*utp*] address 00:a0:24:92:d2:d0
>> vx0: driver is using old-style compatibility shims
>> ahc0: port 0xe400-0xe4ff mem
>> 0xffafe000-0xffafefff irq 11 at device 18.0 on pci0
>> aic7895C: Ultra Wide Channel A, SCSI Id=7, 32/253 SCBs
>> ahc1: port 0xe800-0xe8ff mem
>> 0xffaff000-0xffafffff irq 11 at device 18.1 on pci0
>> aic7895C: Ultra Wide Channel B, SCSI Id=7, 32/253 SCBs
>> orm0: