From owner-freebsd-stable@FreeBSD.ORG Wed Apr 14 18:20:29 2004
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id AE92716A4CE
    for ; Wed, 14 Apr 2004 18:20:29 -0700 (PDT)
Received: from smtp2.mc.surewest.net (smtp2.mc.surewest.net [66.60.130.51])
    by mx1.FreeBSD.org (Postfix) with SMTP id 7215043D46
    for ; Wed, 14 Apr 2004 18:20:29 -0700 (PDT)
    (envelope-from dislists@updegrove.net)
Received: (s3-6657); Wed, 14 Apr 2004 18:20:29 -0700
Received: from unknown (HELO updegrove.net) (64.30.97.117)
    by smtp2.mc.surewest.net (s3-smtpd/0.90-beta3) with SMTP;
    Wed, 14 Apr 2004 18:20:27 -0700
Received: (qmail 9371 invoked by uid 98); 15 Apr 2004 01:20:31 -0000
Received: from dislists@updegrove.net by smeagol.purgatory by uid 1008
    with qmail-scanner-1.20 Clear:RC:1(64.166.46.10):.
    Processed in 6.916547 secs); 15 Apr 2004 01:20:31 -0000
X-Qmail-Scanner-Mail-From: dislists@updegrove.net via smeagol.purgatory
X-Qmail-Scanner: 1.20 (Clear:RC:1(64.166.46.10):.
    Processed in 6.916547 secs)
Received: from adsl-64-166-46-10.dsl.scrm01.pacbell.net (HELO updegrove.net)
    (64.166.46.10) by updegrove.net with SMTP; 15 Apr 2004 01:20:22 -0000
Message-ID: <407DE32B.8040304@updegrove.net>
Date: Wed, 14 Apr 2004 18:19:39 -0700
From: Rick Updegrove
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
References: <40770C0A.3000000@updegrove.net> <407979F3.20501@freebsd.org>
    <407C5AED.9040709@updegrove.net> <407C76A6.5080502@users.sourceforge.net>
    <407CA3D6.2090803@updegrove.net> <20040414083216.A45296@server.gisp.dk>
    <407D466E.9060900@updegrove.net> <407DBD39.6020405@updegrove.net>
    <20040414232312.GA56901@xor.obsecurity.org> <407DCB29.8010109@updegrove.net>
    <20040415000022.GA57253@xor.obsecurity.org>
In-Reply-To: <20040415000022.GA57253@xor.obsecurity.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-TST: smtp2.mc.surewest.net SNWK3 0.31-50 ip=64.30.97.117
Subject: Re: 4.9 SMP Stability?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 15 Apr 2004 01:20:29 -0000

Kris Kennaway wrote:

> First verify that

Obviously, I am going to have to change one thing at a time and wait for
the crash (and let the disks take the beating), or I will have no way to
know what exactly is happening. So, I will start at the top and work my
way down this list.

> * You have an up-to-date BIOS on the system. A lot of systems have
> buggy BIOSes, and this is frequently the cause of "mysterious crashes"
> especially for advanced features like SMP.

I am running HP 4.06.33 PL; at your request I will update to 4.06.43 PL.
I will do this just as soon as I send this reply, which has more
questions I need answered.
Besides, I need to run the new BIOS with the 4.10-BETA kernel until it
crashes to eliminate the BIOS as a suspect, right?

> * You have not fiddled with options in the BIOS. Playing with things
> like memory timing and other BIOS features can cause crashes.

I have changed one setting, which stopped the "locking up with no
rebooting". See
http://lists.freebsd.org/pipermail/freebsd-stable/2003-July/002230.html

I got an off-list reply which suggested I do the following:

I went into the BIOS and selected:

Configuration -> PCI Slot Devices -> PCI IRQ Locking -> Routing Algorithm [Smart]

OK, I changed Routing Algorithm [Smart] to [Fixed] and got a scary
warning about data loss etc., but I hit Yes, saved as prompted, and
rebooted.

> * The hardware is all in order, you don't have mismatched components
> like CPUs with different steppings, etc.

This may sound silly, but how do I verify this? (I have attached
dmesg -a at the bottom of this email in case that helps.)

> These three points hold *whether or not an older version of FreeBSD
> works for you*, because different versions of FreeBSD interact in
> different ways with the hardware, and a previously existing problem
> may suddenly leap out at you when you run a different version.

Sorry, but to me the above paragraph is confusing. I don't agree with
what I think it says. The hardware runs just fine with 4.8-STABLE, so I
don't think you can convince me that my hardware is the cause of this
problem.

> * you're not using out-of-date kernel modules, since in general they
> must be rebuilt whenever you update your kernel.

How do I verify this?
The procedure I follow, after once doing:

mkdir /root/kernels
cp GENERIC /root/kernels/MYKERNEL

is:

cp -Rp /etc /etc.old
cd /usr
rm -rf src/*
rm -rf obj/*
cd /usr/src
/usr/local/bin/cvsup -g -L 2 /etc/stable-supfile
cd /usr/src/sys/i386/conf
ln -s /root/kernels/MYKERNEL
/usr/sbin/config MYKERNEL
cd ../../compile/MYKERNEL
make depend
cd /usr/src
make -j4 buildworld
cd /usr/src
make buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL
make installworld
cd /dev
/bin/sh MAKEDEV all
cd /usr/src/release/sysinstall
make all install
shutdown -r now

Am I missing anything specific? If you just point me to the handbook I
will refer back to my question: "Am I missing anything specific?"

> You said the machine panicked.

I said the machine reboots without any warning and without leaving
anything useful in any of the logs.

> When you encounter a panic, the useful
> thing to do is to obtain a debugging traceback, as described in the
> developers handbook.
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Your bug report will be more useful the more relevant details you can
> provide about it. For example, provide a copy of boot -v, and details
> of what you are doing to provoke the problem, what you have tried to
> work around it, and any other partial results you might have.

boot -v
-bash: boot: command not found

Again, I am doing nothing to provoke the problem. I check the uptime
from time to time and I notice that it has rebooted.

So far I have been unable to obtain any useful information by following
the handbook. However, I think I have made some progress in that area.

#/etc/rc.conf
dumpdev=/dev/amrd0s1b
savecore=YES
dumpdir="/var/crash"

So, hopefully, when the machine crashes after the BIOS update, the above
changes to rc.conf and the debugging traceback (if I can obtain one)
will help.
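
A minimal sketch for reading those rc.conf entries back before waiting on the next crash. The helper names rc_value and check_dump_settings are hypothetical, not part of FreeBSD. The script only reports what the file says and whether the dump directory exists; it does not prove that a dump will actually be written.

```shell
#!/bin/sh
# rc_value FILE NAME (hypothetical helper): print the last value assigned
# to NAME in an rc.conf-style FILE, with surrounding double quotes removed.
rc_value() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 |
        sed -e 's/^"//' -e 's/"$//'
}

# check_dump_settings FILE (hypothetical helper): report what FILE says
# about the crash dump device and directory.
check_dump_settings() {
    dev=$(rc_value "$1" dumpdev)
    dir=$(rc_value "$1" dumpdir)

    if [ -z "$dev" ] || [ "$dev" = "NO" ]; then
        echo "dumpdev: not set"
    else
        echo "dumpdev: $dev"
    fi

    echo "dumpdir: ${dir:-unset}"

    # A configured directory that does not exist is worth flagging.
    if [ -n "$dir" ] && [ ! -d "$dir" ]; then
        echo "warning: $dir does not exist"
    fi
}
```

It would be run as: check_dump_settings /etc/rc.conf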

> After all this, there's no guarantee that one of the volunteer
> developers will be able to jump on board to try to solve your problem
> straight away [1]. Debugging this kind of thing typically takes time,
> so if you don't have it to spare then you'll just have to put on a
> happy face and accept that you can't put in the work needed to track
> newer versions of FreeBSD on your machine.

Yep I know but I feel like I must try anyway. :)

> Kris
>
> [1] of course, you always have the option to pay an expert to
> investigate the problem.

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 4.10-BETA #0: Tue Apr 13 21:49:08 PDT 2004
    root@govmail.ca.gov:/usr/obj/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (499.15-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x673 Stepping = 3
  Features=0x387fbff
real memory = 536870912 (524288K bytes)
avail memory = 519507968 (507332K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard: 2 CPUs
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0329000.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 14 entries at 0xc00fdee0
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
IOAPIC #0 intpin 19 -> irq 2
IOAPIC #0 intpin 17 -> irq 16
pci0: on pcib0
isab0: at device 4.0 on pci0
isa0: on isab0
atapci0: port 0xfcd0-0xfcdf at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: at 4.2 irq 2
Timecounter "PIIX" frequency 3579545 Hz
chip1: port 0x2180-0x218f at device 4.3 on pci0
pcib1: at device 7.0 on pci0
IOAPIC #0 intpin 16 -> irq 17
pci1: on pcib1
ahc0: port 0xe800-0xe8ff mem 0xfebfe000-0xfebfefff irq 17 at device 4.0 on pci1
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
pci1: (vendor=0x1000, dev=0x000c) at 7.0 irq 18
amr0: mem 0xf0000000-0xf7ffffff irq 16 at device 7.1 on pci0
amr0: Firmware D.02.05, BIOS B.01.04, 16MB RAM
pcib2: at device 8.0 on pci0
pci2: on pcib2
fxp0: port 0xdce0-0xdcff mem 0xfe900000-0xfe9fffff,0xefffe000-0xefffefff irq 16 at device 2.0 on pci2
fxp0: Ethernet address 00:90:27:b7:09:76
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: (vendor=0x103c, dev=0x10c1) at 11.0
pci0: at 13.0
orm0: