From owner-freebsd-stable@FreeBSD.ORG Tue Nov 16 20:07:58 2004
Message-ID: <419A5E18.5050806@withagen.nl>
Date: Tue, 16 Nov 2004 21:07:52 +0100
From: Willem Jan Withagen
To: freebsd-stable@freebsd.org
Subject: Questions about a 5.2.1 crash.....

Hi,

I've been asked to help a former customer. The box is running 5.2.1, and
its most "exotic" application is Java/Tomcat. It crashes too often.

The first essential question for the customer is: is this hardware?
So I ran the 'make -j 8 buildworld' test, which it survived with flying
colors.

This is what I find in the logs:

Nov 12 11:22:01 ktc syslogd: kernel boot file is /boot/kernel/kernel
Nov 12 11:22:01 ktc kernel: panic: vm_fault: fault on nofault entry, addr: ee8fd000
Nov 12 11:22:01 ktc kernel:
Nov 12 11:22:01 ktc kernel: syncing disks, buffers remaining...
panic: bremfree: removing a buffer not on a queue
Nov 12 11:22:01 ktc kernel: Uptime: 2d11h12m6s
Nov 12 11:22:01 ktc kernel: Shutting down ACPI
Nov 12 11:22:01 ktc kernel: Automatic reboot in 15 seconds - press a key on the console to abort
......
Nov 12 11:22:02 ktc kernel: WARNING: / was not properly dismounted
Nov 12 11:53:54 ktc syslogd: kernel boot file is /boot/kernel/kernel
Nov 12 11:53:54 ktc kernel: panic: vm_fault: fault on nofault entry, addr: ee79f000
Nov 12 11:53:54 ktc kernel:
Nov 12 11:53:54 ktc kernel: syncing disks, buffers remaining...
panic: bremfree: removing a buffer not on a queue
Nov 12 11:53:54 ktc kernel: Uptime: 31m36s
Nov 12 11:53:54 ktc kernel: Shutting down ACPI

The second panic, during the sync, I can imagine is only there because
syncing disks after a panic has never really worked all that well in 5.x.

The customer has disabled all warnings/witness/...

I cannot make much of the comments at sys/vm/vm_fault.c:278, which is
where the fault on nofault occurs.

My feeling says: upgrade to 5.3. But this is a production server running
some applications I know very little about, and that holds me back.

So is this for sure a hardware problem, or just leftovers from 5.2.1?

--WjW

-------- Original Message --------
Date: Tue, 16 Nov 2004 20:57:26 +0100 (CET)
From: Jan Willem
To: wjw@withagen.nl

3.78.225.127:138 193.78.225.255:138 in via fxp0
TPTE at 0xbfc203e4 IS ZERO @ VA 080f9000
panic: bad pte

syncing disks, buffers remaining...
panic: bremfree: removing a buffer not on a queue
Uptime: 6h59m22s
Shutting down ACPI
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.2.1-RELEASE-p9 #4: Tue Aug 24 01:16:22 CEST 2004
joao@ktc.netmaniacs.nl:/usr/src/sys/i386/compile/KTC
Preloaded elf kernel "/boot/kernel/kernel" at 0xc09ee000.
Preloaded elf module "/boot/kernel/aout.ko" at 0xc09ee26c.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc09ee318.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz (3192.22-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
Features=0xbfebfbff
Hyperthreading: 2 logical CPUs
real memory = 2129850368 (2031 MB)
avail memory = 2063503360 (1967 MB)
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00f3d20
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
acpi_cpu0: on acpi0
acpi_cpu1: on acpi0
device_probe_and_attach: acpi_cpu1 attach returned 6
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib0: slot 2 INTA is routed to irq 11
pcib0: slot 29 INTA is routed to irq 11
pcib0: slot 29 INTB is routed to irq 5
pcib0: slot 29 INTC is routed to irq 10
pcib0: slot 29 INTA is routed to irq 11
pcib0: slot 29 INTD is routed to irq 9
pcib0: slot 31 INTA is routed to irq 10
pcib0: slot 31 INTB is routed to irq 3
agp0: port 0xec00-0xec07 mem 0xffa80000-0xffafffff,0xf0000000-0xf7ffffff irq 11 at device 2.0 on pci0
agp0: detected 16252k stolen memory
agp0: aperture size is 128M
uhci0: port 0xc800-0xc81f irq 11 at device 29.0 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xcc00-0xcc1f irq 5 at device 29.1 on pci0
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0xd000-0xd01f irq 10 at device 29.2 on pci0
usb2: on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: port 0xd400-0xd41f irq 11 at device 29.3 on pci0
usb3: on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: at device 29.7 (no driver attached)
pcib1: at device 30.0 on pci0
pci1: on pcib1
pcib1: slot 0 INTA is routed to irq 10
pcib1: slot 1 INTA is routed to irq 10
pcib1: slot 8 INTA is routed to irq 11
pcib2: at device 0.0 on pci1
pci2: on pcib2
asr0: mem 0xe4000000-0xe5ffffff irq 10 at device 0.1 on pci1
asr0: major=154
asr0: ADAPTEC 2100S FW Rev. 370F, 1 channel, 256 CCBs, Protocol I2O
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xbc00-0xbc7f mem 0xff8efc00-0xff8efc7f irq 10 at device 1.0 on pci1
xl0: Ethernet address: 00:04:76:f6:e1:1c
miibus0: on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: port 0xb800-0xb83f mem 0xff8ee000-0xff8eefff irq 11 at device 8.0 on pci1
fxp0: Ethernet address 00:0c:f1:91:52:7b
miibus1: on fxp0
inphy0: on miibus1
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0xffa0-0xffaf,0-0x3,0-0x7,0-0x3,0-0x7 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
atapci1: port 0xd800-0xd80f,0xdc00-0xdc03,0xe000-0xe007,0xe400-0xe403,0xe800-0xe807 irq 10 at device 31.2 on pci0
atapci1: [MPSAFE]
ata2: at 0xe800 on atapci1
ata2: [MPSAFE]
ata3: at 0xe000 on atapci1
ata3: [MPSAFE]
pci0: at device 31.3 (no driver attached)
acpi_button0: on acpi0
fdc0: port 0x3f7,0x3f4-0x3f5,0x3f2-0x3f3,0x3f0-0x3f1 irq 6 drq 2 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
ppc0 port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
acpi_cpu1: on acpi0
device_probe_and_attach: acpi_cpu1 attach returned 6
orm0: