From owner-freebsd-questions Tue Feb 20 16:49: 6 2001
Delivered-To: freebsd-questions@freebsd.org
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by hub.freebsd.org (Postfix) with ESMTP id 6150A37B491
	for ; Tue, 20 Feb 2001 16:48:56 -0800 (PST)
	(envelope-from scott@vicor-nb.com)
Received: from muir.vicor-nb.com (muir.vicor-nb.com [208.206.78.49])
	by mail.vicor-nb.com (Postfix) with ESMTP
	id 06D281B20C; Tue, 20 Feb 2001 16:48:56 -0800 (PST)
Received: by muir.vicor-nb.com (Postfix, from userid 1035)
	id 83D4729A; Tue, 20 Feb 2001 16:48:55 -0800 (PST)
Subject: Help with panic: page fault in FBSD 4.1.1
To: freebsd-questions@FreeBSD.ORG
Date: Tue, 20 Feb 2001 16:48:55 -0800 (PST)
Cc: milt@vicor-nb.com (milt), cayford@vicor-nb.com (Cayford Burrell),
	jpl@vicor-nb.com (John Lynch), jrh@vicor-nb.com (Josh Howard),
	scott@vicor-nb.com (Scott Macy), julian@vicor-nb.com (Julian Elischer)
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-Id: <20010221004855.83D4729A@muir.vicor-nb.com>
From: scott@vicor-nb.com (Scott Macy)
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

We have been upgrading about 300 PCs at our clients' production sites
from FreeBSD 2.2.6 to FreeBSD 4.1.1. On two of the oldest machines we
have had frequent (daily or so) kernel crashes with "panic: page fault".
We have two kernel core dumps which show the exact same stack trace.

We are not seeing these crashes on the newer machines with 4.1.1 running
the same application, and the machines that are crashing were stable
under 2.2.6.

I've looked through the mail archives and the GNATS database and have
not found a match to our problem.

What's going on? What can we do to eliminate the crashes? Is there
some kernel config setting that might be causing the problem? Is there
another option than "buy new hw"?

Some Details:

The machines that are crashing are "server" machines, with 6x60GB RAIDs,
mostly running our own "QFT" (Quick File Transfer) daemons that do *lots*
of network and disk IO. The crashing machines are Pentium (Pro?) 200s
(see dmesg below).

We are not seeing the same crash problem on user workstations that are of
the same vintage and run the same OS, but are not as heavily loaded.

We swapped hardware on one of these older machines with a newer one, and
the problem went away when the same disk was used with a Pentium II 350 CPU.

We have now trapped two kernel core dumps and both give exactly the same
information. Below is the kernel debug stack trace info, then the dmesg info.

The process that is running at the time of the panic is qftListener,
which is the QFT daemon mentioned above. We have two core dumps; both
times it's in a qftListener "open()" call.

Thanks,
-Scott Macy

Gory Details:
===================================================================
bigwoop Feb 20 10:54am ~cayford/oos0b_crash 107: gdb -k kern* vm*
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(no debugging symbols found)...
IdlePTD 4358144
initial pcb at 387100
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x14
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc015129b
stack pointer           = 0x10:0xc8f58a08
frame pointer           = 0x10:0xc8f58a70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 22304 (qftListener)
interrupt mask          =
trap number             = 12
panic: page fault

syncing disks...
30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 giving up on 26 buffers
Uptime: 1d9h32m48s
(da1:ahc1:0:0:0): Synchronize cache failed, status == 0x34, scsi status == 0x0
(da2:ahc1:0:1:0): Synchronize cache failed, status == 0xb, scsi status == 0x0
(da3:ahc1:0:2:0): Synchronize cache failed, status == 0xb, scsi status == 0x0

dumping to dev #da/0x20001, offset 1310720
dump 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113
112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97
96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81
80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65
64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49
48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0  0xc0193748 in boot ()
(kgdb) bt
#0  0xc0193748 in boot ()
#1  0xc0193acc in poweroff_wait ()
#2  0xc02ede85 in trap_fatal ()
#3  0xc02edb5d in trap_pfault ()
#4  0xc02ed717 in trap ()
#5  0xc015129b in ahc_action ()
#6  0xc012650f in xpt_run_dev_sendq ()
#7  0xc01258ff in xpt_action ()
#8  0xc012cd60 in dastart ()
#9  0xc01262b8 in xpt_run_dev_allocq ()
#10 0xc01261e7 in xpt_schedule ()
#11 0xc012c2c8 in dastrategy ()
#12 0xc019c9e9 in diskstrategy ()
#13 0xc01c8ed0 in spec_strategy ()
#14 0xc01c89a5 in spec_vnoperate ()
#15 0xc0292ddd in ufs_vnoperatespec ()
#16 0xc0292845 in ufs_strategy ()
#17 0xc0292dad in ufs_vnoperate ()
#18 0xc01b61e6 in bread ()
#19 0xc0289653 in ffs_blkatoff ()
#20 0xc028dfe9 in ufs_lookup ()
#21 0xc0292dad in ufs_vnoperate ()
#22 0xc01b9bf9 in vfs_cache_lookup ()
---Type to continue, or q to quit---
#23 0xc0292dad in ufs_vnoperate ()
#24 0xc01bc988 in lookup ()
#25 0xc01bc484 in namei ()
#26 0xc01c490a in vn_open ()
#27 0xc01c0c85 in open ()
#28 0xc02ee131 in syscall2 ()
#29 0xc02dfca5 in Xint0x80_syscall ()
#30 0x8050a69 in ?? ()
#31 0x8051a98 in ?? ()
#32 0x804a46d in ?? ()
(kgdb) h

===================================================================
Dmesg:

Copyright (c) 1992-2000 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.1.1-RELEASE #0: Tue Sep 26 00:46:59 GMT 2000
    jkh@narf.osd.bsdi.com:/usr/src/sys/compile/GENERIC
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium/P55C (200.46-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x544  Stepping = 4
  Features=0x8001bf
real memory  = 134217728 (131072K bytes)
avail memory = 126521344 (123556K bytes)
Preloaded elf kernel "kernel" at 0xc0416000.
Intel Pentium detected, installing workaround for F00F bug
md0: Malloc disk
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
isab0: at device 7.0 on pci0
isa0: on isab0
atapci0: port 0xf000-0xf00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: port 0x6400-0x641f irq 11 at device 7.2 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
chip1: port 0x5f00-0x5f0f at device 7.3 on pci0
ahc0: port 0x6800-0x68ff mem 0xe4100000-0xe4100fff irq 10 at device 9.0 on pci0
aic7880: Wide Channel A, SCSI Id=7, 16/255 SCBs
ahc1: port 0x6c00-0x6cff mem 0xe4102000-0xe4102fff irq 5 at device 10.0 on pci0
aic7880: Wide Channel A, SCSI Id=7, 16/255 SCBs
pci0: at 11.0 irq 9
fxp0: port 0x7000-0x701f mem 0xe4000000-0xe40fffff,0xe4101000-0xe4101fff irq 11 at device 12.0 on pci0
fxp0: Ethernet address 00:a0:c9:5b:f7:78
eisa0: on motherboard
eisa0: unknown card ADP7881 (0x04907881) at slot 6
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
Waiting 15 seconds for SCSI devices to settle
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-2 device
da0: 10.000MB/s transfers (10.000MHz, offset 15)
da0: 8683MB (17783112 512 byte sectors: 255H 63S/T 1106C)
da1 at ahc1 bus 0 target 0 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
da1: 61220MB (125380096 512 byte sectors: 255H 63S/T 7804C)
da2 at ahc1 bus 0 target 1 lun 0
da2: Fixed Direct Access SCSI-2 device
da2: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
da2: 61220MB (125380096 512 byte sectors: 255H 63S/T 7804C)
da3 at ahc1 bus 0 target 2 lun 0
da3: Fixed Direct Access SCSI-2 device
da3: 20.000MB/s transfers (10.000MHz, offset 8, 16bit)
da3: 61220MB (125380096 512 byte sectors: 255H 63S/T 7804C)
Mounting root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted

===================================================================
And /var/log/messages gives no hint about the problem:
===================================================================
Feb 15 20:12:26 oos0b inetd[149]: ntalk/udp: no such user 'tty', service ignored
Feb 15 20:19:49 oos0b ntpd[105]: time reset -6.935659 s
Feb 15 20:19:49 oos0b ntpd[105]: kernel pll status change 2041
Feb 16 22:46:37 oos0b mountd[112]: umountall request from 192.168.40.39 from unprivileged port
Feb 16 22:46:40 oos0b mountd[112]: umountall request from 192.168.40.39 from unprivileged port
Feb 16 22:50:30 oos0b mountd[112]: umountall request from 192.168.40.39 from unprivileged port
Feb 16 22:50:34 oos0b mountd[112]: umountall request from 192.168.40.39 from unprivileged port
Feb 16 23:41:17 oos0b mountd[112]: umountall request from 192.168.40.39 from unprivileged port
Feb 16 23:50:57 oos0b last message repeated 3 times
Feb 16 23:54:51 oos0b mountd[112]: umountall request from 192.168.40.39 from unprivileged port
Feb 17 07:29:58 oos0b /kernel: Copyright (c) 1992-2000 The FreeBSD Project.
Feb 17 07:29:58 oos0b /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Feb 17 07:29:58 oos0b /kernel: The Regents of the University of California. All rights reserved.
Feb 17 07:29:58 oos0b /kernel: FreeBSD 4.1.1-RELEASE #0: Tue Sep 26 00:46:59 GMT 2000
Feb 17 07:29:58 oos0b /kernel: jkh@narf.osd.bsdi.com:/usr/src/sys/compile/GENERIC
Feb 17 07:29:58 oos0b /kernel: Timecounter "i8254" frequency 1193182 Hz
Feb 17 07:29:58 oos0b /kernel: CPU: Pentium/P55C (200.46-MHz 586-class CPU)
Feb 17 07:29:58 oos0b /kernel: Origin = "GenuineIntel" Id = 0x544 Stepping = 4
Feb 17 07:29:58 oos0b /kernel: Features=0x8001bf
Feb 17 07:29:58 oos0b /kernel: real memory = 134217728 (131072K bytes)
Feb 17 07:29:58 oos0b /kernel: avail memory = 126521344 (123556K bytes)
Feb 17 07:29:58 oos0b /kernel: Preloaded elf kernel "kernel" at 0xc0416000.
Feb 17 07:29:58 oos0b /kernel: Intel Pentium detected, installing workaround for F00F bug
Feb 17 07:29:58 oos0b /kernel: md0: Malloc disk
.... etc with reboot.
Feb 17 07:29:59 oos0b /kernel: Mounting root from ufs:/dev/da0s1a
Feb 17 07:29:59 oos0b savecore: reboot after panic: page fault
Feb 17 07:29:59 oos0b savecore: /var/crash/bounds: No such file or directory
Feb 17 07:29:59 oos0b savecore: writing core to /var/crash/vmcore.0
Feb 17 07:30:48 oos0b savecore: writing kernel to /var/crash/kernel.0


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message