From: Phil Rosenthal
To: freebsd-stable@freebsd.org
Date: Mon, 23 Feb 2004 19:04:32 -0500
Subject: >2GB Bugs still exist in FreeBSD 4.9 ?
Message-Id: <05F984A0-665D-11D8-8B23-000A958F0F6A@isprime.com>

Hello,

I've been having this issue for about a year, but haven't had the time to fully diagnose it. The servers that had this problem didn't need 4GB of RAM, they just happened to have 4GB, so the solution was either to remove 2GB or to set hw.physmem="2048M" in /boot/loader.conf.

I finally have enough free time to try to diagnose this, but I'm not finding it easy to figure out what's going wrong.

I have about 10 servers like this: Dell PE2650, 6GB of physical RAM using Dell's "Redundant Memory" feature (which leaves the system with 4GB of "usable memory"), and an AAC Perc3 card with RAID5 volumes.
All of them run Apache 1.3, and the RAM is mostly used for filesystem cache. It looks to me like the bug exists somewhere in the filesystem cache, and unfortunately that's very heavily used here.

With 2GB of RAM, the servers run for months without problems; with 4GB of RAM, they crash within 2 minutes of taking a real load. This isn't a "bad hardware" issue, as it happens the same way across 10 servers, and the problem is resolved without changing any hardware, only by changing /boot/loader.conf to limit the RAM to 2GB.

Has anyone seen this before? Any ideas on what might be wrong?

kgdb:

SMP 4 cpus
IdlePTD at physical address 0x002e9000
initial pcb at physical address 0x002617c0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc020221f
stack pointer           = 0x10:0xff93fcb0
frame pointer           = 0x10:0xff93fce4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 162 (httpd)
interrupt mask          = bio  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
boot() called on cpu#0

syncing disks...
146 146 146 146 146 146 146 146 146 146 146 146 146 146 146 146 146 146 146 146
giving up on 138 buffers
Uptime: 6m7s

#0  dumpsys () at ../../kern/kern_shutdown.c:487
        error = 0
#1  0xc014f0dc in boot (howto=256) at ../../kern/kern_shutdown.c:316
        howto = 256
#2  0xc014f544 in poweroff_wait (junk=0xc0238979, howto=-1071414225) at ../../kern/kern_shutdown.c:595
        fmt = 0xc0238979 "%s"
        bootopt = 256
        buf = "page fault", '\000'
#3  0xc0203881 in trap_fatal (frame=0xff93fc70, eva=0) at ../../i386/i386/trap.c:974
        frame = (struct trapframe *) 0x100
        eva = 0
        code = -1071412871
        type = 12
        ss = -1071412871
        esp = 0
        softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_xx = 4, ssd_xx1 = 3, ssd_def32 = 1, ssd_gran = 1}
#4  0xc02034f9 in trap_pfault (frame=0xff93fc70, usermode=0, eva=0) at ../../i386/i386/trap.c:867
        va = 0
        vm = (struct vmspace *) 0x0
        map = 0xf0e41340
        rv = 0
        ftype = 2 '\002'
        p = (struct proc *) 0xf0e3c8a0
#5  0xc0203083 in trap (frame={tf_fs = -773586920, tf_es = -1071251440, tf_ds = 16, tf_edi = 0, tf_esi = -763979776, tf_ebp = -7078684, tf_isp = -7078756, tf_ebx = 0, tf_edx = -1744832577, tf_ecx = 42, tf_eax = 0, tf_trapno = 12, tf_err = 2, tf_eip = -1071635937, tf_cs = 8, tf_eflags = 66050, tf_esp = -253507424, tf_ss = -1072177704}) at ../../i386/i386/trap.c:466
        p = (struct proc *) 0xf0e3c8a0
        sticks = 17357937978336346112
        i = 0
        ucode = 0
        type = 12
        code = 0
        eva = 0
#6  0xc020221f in generic_bzero ()
No symbol table info available.
#7  0xc01c2834 in ffs_vget (mp=0xd1e12400, ino=41609651, vpp=0xff93fd94) at ../../ufs/ffs/ffs_vfsops.c:1111
        fs = (struct fs *) 0x68000840
        ip = (struct inode *) 0xd276f100
        ump = (struct ufsmount *) 0xd2158e00
        bp = (struct buf *) 0xff93fdb0
        vp = (struct vnode *) 0xd2769800
        dev = 0x0
        error = -763979776
#8  0xc01c606b in ufs_lookup (ap=0xff93fdec) at ../../ufs/ufs/ufs_lookup.c:611
        vdp = (struct vnode *) 0xffbf5cc0
        dp = (struct inode *) 0xd2769800
        bp = (struct buf *) 0xde834c6c
        ep = (struct direct *) 0xe42829e0
        entryoffsetinblock = 2528
        slotstatus = FOUND
        slotoffset = -1
        slotsize = 0
        slotfreespace = 0
        slotneeded = 0
        numdirpasses = 2
        endsearch = 9216
        prevoff = 2504
        pdp = (struct vnode *) 0xffbf5cc0
        tdp = (struct vnode *) 0x0
        enduseful = 2528
        bmask = 16383
        lockparent = 0
        wantparent = 0
        namlen = 0
        error = -467129888
        vpp = (struct vnode **) 0xff93fef0
        cnp = (struct componentname *) 0xff93ff04
        cred = (struct ucred *) 0xd2762580
        flags = 49348
        nameiop = 0
        p = (struct proc *) 0xf0e3c8a0
#9  0xc01ca98d in ufs_vnoperate (ap=0xff93fdec) at ../../ufs/ufs/ufs_vnops.c:2376
        ap = (struct vop_generic_args *) 0x0
#10 0xc0179e2e in vfs_cache_lookup (ap=0xff93fe44) at vnode_if.h:77
        rc = 0
        a = {a_desc = 0xc02411e0, a_dvp = 0xffbf5cc0, a_vpp = 0xff93fef0, a_cnp = 0xff93ff04}
        dvp = (struct vnode *) 0xffbf5cc0
        vpp = (struct vnode **) 0xff93fef0
        cnp = (struct componentname *) 0xff93ff04
        ap = (struct vop_lookup_args *) 0x0
        dvp = (struct vnode *) 0xffbf5cc0
        vp = (struct vnode *) 0xff93fe00
        lockparent = 0
        error = 0
        vpp = (struct vnode **) 0xff93fef0
        cnp = (struct componentname *) 0xff93ff04
        cred = (struct ucred *) 0x0
        flags = 49348
        p = (struct proc *) 0xf0e3c8a0
        vpid = 4289738624
#11 0xc01ca98d in ufs_vnoperate (ap=0xff93fe44) at ../../ufs/ufs/ufs_vnops.c:2376
        ap = (struct vop_generic_args *) 0x0
#12 0xc017cec1 in lookup (ndp=0xff93fedc) at vnode_if.h:52
        a = {a_desc = 0xc02411a0, a_dvp = 0xffbf5cc0, a_vpp = 0xff93fef0, a_cnp = 0xff93ff04}
        dvp = (struct vnode *) 0xffbf5cc0
        cnp = (struct componentname *) 0xff93ff04
        cp = 0xff8b643a ""
        dp = (struct vnode *) 0xffbf5cc0
        tdp = (struct vnode *) 0xffa3cbc0
        mp = (struct mount *) 0xff8b643a
        docache = 32
        wantparent = 0
        rdonly = 0
        trailing_slash = 0
        error = 0
        dpunlocked = 0
        cnp = (struct componentname *) 0xff93ff04
        p = (struct proc *) 0xf0e3c8a0
#13 0xc017c9ac in namei (ndp=0xff93fedc) at ../../kern/vfs_lookup.c:153
        fdp = (struct filedesc *) 0xff8b6400
        cp = 0xff8b6400 "/usr/home/xxxxxxxx/xxxxxxx/xxxxxxl/xxxxxxxx/xxxxxxxxxx.jpg"
        dp = (struct vnode *) 0xff19fe00
        aiov = {iov_base = 0xff8b641a "/xxxxxxx/xxxxxxxx/xxxxxxxxxx.jpg", iov_len = 998}
        auio = {uio_iov = 0xff93fe70, uio_iovcnt = 1, uio_offset = 26, uio_resid = 998, uio_segflg = UIO_SYSSPACE, uio_rw = UIO_READ, uio_procp = 0x0}
        error = -15073792
        linklen = -15073792
        cnp = (struct componentname *) 0xff93ff04
        p = (struct proc *) 0xf0e3c8a0
#14 0xc0182a51 in access (p=0xf0e3c8a0, uap=0xff93ff80) at ../../kern/vfs_syscalls.c:1633
        cred = (struct ucred *) 0xd236d800
        tmpcred = (struct ucred *) 0xd2762580
        vp = (struct vnode *) 0xff93ff80
        error = -253507424
        flags = 2
        nd = {ni_dirp = 0x8555f4c "xxxxxx/xxxxxxx/xxxxxxxx/xxxxxxxxxx.jpg", ni_segflg = UIO_USERSPACE, ni_startdir = 0x0, ni_rootdir = 0xff19fe00, ni_topdir = 0x0, ni_vp = 0x0, ni_dvp = 0xffbf5cc0, ni_pathlen = 1, ni_next = 0xff8b643a "", ni_loopcnt = 1, ni_cnd = {cn_nameiop = 0, cn_flags = 49348, cn_proc = 0xf0e3c8a0, cn_cred = 0xd2762580, cn_pnbuf = 0xff8b6400 "/usr/home/xxxxxxxx/xxxxxxx/xxxxxxx/xxxxxxxx/xxxxxxxxxx.jpg", cn_nameptr = 0xff8b642c "xxxxxxxxxx.jpg", cn_namelen = 14, cn_consume = 0}}
#15 0xc0203bc5 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077964032, tf_esi = 135808336, tf_ebp = -1077964032, tf_isp = -7077932, tf_ebx = 139830412, tf_edx = 139812684, tf_ecx = 139812684, tf_eax = 33, tf_trapno = 22, tf_err = 2, tf_eip = 673512776, tf_cs = 31, tf_eflags = 663, tf_esp = -1077964204, tf_ss = 47}) at
    ../../i386/i386/trap.c:1175
        params = 0xbfbf9258 "L_U\b"
        i = 0
        callp = (struct sysent *) 0xc0245ea8
        p = (struct proc *) 0xf0e3c8a0
        orig_tf_eflags = 663
        sticks = 4
        error = 0
        narg = 2
        args = {139812684, 0, 1865, 0, 0, 530, 100, -1077972632}
        have_mplock = 1
        code = 33
#16 0xc01f0f5b in Xint0x80_syscall ()
No symbol table info available.
#17 0x80df418 in ?? ()
No symbol table info available.

dmesg:

Feb 23 06:07:35 op3 /kernel: Copyright (c) 1992-2003 The FreeBSD Project.
Feb 23 06:07:35 op3 /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Feb 23 06:07:35 op3 /kernel: The Regents of the University of California. All rights reserved.
Feb 23 06:07:35 op3 /kernel: FreeBSD 4.9-STABLE #0: Thu Feb 12 19:14:40 PST 2004
Feb 23 06:07:35 op3 /kernel: root@op3.isprime.com:/usr/src/sys/compile/MYKERNCONF
Feb 23 06:07:35 op3 /kernel: Timecounter "i8254" frequency 1193182 Hz
Feb 23 06:07:35 op3 /kernel: CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2784.07-MHz 686-class CPU)
Feb 23 06:07:35 op3 /kernel: Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
Feb 23 06:07:35 op3 /kernel: Features=0xbfebfbff
Feb 23 06:07:35 op3 /kernel: Hyperthreading: 2 logical CPUs
Feb 23 06:07:35 op3 /kernel: real memory = 4026400768 (3932032K bytes)
Feb 23 06:07:35 op3 /kernel: avail memory = 3923058688 (3831112K bytes)
Feb 23 06:07:35 op3 /kernel: Changing APIC ID for IO APIC #0 from 0 to 8 on chip
Feb 23 06:07:35 op3 /kernel: Changing APIC ID for IO APIC #1 from 0 to 9 on chip
Feb 23 06:07:35 op3 /kernel: Changing APIC ID for IO APIC #2 from 0 to 10 on chip
Feb 23 06:07:35 op3 /kernel: Programming 16 pins in IOAPIC #0
Feb 23 06:07:35 op3 /kernel: IOAPIC #0 intpin 2 -> irq 0
Feb 23 06:07:35 op3 /kernel: Programming 16 pins in IOAPIC #1
Feb 23 06:07:35 op3 /kernel: Programming 16 pins in IOAPIC #2
Feb 23 06:07:35 op3 /kernel: FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
Feb 23 06:07:35 op3 /kernel: cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
Feb 23 06:07:35 op3 /kernel: cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000
Feb 23 06:07:35 op3 /kernel: cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000
Feb 23 06:07:35 op3 /kernel: cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000
Feb 23 06:07:35 op3 /kernel: io0 (APIC): apic id: 8, version: 0x000f0011, at 0xfec00000
Feb 23 06:07:35 op3 /kernel: io1 (APIC): apic id: 9, version: 0x000f0011, at 0xfec01000
Feb 23 06:07:35 op3 /kernel: io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000
Feb 23 06:07:35 op3 /kernel: Preloaded elf kernel "kernel" at 0xc02cc000.
Feb 23 06:07:35 op3 /kernel: Warning: Pentium 4 CPU: PSE disabled
Feb 23 06:07:35 op3 /kernel: Pentium Pro MTRR support enabled
Feb 23 06:07:35 op3 /kernel: md0: Malloc disk
Feb 23 06:07:35 op3 /kernel: Using $PIR table, 9 entries at 0xc00fc410
Feb 23 06:07:35 op3 /kernel: npx0: on motherboard
Feb 23 06:07:35 op3 /kernel: npx0: INT 16 interface
Feb 23 06:07:35 op3 /kernel: pcib0: on motherboard
Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 3 -> irq 2
Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 7 -> irq 3
Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 11 -> irq 5
Feb 23 06:07:35 op3 /kernel: pci0: on pcib0
Feb 23 06:07:35 op3 /kernel: pci0: (vendor=0x1028, dev=0x000c) at 4.0 irq 2
Feb 23 06:07:35 op3 /kernel: pci0: (vendor=0x1028, dev=0x0008) at 4.1 irq 3
Feb 23 06:07:35 op3 /kernel: pci0: (vendor=0x1028, dev=0x000d) at 4.2 irq 5
Feb 23 06:07:35 op3 /kernel: pci0: at 14.0
Feb 23 06:07:35 op3 /kernel: pci0: at 15.1
Feb 23 06:07:35 op3 /kernel: pci0: at 15.2 irq 0
Feb 23 06:07:35 op3 /kernel: isab0: at device 15.3 on pci0
Feb 23 06:07:35 op3 /kernel: isa0: on isab0
Feb 23 06:07:35 op3 /kernel: pcib1: on motherboard
Feb 23 06:07:35 op3 /kernel: pci1: on pcib1
Feb 23 06:07:35 op3 /kernel: pcib2: on motherboard
Feb 23 06:07:35 op3 /kernel: pci2: on pcib2
Feb 23 06:07:35 op3 /kernel: pcib3: on motherboard
Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 12 -> irq 7
Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 13 -> irq 10
Feb 23 06:07:35 op3 /kernel: pci3: on pcib3
Feb 23 06:07:35 op3 /kernel: bge0: mem 0xfcf10000-0xfcf1ffff irq 7 at device 6.0 on pci3
Feb 23 06:07:35 op3 /kernel: bge0: Ethernet address: 00:0d:56:70:93:a0
Feb 23 06:07:35 op3 /kernel: miibus0: on bge0
Feb 23 06:07:35 op3 /kernel: brgphy0: on miibus0
Feb 23 06:07:35 op3 /kernel: brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
Feb 23 06:07:35 op3 /kernel: bge1: mem 0xfcf00000-0xfcf0ffff irq 10 at device 8.0 on pci3
Feb 23 06:07:35 op3 /kernel: bge1: Ethernet address: 00:0d:56:70:93:a1
Feb 23 06:07:35 op3 /kernel: miibus1: on bge1
Feb 23 06:07:35 op3 /kernel: brgphy1: on miibus1
Feb 23 06:07:35 op3 /kernel: brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
Feb 23 06:07:36 op3 /kernel: pcib4: on motherboard
Feb 23 06:07:36 op3 /kernel: IOAPIC #1 intpin 14 -> irq 11
Feb 23 06:07:36 op3 /kernel: pci4: on pcib4
Feb 23 06:07:36 op3 /kernel: pcib8: at device 8.0 on pci4
Feb 23 06:07:36 op3 /kernel: IOAPIC #1 intpin 15 -> irq 13
Feb 23 06:07:36 op3 /kernel: pci5: on pcib8
Feb 23 06:07:36 op3 /kernel: pci5: (vendor=0x9005, dev=0x00c5) at 6.0 irq 11
Feb 23 06:07:36 op3 /kernel: pci5: (vendor=0x9005, dev=0x00c5) at 6.1 irq 13
Feb 23 06:07:36 op3 /kernel: aac0: mem 0xf0000000-0xf7ffffff irq 11 at device 8.1 on pci4
Feb 23 06:07:36 op3 /kernel: aac0: i960RX 100MHz, 118MB cache memory, optional battery present
Feb 23 06:07:36 op3 /kernel: aac0: Kernel 2.7-1, Build 3170, S/N 1481d3
Feb 23 06:07:36 op3 /kernel: aac0: Supported Options=75c
Feb 23 06:07:36 op3 /kernel: pcib5: on motherboard
Feb 23 06:07:36 op3 /kernel: pci6: on pcib5
Feb 23 06:07:36 op3 /kernel: pcib6: on motherboard
Feb 23 06:07:36 op3 /kernel: pci7: on pcib6
Feb 23 06:07:36 op3 /kernel: pcib7: on motherboard
Feb 23 06:07:36 op3 /kernel: pci8: on pcib7
Feb 23 06:07:36 op3 /kernel: orm0: