Date: Mon, 23 Feb 2004 22:00:01 -0400 (AST)
From: "Marc G. Fournier"
To: Phil Rosenthal
cc: freebsd-stable@freebsd.org
Subject: Re: >2GB Bugs still exist in FreeBSD 4.9 ?
Message-ID: <20040223215839.V48887@ganymede.hub.org>
In-Reply-To: <05F984A0-665D-11D8-8B23-000A958F0F6A@isprime.com>

I build all my systems with the following in make.conf:

CFLAGS= -O -mpentium -pipe -g -DKVA_PAGES=512
COPTFLAGS= -O -mpentium -pipe -DKVA_PAGES=512

and the following in my kernel config file:

options VM_KMEM_SIZE="(400*1024*1024)"
options VM_KMEM_SIZE_MAX="(400*1024*1024)"

I don't know if that will correct your problem, but I arrived at the above
after having oodles of problems with 4GB servers running *a lot* of
processes ...
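The arithmetic behind those two settings can be sketched under some assumptions. Assume a non-PAE i386 kernel, where each KVA_PAGES unit is one page-directory entry mapping 4 MB, and a stock value of 256 (1 GB of kernel address space). Under those assumptions, KVA_PAGES=512 splits the 4 GB virtual address space into 2 GB for the kernel and 2 GB for user processes. Setting VM_KMEM_SIZE and VM_KMEM_SIZE_MAX to the same value should pin the kernel malloc arena at 400 MB. The constants and function names below are illustrative only, not FreeBSD interfaces.

```python
# Sketch of the address-space arithmetic behind KVA_PAGES and VM_KMEM_SIZE.
# Assumption: non-PAE i386, where one KVA_PAGES unit is one page-directory
# entry mapping 4 MB of kernel virtual address space.

MB = 1024 * 1024
PDE_SPAN = 4 * MB            # bytes mapped by one page-directory entry (assumed)
ADDRESS_SPACE = 4096 * MB    # full 32-bit virtual address space (4 GB)

VM_KMEM_SIZE = 400 * 1024 * 1024  # same expression as the kernel option


def kernel_va_bytes(kva_pages: int) -> int:
    """Kernel virtual address space reserved by a KVA_PAGES value."""
    return kva_pages * PDE_SPAN


def user_va_bytes(kva_pages: int) -> int:
    """Virtual address space left over for each user process."""
    return ADDRESS_SPACE - kernel_va_bytes(kva_pages)


if __name__ == "__main__":
    for pages in (256, 512):  # assumed stock value, then the value used above
        print(f"KVA_PAGES={pages}: kernel {kernel_va_bytes(pages) // MB} MB, "
              f"user {user_va_bytes(pages) // MB} MB")
    print(f"VM_KMEM_SIZE = {VM_KMEM_SIZE} bytes ({VM_KMEM_SIZE // MB} MB)")
```

Under these assumptions, doubling KVA_PAGES takes 1 GB of address space away from every user process, which is the trade-off of this tuning.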
On Mon, 23 Feb 2004, Phil Rosenthal wrote:

> Hello,
>
> I've been having this issue for about a year, but haven't had the time
> to fully diagnose this, and the servers that had this problem didn't
> have a need for 4GB of ram, they just happened to have 4GB, so the
> solution was either to remove 2GB, or set hw.physmem="2048M" in
> /boot/loader.conf. I finally have enough free time to try and diagnose
> this, but I'm not finding it easy to figure out what's going wrong.
>
> I have about 10 servers like this, Dell PE2650, 6GB of physical ram
> using Dell's "Redundant Memory" feature which leaves the system with
> 4GB of "usable memory", AAC Perc3 card with RAID5 volumes. All of them
> running apache 1.3, and the ram is mostly used for filesystem cache.
> It looks to me like the bug exists somewhere in the filesystem cache,
> and unfortunately that's very heavily used here.
>
> With 2GB of ram, the servers run for months without problems, with 4GB
> of ram, they crash within 2 minutes of taking a real load.
>
> This isn't a "bad hardware" issue, as it happens the same across 10
> servers, and the problem is resolved without changing any hardware,
> only changing /boot/loader.conf to limit the ram to 2GB.
>
> Has anyone seen this before? Any ideas on what might be wrong?
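The hw.physmem="2048M" workaround caps usable memory at exactly 2^31 bytes. That is the 2 GB point below which these servers were stable. Below is a minimal sketch of the conversion. It assumes the loader treats K, M and G as 1024-based, case-insensitive suffixes. loader_size_to_bytes is a hypothetical helper, not a loader(8) function.

```python
# Hypothetical helper: convert a loader.conf size value such as "2048M"
# to bytes. Assumes K/M/G suffixes are 1024-based and case-insensitive.

_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def loader_size_to_bytes(value: str) -> int:
    """Parse a size string like '2048M' (surrounding quotes allowed) into bytes."""
    text = value.strip().strip('"')
    suffix = text[-1:].upper()
    if suffix in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[suffix]
    return int(text)  # no suffix: already a byte count


if __name__ == "__main__":
    cap = loader_size_to_bytes('"2048M"')  # the hw.physmem value quoted above
    print(cap, cap == 2 ** 31)
```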
>
> kgdb:
> SMP 4 cpus
> IdlePTD at physical address 0x002e9000
> initial pcb at physical address 0x002617c0
> panicstr: page fault
> panic messages:
> ---
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
> fault virtual address = 0x0
> fault code = supervisor write, page not present
> instruction pointer = 0x8:0xc020221f
> stack pointer = 0x10:0xff93fcb0
> frame pointer = 0x10:0xff93fce4
> code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 162 (httpd)
> interrupt mask = bio <- SMP: XXX
> trap number = 12
> panic: page fault
> mp_lock = 00000002; cpuid = 0; lapic.id = 00000000
> boot() called on cpu#0
>
> syncing disks... 146 146 146 146 146 146 146 146 146 146 146 146 146
> 146 146 146 146 146 146 146
> giving up on 138 buffers
> Uptime: 6m7s
>
> #0 dumpsys () at ../../kern/kern_shutdown.c:487
>         error = 0
> #1 0xc014f0dc in boot (howto=256) at ../../kern/kern_shutdown.c:316
>         howto = 256
> #2 0xc014f544 in poweroff_wait (junk=0xc0238979, howto=-1071414225)
>     at ../../kern/kern_shutdown.c:595
>         fmt = 0xc0238979 "%s"
>         bootopt = 256
>         buf = "page fault", '\000'
> #3 0xc0203881 in trap_fatal (frame=0xff93fc70, eva=0)
>     at ../../i386/i386/trap.c:974
>         frame = (struct trapframe *) 0x100
>         eva = 0
>         code = -1071412871
>         type = 12
>         ss = -1071412871
>         esp = 0
>         softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27,
>           ssd_dpl = 0, ssd_p = 1, ssd_xx = 4, ssd_xx1 = 3, ssd_def32 = 1,
>           ssd_gran = 1}
> #4 0xc02034f9 in trap_pfault (frame=0xff93fc70, usermode=0, eva=0)
>     at ../../i386/i386/trap.c:867
>         va = 0
>         vm = (struct vmspace *) 0x0
>         map = 0xf0e41340
>         rv = 0
>         ftype = 2 '\002'
>         p = (struct proc *) 0xf0e3c8a0
> #5 0xc0203083 in trap (frame={tf_fs = -773586920, tf_es = -1071251440,
>       tf_ds = 16, tf_edi = 0, tf_esi = -763979776, tf_ebp = -7078684,
>       tf_isp = -7078756, tf_ebx = 0, tf_edx = -1744832577, tf_ecx = 42,
>       tf_eax = 0, tf_trapno = 12, tf_err = 2, tf_eip = -1071635937,
>       tf_cs = 8, tf_eflags = 66050, tf_esp = -253507424,
>       tf_ss = -1072177704}) at ../../i386/i386/trap.c:466
>         p = (struct proc *) 0xf0e3c8a0
>         sticks = 17357937978336346112
>         i = 0
>         ucode = 0
>         type = 12
>         code = 0
>         eva = 0
> #6 0xc020221f in generic_bzero ()
> No symbol table info available.
> #7 0xc01c2834 in ffs_vget (mp=0xd1e12400, ino=41609651, vpp=0xff93fd94)
>     at ../../ufs/ffs/ffs_vfsops.c:1111
>         fs = (struct fs *) 0x68000840
>         ip = (struct inode *) 0xd276f100
>         ump = (struct ufsmount *) 0xd2158e00
>         bp = (struct buf *) 0xff93fdb0
>         vp = (struct vnode *) 0xd2769800
>         dev = 0x0
>         error = -763979776
> #8 0xc01c606b in ufs_lookup (ap=0xff93fdec)
>     at ../../ufs/ufs/ufs_lookup.c:611
>         vdp = (struct vnode *) 0xffbf5cc0
>         dp = (struct inode *) 0xd2769800
>         bp = (struct buf *) 0xde834c6c
>         ep = (struct direct *) 0xe42829e0
>         entryoffsetinblock = 2528
>         slotstatus = FOUND
>         slotoffset = -1
>         slotsize = 0
>         slotfreespace = 0
>         slotneeded = 0
>         numdirpasses = 2
>         endsearch = 9216
>         prevoff = 2504
>         pdp = (struct vnode *) 0xffbf5cc0
>         tdp = (struct vnode *) 0x0
>         enduseful = 2528
>         bmask = 16383
>         lockparent = 0
>         wantparent = 0
>         namlen = 0
>         error = -467129888
>         vpp = (struct vnode **) 0xff93fef0
>         cnp = (struct componentname *) 0xff93ff04
>         cred = (struct ucred *) 0xd2762580
>         flags = 49348
>         nameiop = 0
>         p = (struct proc *) 0xf0e3c8a0
> #9 0xc01ca98d in ufs_vnoperate (ap=0xff93fdec)
>     at ../../ufs/ufs/ufs_vnops.c:2376
>         ap = (struct vop_generic_args *) 0x0
> #10 0xc0179e2e in vfs_cache_lookup (ap=0xff93fe44) at vnode_if.h:77
>         rc = 0
>         a = {a_desc = 0xc02411e0, a_dvp = 0xffbf5cc0, a_vpp = 0xff93fef0,
>           a_cnp = 0xff93ff04}
>         dvp = (struct vnode *) 0xffbf5cc0
>         vpp = (struct vnode **) 0xff93fef0
>         cnp = (struct componentname *) 0xff93ff04
>         ap = (struct vop_lookup_args *) 0x0
>         dvp = (struct vnode *) 0xffbf5cc0
>         vp = (struct vnode *) 0xff93fe00
>         lockparent = 0
>         error = 0
>         vpp = (struct vnode **) 0xff93fef0
>         cnp = (struct componentname *) 0xff93ff04
>         cred = (struct ucred *) 0x0
>         flags = 49348
>         p = (struct proc *) 0xf0e3c8a0
>         vpid = 4289738624
> #11 0xc01ca98d in ufs_vnoperate (ap=0xff93fe44)
>     at ../../ufs/ufs/ufs_vnops.c:2376
>         ap = (struct vop_generic_args *) 0x0
> #12 0xc017cec1 in lookup (ndp=0xff93fedc) at vnode_if.h:52
>         a = {a_desc = 0xc02411a0, a_dvp = 0xffbf5cc0, a_vpp = 0xff93fef0,
>           a_cnp = 0xff93ff04}
>         dvp = (struct vnode *) 0xffbf5cc0
>         cnp = (struct componentname *) 0xff93ff04
>         cp = 0xff8b643a ""
>         dp = (struct vnode *) 0xffbf5cc0
>         tdp = (struct vnode *) 0xffa3cbc0
>         mp = (struct mount *) 0xff8b643a
>         docache = 32
>         wantparent = 0
>         rdonly = 0
>         trailing_slash = 0
>         error = 0
>         dpunlocked = 0
>         cnp = (struct componentname *) 0xff93ff04
>         p = (struct proc *) 0xf0e3c8a0
> #13 0xc017c9ac in namei (ndp=0xff93fedc) at ../../kern/vfs_lookup.c:153
>         fdp = (struct filedesc *) 0xff8b6400
>         cp = 0xff8b6400
>           "/usr/home/xxxxxxxx/xxxxxxx/xxxxxxl/xxxxxxxx/xxxxxxxxxx.jpg"
>         dp = (struct vnode *) 0xff19fe00
>         aiov = {iov_base = 0xff8b641a
>           "/xxxxxxx/xxxxxxxx/xxxxxxxxxx.jpg", iov_len = 998}
>         auio = {uio_iov = 0xff93fe70, uio_iovcnt = 1, uio_offset = 26,
>           uio_resid = 998, uio_segflg = UIO_SYSSPACE, uio_rw = UIO_READ,
>           uio_procp = 0x0}
>         error = -15073792
>         linklen = -15073792
>         cnp = (struct componentname *) 0xff93ff04
>         p = (struct proc *) 0xf0e3c8a0
> #14 0xc0182a51 in access (p=0xf0e3c8a0, uap=0xff93ff80)
>     at ../../kern/vfs_syscalls.c:1633
>         cred = (struct ucred *) 0xd236d800
>         tmpcred = (struct ucred *) 0xd2762580
>         vp = (struct vnode *) 0xff93ff80
>         error = -253507424
>         flags = 2
>         nd = {ni_dirp = 0x8555f4c "xxxxxx/xxxxxxx/xxxxxxxx/xxxxxxxxxx.jpg",
>           ni_segflg = UIO_USERSPACE, ni_startdir = 0x0,
>           ni_rootdir = 0xff19fe00, ni_topdir = 0x0, ni_vp = 0x0,
>           ni_dvp = 0xffbf5cc0, ni_pathlen = 1, ni_next = 0xff8b643a "",
>           ni_loopcnt = 1, ni_cnd = {cn_nameiop = 0, cn_flags = 49348,
>           cn_proc = 0xf0e3c8a0, cn_cred = 0xd2762580,
>           cn_pnbuf = 0xff8b6400
>           "/usr/home/xxxxxxxx/xxxxxxx/xxxxxxx/xxxxxxxx/xxxxxxxxxx.jpg",
>           cn_nameptr = 0xff8b642c "xxxxxxxxxx.jpg", cn_namelen = 14,
>           cn_consume = 0}}
> #15 0xc0203bc5 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
>       tf_edi = -1077964032, tf_esi = 135808336, tf_ebp = -1077964032,
>       tf_isp = -7077932, tf_ebx = 139830412, tf_edx = 139812684,
>       tf_ecx = 139812684, tf_eax = 33, tf_trapno = 22, tf_err = 2,
>       tf_eip = 673512776, tf_cs = 31, tf_eflags = 663,
>       tf_esp = -1077964204, tf_ss = 47}) at ../../i386/i386/trap.c:1175
>         params = 0xbfbf9258 "L_U\b"
>         i = 0
>         callp = (struct sysent *) 0xc0245ea8
>         p = (struct proc *) 0xf0e3c8a0
>         orig_tf_eflags = 663
>         sticks = 4
>         error = 0
>         narg = 2
>         args = {139812684, 0, 1865, 0, 0, 530, 100, -1077972632}
>         have_mplock = 1
>         code = 33
> #16 0xc01f0f5b in Xint0x80_syscall ()
> No symbol table info available.
> #17 0x80df418 in ?? ()
> No symbol table info available.
>
>
> dmesg:
> Feb 23 06:07:35 op3 /kernel: Copyright (c) 1992-2003 The FreeBSD Project.
> Feb 23 06:07:35 op3 /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> Feb 23 06:07:35 op3 /kernel: The Regents of the University of California. All rights reserved.
> Feb 23 06:07:35 op3 /kernel: FreeBSD 4.9-STABLE #0: Thu Feb 12 19:14:40 PST 2004
> Feb 23 06:07:35 op3 /kernel: root@op3.isprime.com:/usr/src/sys/compile/MYKERNCONF
> Feb 23 06:07:35 op3 /kernel: Timecounter "i8254" frequency 1193182 Hz
> Feb 23 06:07:35 op3 /kernel: CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2784.07-MHz 686-class CPU)
> Feb 23 06:07:35 op3 /kernel: Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
> Feb 23 06:07:35 op3 /kernel: Features=0xbfebfbff ,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Feb 23 06:07:35 op3 /kernel: Hyperthreading: 2 logical CPUs
> Feb 23 06:07:35 op3 /kernel: real memory = 4026400768 (3932032K bytes)
> Feb 23 06:07:35 op3 /kernel: avail memory = 3923058688 (3831112K bytes)
> Feb 23 06:07:35 op3 /kernel: Changing APIC ID for IO APIC #0 from 0 to 8 on chip
> Feb 23 06:07:35 op3 /kernel: Changing APIC ID for IO APIC #1 from 0 to 9 on chip
> Feb 23 06:07:35 op3 /kernel: Changing APIC ID for IO APIC #2 from 0 to 10 on chip
> Feb 23 06:07:35 op3 /kernel: Programming 16 pins in IOAPIC #0
> Feb 23 06:07:35 op3 /kernel: IOAPIC #0 intpin 2 -> irq 0
> Feb 23 06:07:35 op3 /kernel: Programming 16 pins in IOAPIC #1
> Feb 23 06:07:35 op3 /kernel: Programming 16 pins in IOAPIC #2
> Feb 23 06:07:35 op3 /kernel: FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
> Feb 23 06:07:35 op3 /kernel: cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
> Feb 23 06:07:35 op3 /kernel: cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000
> Feb 23 06:07:35 op3 /kernel: cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000
> Feb 23 06:07:35 op3 /kernel: cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000
> Feb 23 06:07:35 op3 /kernel: io0 (APIC): apic id: 8, version: 0x000f0011, at 0xfec00000
> Feb 23 06:07:35 op3 /kernel: io1 (APIC): apic id: 9, version: 0x000f0011, at 0xfec01000
> Feb 23 06:07:35 op3 /kernel: io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000
> Feb 23 06:07:35 op3 /kernel: Preloaded elf kernel "kernel" at 0xc02cc000.
> Feb 23 06:07:35 op3 /kernel: Warning: Pentium 4 CPU: PSE disabled
> Feb 23 06:07:35 op3 /kernel: Pentium Pro MTRR support enabled
> Feb 23 06:07:35 op3 /kernel: md0: Malloc disk
> Feb 23 06:07:35 op3 /kernel: Using $PIR table, 9 entries at 0xc00fc410
> Feb 23 06:07:35 op3 /kernel: npx0: on motherboard
> Feb 23 06:07:35 op3 /kernel: npx0: INT 16 interface
> Feb 23 06:07:35 op3 /kernel: pcib0: on motherboard
> Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 3 -> irq 2
> Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 7 -> irq 3
> Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 11 -> irq 5
> Feb 23 06:07:35 op3 /kernel: pci0: on pcib0
> Feb 23 06:07:35 op3 /kernel: pci0: (vendor=0x1028, dev=0x000c) at 4.0 irq 2
> Feb 23 06:07:35 op3 /kernel: pci0: (vendor=0x1028, dev=0x0008) at 4.1 irq 3
> Feb 23 06:07:35 op3 /kernel: pci0: (vendor=0x1028, dev=0x000d) at 4.2 irq 5
> Feb 23 06:07:35 op3 /kernel: pci0: at 14.0
> Feb 23 06:07:35 op3 /kernel: pci0: at 15.1
> Feb 23 06:07:35 op3 /kernel: pci0: at 15.2 irq 0
> Feb 23 06:07:35 op3 /kernel: isab0: device=0225)> at device 15.3 on pci0
> Feb 23 06:07:35 op3 /kernel: isa0: on isab0
> Feb 23 06:07:35 op3 /kernel: pcib1: on motherboard
> Feb 23 06:07:35 op3 /kernel: pci1: on pcib1
> Feb 23 06:07:35 op3 /kernel: pcib2: on motherboard
> Feb 23 06:07:35 op3 /kernel: pci2: on pcib2
> Feb 23 06:07:35 op3 /kernel: pcib3: on motherboard
> Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 12 -> irq 7
> Feb 23 06:07:35 op3 /kernel: IOAPIC #1 intpin 13 -> irq 10
> Feb 23 06:07:35 op3 /kernel: pci3: on pcib3
> Feb 23 06:07:35 op3 /kernel: bge0: ASIC rev. 0x1002> mem 0xfcf10000-0xfcf1ffff irq 7 at device 6.0 on pci3
> Feb 23 06:07:35 op3 /kernel: bge0: Ethernet address: 00:0d:56:70:93:a0
> Feb 23 06:07:35 op3 /kernel: miibus0: on bge0
> Feb 23 06:07:35 op3 /kernel: brgphy0: on miibus0
> Feb 23 06:07:35 op3 /kernel: brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> Feb 23 06:07:35 op3 /kernel: bge1: ASIC rev. 0x1002> mem 0xfcf00000-0xfcf0ffff irq 10 at device 8.0 on pci3
> Feb 23 06:07:35 op3 /kernel: bge1: Ethernet address: 00:0d:56:70:93:a1
> Feb 23 06:07:35 op3 /kernel: miibus1: on bge1
> Feb 23 06:07:35 op3 /kernel: brgphy1: on miibus1
> Feb 23 06:07:35 op3 /kernel: brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> Feb 23 06:07:36 op3 /kernel: pcib4: bridge(unknown chipset)> on motherboard
> Feb 23 06:07:36 op3 /kernel: IOAPIC #1 intpin 14 -> irq 11
> Feb 23 06:07:36 op3 /kernel: pci4: on pcib4
> Feb 23 06:07:36 op3 /kernel: pcib8: device=0309)> at device 8.0 on pci4
> Feb 23 06:07:36 op3 /kernel: IOAPIC #1 intpin 15 -> irq 13
> Feb 23 06:07:36 op3 /kernel: pci5: on pcib8
> Feb 23 06:07:36 op3 /kernel: pci5: (vendor=0x9005, dev=0x00c5) at 6.0 irq 11
> Feb 23 06:07:36 op3 /kernel: pci5: (vendor=0x9005, dev=0x00c5) at 6.1 irq 13
> Feb 23 06:07:36 op3 /kernel: aac0: mem 0xf0000000-0xf7ffffff irq 11 at device 8.1 on pci4
> Feb 23 06:07:36 op3 /kernel: aac0: i960RX 100MHz, 118MB cache memory, optional battery present
> Feb 23 06:07:36 op3 /kernel: aac0: Kernel 2.7-1, Build 3170, S/N 1481d3
> Feb 23 06:07:36 op3 /kernel: aac0: Supported Options=75c
> Feb 23 06:07:36 op3 /kernel: pcib5: bridge(unknown chipset)> on motherboard
> Feb 23 06:07:36 op3 /kernel: pci6: on pcib5
> Feb 23 06:07:36 op3 /kernel: pcib6: bridge(unknown chipset)> on motherboard
> Feb 23 06:07:36 op3 /kernel: pci7: on pcib6
> Feb 23 06:07:36 op3 /kernel: pcib7: bridge(unknown chipset)> on motherboard
> Feb 23 06:07:36 op3 /kernel: pci8: on pcib7
> Feb 23 06:07:36 op3 /kernel: orm0:
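The trap frame in frame #5 of the kgdb output is easier to read once its signed integers are turned back into 32-bit values. The small sketch below does that. It assumes the standard x86 page-fault error-code layout: bit 0 is present, bit 1 is write, bit 2 is user. These are presumably the same bits the i386 trap code tests as PGEX_P, PGEX_W and PGEX_U.

- tf_eip = -1071635937 comes out as 0xc020221f, the generic_bzero address in frame #6.
- tf_err = 2 decodes to the same "supervisor write, page not present" wording as the panic message.

Together with the fault virtual address of 0x0, this reads like a kernel write through a NULL pointer inside generic_bzero, called from ffs_vget. The helper names are illustrative only, not part of kgdb or FreeBSD.

```python
# Sketch: decode two fields of the i386 trap frame from frame #5 of the
# kgdb backtrace. kgdb printed the struct members as signed ints.

# x86 page-fault error-code bits (assumed to match PGEX_P/PGEX_W/PGEX_U).
PF_PRESENT = 0x1  # 0 = page not present, 1 = protection violation
PF_WRITE = 0x2    # 0 = read access, 1 = write access
PF_USER = 0x4     # 0 = fault in supervisor (kernel) mode, 1 = user mode


def as_u32_hex(value: int) -> str:
    """Reinterpret a signed 32-bit value as unsigned and format it as hex."""
    return f"{value & 0xFFFFFFFF:#010x}"


def decode_pf_err(err: int) -> str:
    """Render a page-fault error code in the wording of the panic message."""
    mode = "user" if err & PF_USER else "supervisor"
    access = "write" if err & PF_WRITE else "read"
    cause = "protection violation" if err & PF_PRESENT else "page not present"
    return f"{mode} {access}, {cause}"


if __name__ == "__main__":
    print("tf_eip =", as_u32_hex(-1071635937))  # 0xc020221f (generic_bzero)
    print("tf_err =", decode_pf_err(2))         # supervisor write, page not present
```

The same conversion works for the other negative pointers in the dump. For example, tf_esi = -763979776 comes out as 0xd2769800.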