Date: Mon, 17 Jun 1996 01:02:40 -0700
From: "Jeffrey D. Wheelhouse" <jdw@wwwi.com>
To: freebsd-stable@freebsd.org
Subject: Re: Trap 12/supervisor read, page not present
Message-ID: <199606170804.BAA14727@voltimand.csd.wwwi.com>
At 05:22 PM 6/16/96 -0700, you wrote:
> I've been running this same kernel code for the past 40 hours while running
>my "thrash" regression test (which consists of about a dozen parallel
>compiles, a fork-exec-exit endless loop, some filesystem traversal scripts,
>top, and about a half dozen other things that exercise networking and other
>parts of the system. I haven't had any problems. I'm running a variant of this
>code on wcarchive (ftp.cdrom.com) that is slightly older and doesn't contain
>all of the fixes, and it's been up now for 5 days (load is around 700 users
>much of the time).
> ...so I'm at a loss to explain your instability problems. It would help if
>you could describe the hardware you're using, your kernel configuration, and
>the kind of load that is on the machine.

Here is the machine:

ASUS P54NP EISA/PCI Dual-proc motherboard (1 90Mhz Pentium Processor)
2x32mb 70ns SIMMs
Adaptec AHA-2940W Controller
Quantum Empire 2100S (2gig)
Quantum Atlas 34300W (4gig, wide)
Brand X I/O
IDE Micropolis 1.5gig drive jumpered to act as a 500meg and a 1gig
  because of controller age
3Com 3c509 (ISA, running 10BaseT)
Brand X Cirrus ISA video card

I'll replace any part except the SCSI disks and the RAM to make it work.
This machine previously ran Unixware 2.0 (sold to SCO, ugh), and Linux
(just didn't like Linux) under similar workloads without problems, but that
by no means rules out some new failure. The Atlas is the only new component
because a Grand Prix died under the pressure of being news spool.

Here is my kernel configuration:

machine         "i386"
cpu             "I386_CPU"
cpu             "I486_CPU"
cpu             "I586_CPU"
ident           VOLTIMAND
maxusers        32

options         MATH_EMULATE            #Support for x87 emulation
options         INET                    #InterNETworking
options         FFS                     #Berkeley Fast Filesystem
options         NFS                     #Network Filesystem
options         "CD9660"                #ISO 9660 Filesystem
options         PROCFS                  #Process filesystem
options         "COMPAT_43"             #Compatible with BSD 4.3
options         "SCSI_DELAY=5"          #Be pessimistic about Joe SCSI device
options         BOUNCE_BUFFERS          #include support for DMA bounce buffers
options         UCONSOLE                #Allow users to grab the console

config          kernel  root on wd0

controller      isa0
controller      eisa0
controller      pci0

controller      fdc0    at isa? port "IO_FD1" bio irq 6 drq 2 vector fdintr
disk            fd0     at fdc0 drive 0
disk            fd1     at fdc0 drive 1
tape            ft0     at fdc0 drive 2

controller      wdc0    at isa? port "IO_WD1" bio irq 14 vector wdintr
disk            wd0     at wdc0 drive 0
disk            wd1     at wdc0 drive 1

controller      wdc1    at isa? port "IO_WD2" bio irq 15 vector wdintr
disk            wd2     at wdc1 drive 0
disk            wd3     at wdc1 drive 1

controller      ahc0
controller      ahc1

controller      scbus0

device          sd0
device          st0
device          cd0     #Only need one of these, the code dynamically grows

device          sc0     at isa? port "IO_KBD" tty irq 1 vector scintr
device          npx0    at isa? port "IO_NPX" irq 13 vector npxintr

device          sio0    at isa? port "IO_COM1" tty irq 4 vector siointr
device          sio1    at isa? port "IO_COM2" tty irq 3 vector siointr

device          lpt0    at isa? port? tty irq 7 vector lptintr

device          ep0     at isa? port 0x300 net irq 5 vector epintr

pseudo-device   loop
pseudo-device   ether
pseudo-device   log
pseudo-device   bpfilter        1
pseudo-device   pty     16
pseudo-device   gzip            # Exec gzipped a.out's

Basically I stripped everything I didn't need in the hopes of expurgating
the problem.

This machine runs a moderate newsserver. The problem appears to be related
to this server (INN 1.4Unoff4) because on the last reboot fsck ate the
active file, preventing news from starting and the machine stayed up for
the 10+ hours until I came home.

The machine also runs a very, very low-usage web server (traffic <
negligible), and smtp, pop, nfs, samba, and dhcp servers for my local
network (5-6 machines). It typically has no users, or just me, logged on,
and it crashes even when no one is around to be using NFS/Samba. Any
process except news can be moved to another machine if it will help.

Like you point out, pushing a few newsfeeds around is nothing compared to
ftp.cdrom.com... I too am at a loss to explain this except as bad hardware,
but I don't know what hardware it would be. Don't think it's RAM because
the SIMMs passed testing and the crash has been consistent at the same
address. An issue with the 2940W? I did have to disable wide transfers on
the wide drive and sync negotiation for both (not sure which fixed it) to
make the narrow drive stop rebooting the SCSI bus on every drive access,
but that was a dozen sups and kernels ago, and I haven't had time to play
with it since.

Here is a dmesg from the last crash today (when it ate the active file):

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xf0193b22
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4 (update)
interrupt mask          = net tty bio
panic: page fault

Actually this text appears twice identically in the dmesg, but I figured
that was a glitch until I saw the backtrace.

Kernel nm:

f0193814 T _pmap_is_referenced
f01939a8 T _pmap_is_modified
f0193b70 T _pmap_clear_modify
f0193cc0 T _pmap_clear_reference
f0193e10 T _pmap_copy_on_write

And thanks to the -g kernel, I have a symbolic backtrace to make this
message even more outrageously long:

#0  boot (howto=260) at ../../i386/i386/machdep.c:911
#1  0xf0112b53 in panic (fmt=0xf0194ddc "page fault")
    at ../../kern/subr_prf.c:116
#2  0xf01958de in trap_fatal (frame=0xefbff938)
    at ../../i386/i386/trap.c:746
#3  0xf0195450 in trap_pfault (frame=0xefbff938, usermode=0)
    at ../../i386/i386/trap.c:668
#4  0xf01950ef in trap (frame={tf_es = 16, tf_ds = 16,
      tf_edi = -1073724352, tf_esi = 137904128, tf_ebp = -272631416,
      tf_isp = -272631456, tf_ebx = -265196140, tf_edx = -171352064,
      tf_ecx = -249858708, tf_eax = 0, tf_trapno = 12, tf_err = 0,
      tf_eip = -266781918, tf_cs = 8, tf_eflags = 66118,
      tf_esp = -265408468, tf_ss = 4096}) at ../../i386/i386/trap.c:308
#5  0xf018b2d1 in calltrap ()
#6  0xf0185c77 in vm_page_test_dirty (m=0xf02e302c)
    at ../../vm/vm_page.c:1121
#7  0xf0121442 in brelse (bp=0xf3158eb0) at ../../kern/vfs_bio.c:469
#8  0xf0122a1e in biodone (bp=0xf3158eb0) at ../../kern/vfs_bio.c:1275
#9  0xf0167184 in scsi_done (xs=0xf11c1080) at ../../scsi/scsi_base.c:429
#10 0xf01b4c6c in ahc_done (ahc=0xf0f23000, scb=0xf11d8000)
    at ../../i386/scsi/aic7xxx.c:1947
#11 0xf01b477d in ahc_intr (arg=0xf0f23000)
    at ../../i386/scsi/aic7xxx.c:1859
#12 0xf015f52b in ahc_pci_intr (arg=0xf0f23000) at ../../pci/aic7870.c:592
#13 0xf018c25d in Xresume10 ()
#14 0xf0122637 in biowait (bp=0xf314b0e0) at ../../kern/vfs_bio.c:1132
#15 0xf0120da7 in bread (vp=0xf1150600, blkno=96, size=8192,
    cred=0xffffffff, bpp=0xefbffcd4) at ../../kern/vfs_bio.c:187
#16 0xf0170ee5 in ffs_update (ap=0xefbffcfc)
    at ../../ufs/ffs/ffs_inode.c:133
#17 0xf01741ba in ffs_fsync (ap=0xefbffd40) at ./vnode_if.h:850
#18 0xf01731a9 in ffs_sync (mp=0xf115d000, waitfor=2, cred=0xf0f21600,
    p=0xf01cc250) at ./vnode_if.h:335
#19 0xf01272d2 in sync (p=0xf01cc250, uap=0x0, retval=0x0)
    at ../../kern/vfs_syscalls.c:336
#20 0xf018d915 in boot (howto=256) at ../../i386/i386/machdep.c:870
#21 0xf0112b53 in panic (fmt=0xf0194ddc "page fault")
    at ../../kern/subr_prf.c:116
#22 0xf01958de in trap_fatal (frame=0xefbffe48)
    at ../../i386/i386/trap.c:746
#23 0xf0195450 in trap_pfault (frame=0xefbffe48, usermode=0)
    at ../../i386/i386/trap.c:668
#24 0xf01950ef in trap (frame={tf_es = 16, tf_ds = 16,
      tf_edi = -2147483648, tf_esi = 137187328, tf_ebp = -272630120,
      tf_isp = -272630160, tf_ebx = -265310128, tf_edx = -171352064,
      tf_ecx = -249858708, tf_eax = 0, tf_trapno = 12, tf_err = 0,
      tf_eip = -266781918, tf_cs = 8, tf_eflags = 66118,
      tf_esp = -265902416, tf_ss = -2147483648})
    at ../../i386/i386/trap.c:308
#25 0xf018b2d1 in calltrap ()
#26 0xf0185c77 in vm_page_test_dirty (m=0xf026a6b0)
    at ../../vm/vm_page.c:1121
#27 0xf01831ce in _vm_object_page_clean (object=0xf11b6380, start=0,
    end=0, syncio=1) at ../../vm/vm_object.c:584
#28 0xf0126d79 in vfs_msync (mp=0xf115d800, flags=2)
    at ../../kern/vfs_subr.c:1543
#29 0xf01272b4 in sync (p=0xf1137500, uap=0x0, retval=0x0)
    at ../../kern/vfs_syscalls.c:335
#30 0xf0122ad3 in vfs_update () at ../../kern/vfs_bio.c:1307
#31 0xf010653d in main (framep=0xefbfff88) at ../../kern/init_main.c:358

I know everyone on this list really wanted to know this much about the guts
of my machine; I sincerely apologize for spamming everyone and I hope that
I will in the future contribute enough to outweigh this inconvenience.

Later,
Jeff
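
For anyone who wants to check the arithmetic, here is a minimal sketch (in Python, not part of any FreeBSD tool) of how the faulting instruction pointer can be matched against the nm excerpt above. It assumes the five symbols listed are contiguous in the kernel image, that is, that no unlisted symbol sits between them. The helper names `parse_nm`, `to_unsigned32`, and `resolve` are made up for illustration. gdb prints the trap frame registers as signed 32-bit integers, so `tf_eip` is converted back to an unsigned address first.

```python
import bisect

# Symbol addresses copied from the kernel nm excerpt in this message.
# Assumption: the excerpt is contiguous (no unlisted symbols in between).
NM_OUTPUT = """\
f0193814 T _pmap_is_referenced
f01939a8 T _pmap_is_modified
f0193b70 T _pmap_clear_modify
f0193cc0 T _pmap_clear_reference
f0193e10 T _pmap_copy_on_write
"""


def parse_nm(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        addr, _kind, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def to_unsigned32(value):
    """Reinterpret a signed 32-bit integer (as gdb prints it) as unsigned."""
    return value & 0xFFFFFFFF


def resolve(symbols, addr):
    """Return (name, offset) for the closest symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


symbols = parse_nm(NM_OUTPUT)
eip = to_unsigned32(-266781918)       # tf_eip from frames #4 and #24
name, offset = resolve(symbols, eip)
print(f"eip = {eip:#x} -> {name}+{offset:#x}")
# prints: eip = 0xf0193b22 -> _pmap_is_modified+0x17a
```

Under that assumption, both trap frames point at the same instruction, 0x17a bytes into `_pmap_is_modified`, which agrees with the instruction pointer printed in the dmesg output.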