Date:      Mon, 17 Jun 1996 01:02:40 -0700
From:      "Jeffrey D. Wheelhouse" <jdw@wwwi.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: Trap 12/supervisor read, page not present 
Message-ID:  <199606170804.BAA14727@voltimand.csd.wwwi.com>

At 05:22 PM 6/16/96 -0700, you wrote:
>   I've been running this same kernel code for the past 40 hours while running
>my "thrash" regression test (which consists of about a dozen parallel
>compiles, a fork-exec-exit endless loop, some filesystem traversal scripts,
>top, and about a half dozen other things that exercise networking and other
>parts of the system). I haven't had any problems. I'm running a variant of this
>code on wcarchive (ftp.cdrom.com) that is slightly older and doesn't contain
>all of the fixes, and it's been up now for 5 days (load is around 700 users
>much of the time).
>   ...so I'm at a loss to explain your instability problems. It would help if
>you could describe the hardware you're using, your kernel configuration, and
>the kind of load that is on the machine.

Here is the machine:
ASUS P54NP EISA/PCI Dual-proc motherboard (1 90MHz Pentium Processor)
2x32MB 70ns SIMMs
Adaptec AHA-2940W Controller
  Quantum Empire 2100S (2gig)
  Quantum Atlas 34300W (4gig, wide)
Brand X I/O IDE
  Micropolis 1.5gig drive jumpered to act as a 500meg and a 1gig
  because of controller age
3Com 3c509 (ISA, running 10BaseT)
Brand X Cirrus ISA video card

I'll replace any part except the SCSI disks and the RAM to make it work.
This machine previously ran Unixware 2.0 (sold to SCO, ugh) and Linux
(just didn't like Linux) under similar workloads without problems, but
that by no means rules out some new failure.  The Atlas is the only
new component; it went in because a Grand Prix died under the pressure
of being a news spool.

Here is my kernel configuration:
machine         "i386"
cpu             "I386_CPU"
cpu             "I486_CPU"
cpu             "I586_CPU"
ident           VOLTIMAND
maxusers        32
options         MATH_EMULATE            #Support for x87 emulation
options         INET                    #InterNETworking
options         FFS                     #Berkeley Fast Filesystem
options         NFS                     #Network Filesystem
options         "CD9660"                #ISO 9660 Filesystem
options         PROCFS                  #Process filesystem
options         "COMPAT_43"             #Compatible with BSD 4.3
options         "SCSI_DELAY=5"          #Be pessimistic about Joe SCSI device
options         BOUNCE_BUFFERS          #include support for DMA bounce buffers
options         UCONSOLE                #Allow users to grab the console
config          kernel  root on wd0 
controller      isa0
controller      eisa0
controller      pci0
controller      fdc0    at isa? port "IO_FD1" bio irq 6 drq 2 vector fdintr
disk            fd0     at fdc0 drive 0
disk            fd1     at fdc0 drive 1
tape            ft0     at fdc0 drive 2
controller      wdc0    at isa? port "IO_WD1" bio irq 14 vector wdintr
disk            wd0     at wdc0 drive 0
disk            wd1     at wdc0 drive 1
controller      wdc1    at isa? port "IO_WD2" bio irq 15 vector wdintr
disk            wd2     at wdc1 drive 0
disk            wd3     at wdc1 drive 1
controller      ahc0
controller      ahc1
controller      scbus0
device          sd0
device          st0
device          cd0     #Only need one of these, the code dynamically grows
device          sc0     at isa? port "IO_KBD" tty irq 1 vector scintr
device          npx0    at isa? port "IO_NPX" irq 13 vector npxintr
device          sio0    at isa? port "IO_COM1" tty irq 4 vector siointr
device          sio1    at isa? port "IO_COM2" tty irq 3 vector siointr
device          lpt0    at isa? port? tty irq 7 vector lptintr
device          ep0     at isa? port 0x300 net irq 5 vector epintr
pseudo-device   loop
pseudo-device   ether
pseudo-device   log
pseudo-device   bpfilter 1
pseudo-device   pty     16
pseudo-device   gzip            # Exec gzipped a.out's

Basically I stripped everything I didn't need in the hopes
of expurgating the problem.

This machine runs a moderate news server.  The problem appears to
be related to this server (INN 1.4Unoff4), because on the last reboot
fsck ate the active file, preventing news from starting, and the
machine stayed up for the 10+ hours until I came home.

The machine also runs a very, very low-usage web server
(traffic < negligible), and smtp, pop, nfs, samba, and
dhcp servers for my local network (5-6 machines).  It
typically has no users or just me logged on, and it crashes even
when no one is around to be using NFS/Samba.  Any process except
news can be moved to another machine if it will help.

As you point out, pushing a few newsfeeds around is nothing compared
to ftp.cdrom.com...  I too am at a loss to explain this except as bad
hardware, but I don't know what hardware it would be.  I don't think
it's the RAM, because the SIMMs passed testing and the crash has been
consistent at the same address.  An issue with the 2940W?  I did have
to disable wide transfers on the wide drive and sync negotiation for
both (not sure which fixed it) to make the narrow drive stop rebooting
the SCSI bus on every drive access, but that was a dozen sups and
kernels ago, and I haven't had time to play with it since.

Here is a dmesg from the last crash today (when it ate the active file):
Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x0
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xf0193b22
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 4 (update)
interrupt mask		= net tty bio 
panic: page fault

Actually this text appears twice identically in the dmesg, but
I figured that was a glitch until I saw the backtrace.

Kernel nm:
f0193814 T _pmap_is_referenced
f01939a8 T _pmap_is_modified
f0193b70 T _pmap_clear_modify
f0193cc0 T _pmap_clear_reference
f0193e10 T _pmap_copy_on_write
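
To show how the instruction pointer from the panic (0xf0193b22) lines
up with this nm excerpt, here is a minimal sketch in Python.  The
`symbols` table is just the five lines above, and `lookup` is a
hypothetical helper name of mine, not anything from the kernel tree.
It assumes the excerpt has no unlisted (e.g. static) functions between
these entries, which I have not verified.

```python
import bisect

# Symbol table copied from the nm excerpt above: (address, name).
symbols = [
    (0xf0193814, "_pmap_is_referenced"),
    (0xf01939a8, "_pmap_is_modified"),
    (0xf0193b70, "_pmap_clear_modify"),
    (0xf0193cc0, "_pmap_clear_reference"),
    (0xf0193e10, "_pmap_copy_on_write"),
]

def lookup(addr):
    """Return (name, offset) for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        raise ValueError("address is below the first listed symbol")
    base, name = symbols[i]
    return name, addr - base

# Instruction pointer taken from the "Fatal trap 12" report.
name, offset = lookup(0xf0193b22)
print(f"{name}+{offset:#x}")
```

Under those assumptions the faulting address falls inside
_pmap_is_modified, 0x17a bytes past its start.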

And thanks to the -g kernel, I have a symbolic backtrace to make
this message even more outrageously long:
#0  boot (howto=260) at ../../i386/i386/machdep.c:911
#1  0xf0112b53 in panic (fmt=0xf0194ddc "page fault")
    at ../../kern/subr_prf.c:116
#2  0xf01958de in trap_fatal (frame=0xefbff938) at ../../i386/i386/trap.c:746
#3  0xf0195450 in trap_pfault (frame=0xefbff938, usermode=0)
    at ../../i386/i386/trap.c:668
#4  0xf01950ef in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1073724352, 
      tf_esi = 137904128, tf_ebp = -272631416, tf_isp = -272631456, 
      tf_ebx = -265196140, tf_edx = -171352064, tf_ecx = -249858708, 
      tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -266781918, tf_cs = 8, 
      tf_eflags = 66118, tf_esp = -265408468, tf_ss = 4096})
    at ../../i386/i386/trap.c:308
#5  0xf018b2d1 in calltrap ()
#6  0xf0185c77 in vm_page_test_dirty (m=0xf02e302c) at ../../vm/vm_page.c:1121
#7  0xf0121442 in brelse (bp=0xf3158eb0) at ../../kern/vfs_bio.c:469
#8  0xf0122a1e in biodone (bp=0xf3158eb0) at ../../kern/vfs_bio.c:1275
#9  0xf0167184 in scsi_done (xs=0xf11c1080) at ../../scsi/scsi_base.c:429
#10 0xf01b4c6c in ahc_done (ahc=0xf0f23000, scb=0xf11d8000)
    at ../../i386/scsi/aic7xxx.c:1947
#11 0xf01b477d in ahc_intr (arg=0xf0f23000) at ../../i386/scsi/aic7xxx.c:1859
#12 0xf015f52b in ahc_pci_intr (arg=0xf0f23000) at ../../pci/aic7870.c:592
#13 0xf018c25d in Xresume10 ()
#14 0xf0122637 in biowait (bp=0xf314b0e0) at ../../kern/vfs_bio.c:1132
#15 0xf0120da7 in bread (vp=0xf1150600, blkno=96, size=8192, cred=0xffffffff, 
    bpp=0xefbffcd4) at ../../kern/vfs_bio.c:187
#16 0xf0170ee5 in ffs_update (ap=0xefbffcfc) at ../../ufs/ffs/ffs_inode.c:133
#17 0xf01741ba in ffs_fsync (ap=0xefbffd40) at ./vnode_if.h:850
#18 0xf01731a9 in ffs_sync (mp=0xf115d000, waitfor=2, cred=0xf0f21600, 
    p=0xf01cc250) at ./vnode_if.h:335
#19 0xf01272d2 in sync (p=0xf01cc250, uap=0x0, retval=0x0)
    at ../../kern/vfs_syscalls.c:336
#20 0xf018d915 in boot (howto=256) at ../../i386/i386/machdep.c:870
#21 0xf0112b53 in panic (fmt=0xf0194ddc "page fault")
    at ../../kern/subr_prf.c:116
#22 0xf01958de in trap_fatal (frame=0xefbffe48) at ../../i386/i386/trap.c:746
#23 0xf0195450 in trap_pfault (frame=0xefbffe48, usermode=0)
    at ../../i386/i386/trap.c:668
#24 0xf01950ef in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -2147483648, 
      tf_esi = 137187328, tf_ebp = -272630120, tf_isp = -272630160, 
      tf_ebx = -265310128, tf_edx = -171352064, tf_ecx = -249858708, 
      tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -266781918, tf_cs = 8, 
      tf_eflags = 66118, tf_esp = -265902416, tf_ss = -2147483648})
    at ../../i386/i386/trap.c:308
#25 0xf018b2d1 in calltrap ()
#26 0xf0185c77 in vm_page_test_dirty (m=0xf026a6b0) at ../../vm/vm_page.c:1121
#27 0xf01831ce in _vm_object_page_clean (object=0xf11b6380, start=0, end=0, 
    syncio=1) at ../../vm/vm_object.c:584
#28 0xf0126d79 in vfs_msync (mp=0xf115d800, flags=2)
    at ../../kern/vfs_subr.c:1543
#29 0xf01272b4 in sync (p=0xf1137500, uap=0x0, retval=0x0)
    at ../../kern/vfs_syscalls.c:335
#30 0xf0122ad3 in vfs_update () at ../../kern/vfs_bio.c:1307
#31 0xf010653d in main (framep=0xefbfff88) at ../../kern/init_main.c:358
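
gdb prints the trap frame fields as signed decimals, which makes them
hard to compare with the hex in the panic message.  A minimal sketch
in Python of the conversion follows; `u32` and `decode_eflags` are
hypothetical helper names, and the bit table covers only the common
i386 EFLAGS bits.  The two input values are copied from the frames
above (they are the same in both).

```python
def u32(value):
    """Reinterpret a signed 32-bit integer, as gdb prints it, as unsigned."""
    return value & 0xFFFFFFFF

# Common i386 EFLAGS bit positions (bit 1 is reserved and always set).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 16: "RF", 17: "VM",
}

def decode_eflags(value):
    """Return (list of set flag names, IOPL) for an EFLAGS value."""
    names = [name for bit, name in sorted(EFLAGS_BITS.items())
             if value & (1 << bit)]
    iopl = (value >> 12) & 3  # IOPL lives in bits 12-13
    return names, iopl

# Values copied from the trap frames in the backtrace.
tf_eip = -266781918
tf_eflags = 66118

print(hex(u32(tf_eip)))
print(hex(tf_eflags), decode_eflags(tf_eflags))
```

The converted tf_eip is 0xf0193b22, the same instruction pointer the
panic message reports, and the flags decode to IF and RF set with
IOPL 0, matching "interrupt enabled, resume, IOPL = 0".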

I know everyone on this list really wanted to know this much about the guts
of my machine; I sincerely apologize for spamming everyone and I hope that
I will in the future contribute enough to outweigh this inconvenience.

Later,
Jeff



