Date: Mon, 17 Jun 1996 01:02:40 -0700
From: "Jeffrey D. Wheelhouse" <jdw@wwwi.com>
To: freebsd-stable@freebsd.org
Subject: Re: Trap 12/supervisor read, page not present
Message-ID: <199606170804.BAA14727@voltimand.csd.wwwi.com>
At 05:22 PM 6/16/96 -0700, you wrote:
> I've been running this same kernel code for the past 40 hours while running
>my "thrash" regression test (which consists of about a dozen parallel
>compiles, a fork-exec-exit endless loop, some filesystem traversal scripts,
>top, and about a half dozen other things that exercise networking and other
>parts of the system). I haven't had any problems. I'm running a variant of this
>code on wcarchive (ftp.cdrom.com) that is slightly older and doesn't contain
>all of the fixes, and it's been up now for 5 days (load is around 700 users
>much of the time).
> ...so I'm at a loss to explain your instability problems. It would help if
>you could describe the hardware you're using, your kernel configuration, and
>the kind of load that is on the machine.
Here is the machine:
ASUS P54NP EISA/PCI Dual-proc motherboard (1 90Mhz Pentium Processor)
2x32mb 70ns SIMMs
Adaptec AHA-2940W Controller
Quantum Empire 2100S (2gig)
Quantum Atlas 34300W (4gig, wide)
Brand X I/O IDE
Micropolis 1.5gig drive jumpered to act as a 500meg and a 1gig
because of controller age
3Com 3c509 (ISA, running 10BaseT)
Brand X Cirrus ISA video card
I'll replace any part except the SCSI disks and the RAM to make it work.
This machine previously ran Unixware 2.0 (sold to SCO, ugh), and Linux
(just didn't like Linux) under similar workloads without problems, but
that by no means rules out some new failure. The Atlas is the only
new component because a Grand Prix died under the pressure of being a news
spool.
Here is my kernel configuration:
machine "i386"
cpu "I386_CPU"
cpu "I486_CPU"
cpu "I586_CPU"
ident VOLTIMAND
maxusers 32
options MATH_EMULATE #Support for x87 emulation
options INET #InterNETworking
options FFS #Berkeley Fast Filesystem
options NFS #Network Filesystem
options "CD9660" #ISO 9660 Filesystem
options PROCFS #Process filesystem
options "COMPAT_43" #Compatible with BSD 4.3
options "SCSI_DELAY=5" #Be pessimistic about Joe SCSI device
options BOUNCE_BUFFERS #include support for DMA bounce buffers
options UCONSOLE #Allow users to grab the console
config kernel root on wd0
controller isa0
controller eisa0
controller pci0
controller fdc0 at isa? port "IO_FD1" bio irq 6 drq 2 vector fdintr
disk fd0 at fdc0 drive 0
disk fd1 at fdc0 drive 1
tape ft0 at fdc0 drive 2
controller wdc0 at isa? port "IO_WD1" bio irq 14 vector wdintr
disk wd0 at wdc0 drive 0
disk wd1 at wdc0 drive 1
controller wdc1 at isa? port "IO_WD2" bio irq 15 vector wdintr
disk wd2 at wdc1 drive 0
disk wd3 at wdc1 drive 1
controller ahc0
controller ahc1
controller scbus0
device sd0
device st0
device cd0 #Only need one of these, the code dynamically grows
device sc0 at isa? port "IO_KBD" tty irq 1 vector scintr
device npx0 at isa? port "IO_NPX" irq 13 vector npxintr
device sio0 at isa? port "IO_COM1" tty irq 4 vector siointr
device sio1 at isa? port "IO_COM2" tty irq 3 vector siointr
device lpt0 at isa? port? tty irq 7 vector lptintr
device ep0 at isa? port 0x300 net irq 5 vector epintr
pseudo-device loop
pseudo-device ether
pseudo-device log
pseudo-device bpfilter 1
pseudo-device pty 16
pseudo-device gzip # Exec gzipped a.out's
Basically I stripped everything I didn't need in the hopes
of expurgating the problem.
This machine runs a moderate news server. The problem appears to
be related to this server (INN 1.4Unoff4): on the last reboot
fsck ate the active file, which prevented news from starting, and the
machine then stayed up for the 10+ hours until I came home.
The machine also runs a very, very low-usage web server
(traffic < negligible), and smtp, pop, nfs, samba, and
dhcp servers for my local network (5-6 machines). It
typically has no users, or just me, logged on and crashes even when
no one is around to be using NFS/Samba. Any process except
news can be moved to another machine if it will help.
Like you point out, pushing a few newsfeeds around is nothing compared
to ftp.cdrom.com... I too am at a loss to explain this except as bad
hardware, but I don't know what hardware it would be. Don't think it's
RAM because it passed testing and the crash has been consistent at the
same address. An issue with the 2940W? I did have to disable wide
transfers on the wide drive and sync negotiation for both (not sure
which fixed it) to make the narrow drive stop rebooting the SCSI bus
on every drive access, but that was a dozen sups and kernels ago,
haven't had time to play with it since.
Here is a dmesg from the last crash today (when it ate the active file):
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x0
fault code = supervisor read, page not present
instruction pointer = 0x8:0xf0193b22
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 4 (update)
interrupt mask = net tty bio
panic: page fault
Actually this text appears twice identically in the dmesg, but
I figured that was a glitch until I saw the backtrace.
Kernel nm:
f0193814 T _pmap_is_referenced
f01939a8 T _pmap_is_modified
f0193b70 T _pmap_clear_modify
f0193cc0 T _pmap_clear_reference
f0193e10 T _pmap_copy_on_write
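For readers following along, here is a minimal sketch of how the faulting instruction pointer (0xf0193b22) can be matched against the nm excerpt above: take the symbol with the highest address that is not past the instruction pointer. The helper names parse_nm and lookup are made up for this illustration, and the result is only as good as the excerpt; a static function missing from the listing could sit between these symbols.

```python
import bisect

# The five lines of nm output quoted above.
NM_TEXT = """\
f0193814 T _pmap_is_referenced
f01939a8 T _pmap_is_modified
f0193b70 T _pmap_clear_modify
f0193cc0 T _pmap_clear_reference
f0193e10 T _pmap_copy_on_write
"""


def parse_nm(text):
    """Turn 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        addr, _kind, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def lookup(symbols, addr):
    """Return (name, offset) for the nearest symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


symbols = parse_nm(NM_TEXT)
name, offset = lookup(symbols, 0xF0193B22)
print(f"{name}+{offset:#x}")  # prints "_pmap_is_modified+0x17a"
```

Under that assumption the fault lands inside _pmap_is_modified, which starts at f01939a8 and is followed by _pmap_clear_modify at f0193b70.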
And thanks to the -g kernel, I have a symbolic backtrace to make
this message even more outrageously long:
#0 boot (howto=260) at ../../i386/i386/machdep.c:911
#1 0xf0112b53 in panic (fmt=0xf0194ddc "page fault")
at ../../kern/subr_prf.c:116
#2 0xf01958de in trap_fatal (frame=0xefbff938) at ../../i386/i386/trap.c:746
#3 0xf0195450 in trap_pfault (frame=0xefbff938, usermode=0)
at ../../i386/i386/trap.c:668
#4 0xf01950ef in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1073724352,
tf_esi = 137904128, tf_ebp = -272631416, tf_isp = -272631456,
tf_ebx = -265196140, tf_edx = -171352064, tf_ecx = -249858708,
tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -266781918, tf_cs = 8,
tf_eflags = 66118, tf_esp = -265408468, tf_ss = 4096})
at ../../i386/i386/trap.c:308
#5 0xf018b2d1 in calltrap ()
#6 0xf0185c77 in vm_page_test_dirty (m=0xf02e302c) at ../../vm/vm_page.c:1121
#7 0xf0121442 in brelse (bp=0xf3158eb0) at ../../kern/vfs_bio.c:469
#8 0xf0122a1e in biodone (bp=0xf3158eb0) at ../../kern/vfs_bio.c:1275
#9 0xf0167184 in scsi_done (xs=0xf11c1080) at ../../scsi/scsi_base.c:429
#10 0xf01b4c6c in ahc_done (ahc=0xf0f23000, scb=0xf11d8000)
at ../../i386/scsi/aic7xxx.c:1947
#11 0xf01b477d in ahc_intr (arg=0xf0f23000) at ../../i386/scsi/aic7xxx.c:1859
#12 0xf015f52b in ahc_pci_intr (arg=0xf0f23000) at ../../pci/aic7870.c:592
#13 0xf018c25d in Xresume10 ()
#14 0xf0122637 in biowait (bp=0xf314b0e0) at ../../kern/vfs_bio.c:1132
#15 0xf0120da7 in bread (vp=0xf1150600, blkno=96, size=8192, cred=0xffffffff,
bpp=0xefbffcd4) at ../../kern/vfs_bio.c:187
#16 0xf0170ee5 in ffs_update (ap=0xefbffcfc) at ../../ufs/ffs/ffs_inode.c:133
#17 0xf01741ba in ffs_fsync (ap=0xefbffd40) at ./vnode_if.h:850
#18 0xf01731a9 in ffs_sync (mp=0xf115d000, waitfor=2, cred=0xf0f21600,
p=0xf01cc250) at ./vnode_if.h:335
#19 0xf01272d2 in sync (p=0xf01cc250, uap=0x0, retval=0x0)
at ../../kern/vfs_syscalls.c:336
#20 0xf018d915 in boot (howto=256) at ../../i386/i386/machdep.c:870
#21 0xf0112b53 in panic (fmt=0xf0194ddc "page fault")
at ../../kern/subr_prf.c:116
#22 0xf01958de in trap_fatal (frame=0xefbffe48) at ../../i386/i386/trap.c:746
#23 0xf0195450 in trap_pfault (frame=0xefbffe48, usermode=0)
at ../../i386/i386/trap.c:668
#24 0xf01950ef in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -2147483648,
tf_esi = 137187328, tf_ebp = -272630120, tf_isp = -272630160,
tf_ebx = -265310128, tf_edx = -171352064, tf_ecx = -249858708,
tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -266781918, tf_cs = 8,
tf_eflags = 66118, tf_esp = -265902416, tf_ss = -2147483648})
at ../../i386/i386/trap.c:308
#25 0xf018b2d1 in calltrap ()
#26 0xf0185c77 in vm_page_test_dirty (m=0xf026a6b0) at ../../vm/vm_page.c:1121
#27 0xf01831ce in _vm_object_page_clean (object=0xf11b6380, start=0, end=0,
syncio=1) at ../../vm/vm_object.c:584
#28 0xf0126d79 in vfs_msync (mp=0xf115d800, flags=2)
at ../../kern/vfs_subr.c:1543
#29 0xf01272b4 in sync (p=0xf1137500, uap=0x0, retval=0x0)
at ../../kern/vfs_syscalls.c:335
#30 0xf0122ad3 in vfs_update () at ../../kern/vfs_bio.c:1307
#31 0xf010653d in main (framep=0xefbfff88) at ../../kern/init_main.c:358
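The trap frames in the backtrace print the saved registers as signed decimal integers, which makes them hard to compare with the hex values in the panic message. The sketch below reinterprets a few of the values from frame #24 as unsigned 32-bit numbers and decodes the flags word. The helper names u32 and decode_eflags are made up for this illustration, and the bit positions are the standard i386 EFLAGS layout rather than anything taken from the kernel source.

```python
def u32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


# Standard i386 EFLAGS bit positions (IOPL occupies bits 12-13 and is
# handled separately below).
EFLAGS_BITS = [
    (0, "CF"), (2, "PF"), (4, "AF"), (6, "ZF"), (7, "SF"), (8, "TF"),
    (9, "IF"), (10, "DF"), (11, "OF"), (14, "NT"), (16, "RF"), (17, "VM"),
]


def decode_eflags(value):
    """Return (list of set flag names, IOPL) for an EFLAGS value."""
    value = u32(value)
    flags = [name for bit, name in EFLAGS_BITS if value & (1 << bit)]
    iopl = (value >> 12) & 0x3
    return flags, iopl


# Values copied from the trap frame in frame #24 of the backtrace.
frame = {"tf_eip": -266781918, "tf_eax": 0, "tf_eflags": 66118}

for name, value in frame.items():
    print(f"{name:10s} = {u32(value):#010x}")

flags, iopl = decode_eflags(frame["tf_eflags"])
print("flags:", " ".join(flags), "IOPL =", iopl)
```

Read this way, tf_eip is 0xf0193b22, the same instruction pointer the panic message reports, and the flags word has IF and RF set with IOPL 0, which agrees with the "interrupt enabled, resume, IOPL = 0" line.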
I know everyone on this list really wanted to know this much about the guts
of my machine; I sincerely apologize for spamming everyone and I hope that
I will in the future contribute enough to outweigh this inconvenience.
Later,
Jeff
