Date: Thu, 13 Jan 2000 23:35:22 +1100
From: Tony Frank <tfrank@eric.net.au>
To: freebsd-questions@freebsd.org
Subject: 3.4-STABLE crashes intermittently - need suggestions
Message-ID: <20000113233522.A1035@random.n2-au>
Hi,

I have a system running 3.4-STABLE that occasionally freezes or hangs for no obvious reason. What I'm after are some pointers on how to isolate the problem, with the end result hopefully being a system that stays up for more than about 3 days at a time. Below I describe the symptoms and my configuration, and include my attempt at a post mortem.

The problem:

Most of the time the system just appears to freeze, with no error message on the console. When this happens the IDE HD LED is usually on solid (though there is no audible disk activity), the keyboard gives no response, and the system no longer responds to any form of network probing (arp, ping, etc.). Recovery requires physical intervention in the form of a power off/on (there is no reset button).

This seems to happen most often when the system is under high load, but it also appears to happen when the system is mostly idle. I can do a make world (and also a make -j4 world) with no problems, which suggests to me that the hardware is basically OK. I have not swapped components around or tried parts from other systems, but I am reasonably confident in the hardware: the hard disk and Ethernet cards ran in my other PC for 6+ months without problems.

The most recent time it occurred, the system was serving moderate NFS traffic (a cvs update across 10Mb Ethernet, with the frozen system acting as the NFS server), doing fairly heavy local disk activity at the same time (a local cvs update from a local repository), and routing light HTTP traffic to and from my ISP (multilink userppp over 2x56k modems, at about 8k/s sustained).

After the reboot, no crash dump was generated. I do have the output below from an attempt to debug a panic from several days ago. It was my first time using gdb, so I didn't get very far: I basically followed the example in the FAQ/Handbook but didn't know what to make of the output.
kernel post mortem output:

17:08:18 tony@random (/usr/src/sys/compile/RAND)$ gdb -kernel kernel.debug /var /crash/vmcore.5
17:10:19 tony@random (/usr/src/sys/compile/RAND)$ gdb -kernel
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(kgdb) symbol-file kernel.debug
Reading symbols from kernel.debug...done.
(kgdb) exec-file /var/crash/kernel.5
(kgdb) core-file /var/crash/vmcore.5
IdlePTD 2670592
initial pcb at 222250
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0x0
stack pointer           = 0x10:0xc33b2acc
frame pointer           = 0x10:0xc33b2ad4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 167 (nfsd)
interrupt mask          =
trap number             = 12
panic: page fault

syncing disks...
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0x0
stack pointer           = 0x10:0xc33b28d0
frame pointer           = 0x10:0xc33b28d8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 167 (nfsd)
interrupt mask          = bio
trap number             = 12
panic: page fault
dumping to dev 20001, offset 88064
dump 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
---
#0  boot (howto=260) at ../../kern/kern_shutdown.c:285
285             dumppcb.pcb_cr3 = rcr3();
(kgdb) where
#0  boot (howto=260) at ../../kern/kern_shutdown.c:285
#1  0xc012c1b8 in at_shutdown (
    function=0xc020a6de <__set_sysinit_set_sym_memdev_sys_init+1050>,
    arg=0xc3376c80, queue=-1019762752) at ../../kern/kern_shutdown.c:446
#2  0xc01e6099 in trap_fatal (frame=0xc33b2894, eva=0) at ../../i386/i386/trap.c:942
#3  0xc01e5d77 in trap_pfault (frame=0xc33b2894, usermode=0, eva=0) at ../../i386/i386/trap.c:835
#4  0xc01e59ee in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1053105680,
      tf_esi = -1019799296, tf_ebp = -1019533096, tf_isp = -1019533124,
      tf_ebx = 6144, tf_edx = -1019533040, tf_ecx = 40, tf_eax = 0,
      tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags = 66198,
      tf_esp = -1072316847, tf_ss = -1019533040}) at ../../i386/i386/trap.c:437
#5  0x0 in ?? ()
(kgdb) up 4
#4  0xc01e59ee in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1053105680,
      tf_esi = -1019799296, tf_ebp = -1019533096, tf_isp = -1019533124,
      tf_ebx = 6144, tf_edx = -1019533040, tf_ecx = 40, tf_eax = 0,
      tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags = 66198,
      tf_esp = -1072316847, tf_ss = -1019533040}) at ../../i386/i386/trap.c:437
437                             (void) trap_pfault(&frame, FALSE, eva);
(kgdb) frame frame->tf_ebp frame->tf_eip
#0  0x0 in ?? ()
(kgdb) list
432     #endif
433                     /* kernel trap */
434
435                     switch (type) {
436                     case T_PAGEFLT:                 /* page fault */
437                             (void) trap_pfault(&frame, FALSE, eva);
438                             return;
439
440                     case T_DNA:
441     #if NNPX > 0
(kgdb) up
#1  0xc012bf42 in boot (howto=-1019533040) at ../../kern/kern_shutdown.c:287
287                     dumpsys();
(kgdb) up
#2  0xc01be691 in ufs_vnoperatespec (ap=0xc33b2910) at ../../ufs/ufs/ufs_vnops.c:2318
2318            return (VOCALL(ufs_specop_p, ap->a_desc->vdesc_offset, ap));
(kgdb) up
#3  0xc014a3a3 in vfs_bio_awrite (bp=0xc13ae1f0) at vnode_if.h:1145
1145            return (VCALL((bp)->b_vp, VOFFSET(vop_bwrite), &a));
(kgdb) up
#4  0xc01b85ea in ffs_fsync (ap=0xc33b2998) at ../../ufs/ffs/ffs_vnops.c:205
205                             vfs_bio_awrite(bp);
(kgdb) up
#5  0xc01b6a93 in ffs_sync (mp=0xc0772600, waitfor=2, cred=0xc04da680, p=0xc0236f94)
    at vnode_if.h:499
499             return (VCALL(vp, VOFFSET(vop_fsync), &a));
(kgdb) up
#6  0xc015291f in sync (p=0xc0236f94, uap=0x0) at ../../kern/vfs_syscalls.c:549
549                             VFS_SYNC(mp, MNT_NOWAIT,
(kgdb) up
#7  0xc012bd79 in boot (howto=256) at ../../kern/kern_shutdown.c:203
203                     sync(&proc0, NULL);
(kgdb) up
#8  0xc012c1b8 in at_shutdown (
    function=0xc020a6de <__set_sysinit_set_sym_memdev_sys_init+1050>,
    arg=0xc3376c80, queue=-1019762752) at ../../kern/kern_shutdown.c:446
446             boot(bootopt);
(kgdb) up
#9  0xc01e6099 in trap_fatal (frame=0xc33b2a90, eva=0) at ../../i386/i386/trap.c:942
942             panic(trap_msg[type]);
(kgdb) up
#10 0xc01e5d77 in trap_pfault (frame=0xc33b2a90, usermode=0, eva=0) at ../../i386/i386/trap.c:835
835             trap_fatal(frame, eva);
(kgdb) up
#11 0xc01e59ee in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1063922432,
      tf_esi = 0, tf_ebp = -1019532588, tf_isp = -1019532616,
      tf_ebx = -1053129064, tf_edx = -1019532548, tf_ecx = 34, tf_eax = 0,
      tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags = 66071,
      tf_esp = -1072316847, tf_ss = -1019532548}) at ../../i386/i386/trap.c:437
437                             (void) trap_pfault(&frame, FALSE, eva);
(kgdb) up
#12 0x0 in ?? ()
(kgdb) up
Initial frame selected; you cannot go up.
(kgdb) list
432     #endif
433                     /* kernel trap */
434
435                     switch (type) {
436                     case T_PAGEFLT:                 /* page fault */
437                             (void) trap_pfault(&frame, FALSE, eva);
438                             return;
439
440                     case T_DNA:
441     #if NNPX > 0
(kgdb) quit

System details:

Hardware: IBM PC340 (Intel P100 / 32MB RAM / 3GB IDE HDD)
Also a Netgear FA310TX (pn0) and an NE2000 (ed0), plus an IBM Auto 16/4 Token Ring card (not used or probed).

Software: 3.4-STABLE, kernel based on GENERIC with everything but pn0 and ed0 removed and BRIDGE+SOFTUPDATES added (see the kernel config at the end).

Also, I patched the pn0 driver to support bridging; however, the problem appears to exist whether or not I apply this patch. So far the bridging seems to work as I would expect, but I may be missing something here too...

Dmesg output and kernel config are included below, along with the patch for pn0.

*** start
--- if_pn.c.original	Thu Jan 13 22:41:52 2000
+++ if_pn.c	Thu Jan 13 22:43:12 2000
@@ -77,6 +77,11 @@
 #include <net/bpf.h>
 #endif
 
+#include "opt_bdg.h"
+#ifdef BRIDGE
+#include <net/bridge.h>
+#endif
+
 #include <vm/vm.h>              /* for vtophys */
 #include <vm/pmap.h>            /* for vtophys */
 #include <machine/clock.h>      /* for DELAY */
@@ -1586,6 +1591,24 @@
 			}
 		}
 #endif
+
+#ifdef BRIDGE
+
+		/* Copied from if_xl.c and placed in about the same spot */
+
+		if (do_bridge) {
+			struct ifnet *bdg_ifp;
+			bdg_ifp = bridge_in(m);
+			if (bdg_ifp != BDG_LOCAL && bdg_ifp != BDG_DROP)
+				bdg_forward(&m, bdg_ifp);
+			if (((bdg_ifp != BDG_LOCAL) && (bdg_ifp != BDG_BCAST) &&
+			    (bdg_ifp != BDG_MCAST)) || bdg_ifp == BDG_DROP) {
+				m_freem(m);
+				continue;
+			}
+		}
+#endif
+
 		/* Remove header from mbuf and pass it on. */
 		m_adj(m, sizeof(struct ether_header));
 		ether_input(ifp, eh, m);
*** end

dmesg output:

Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 3.4-STABLE #0: Thu Jan 13 21:18:29 EST 2000
    tony@random.n2-au:/usr1/src/sys/compile/RAND
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium/P54C (99.47-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x525  Stepping = 5
  Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory  = 33554432 (32768K bytes)
avail memory = 30281728 (29572K bytes)
Preloaded elf kernel "kernel" at 0xc0278000.
Probing for devices on PCI bus 0:
chip0: <Host to PCI bridge (vendor=1039 device=5511)> rev 0x00 on pci0.0.0
chip1: <SiS 85c503> rev 0x01 on pci0.1.0
ide_pci0: <PCI IDE controller (busmaster capable)> rev 0x08 int a irq 0 on pci0.1.1
pn0: <82c169 PNIC 10/100BaseTX> rev 0x21 int a irq 10 on pci0.14.0
pn0: Ethernet address: 00:a0:cc:3c:d1:bf
pn0: autoneg complete, link status good (half-duplex, 10Mbps)
vga0: <Cirrus Logic GD5436 SVGA controller> rev 0x00 on pci0.20.0
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 at 0x340-0x35f irq 5 on isa
ed0: address 00:40:c7:11:c5:62, type NE2000 (16 bit)
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 not found
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
wdc0 at 0x1f0-0x1f7 irq 14 flags 0xa0ffa0ff on isa
wdc0: unit 0 (wd0): <FUJITSU MPC3032AT>, 32-bit, multi-block-16
wd0: 3093MB (6335280 sectors), 6704 cyls, 15 heads, 63 S/T, 512 B/S
ppc0 at 0x378 irq 7 flags 0x40 on isa
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/15 bytes threshold
lpt0: <generic printer> on ppbus 0
lpt0: Interrupt-driven port
ppi0: <generic parallel i/o> on ppbus 0
plip0: <PLIP network interface> on ppbus 0
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
apm0 flags 0x31 on isa
apm: found APM BIOS version 1.2
Intel Pentium detected, installing workaround for F00F bug
BRIDGE 990810, have 6 interfaces
-- index 1  type 6 phy 0 addrl 6 addr 00.a0.cc.3c.d1.bf
-- index 2  type 6 phy 0 addrl 6 addr 00.40.c7.11.c5.62
changing root device to wd0s1a
WARNING: / was not properly dismounted

Kernel Config:

machine		"i386"
cpu		"I486_CPU"
cpu		"I586_CPU"
ident		RAND
maxusers	32
makeoptions	DEBUG="-g"

options		SOFTUPDATES
options		INET			#InterNETworking
options		FFS			#Berkeley Fast Filesystem
options		FFS_ROOT		#FFS usable as root device [keep this!]
options		NFS			#Network Filesystem
options		PROCFS			#Process filesystem
options		"COMPAT_43"		#Compatible with BSD 4.3 [KEEP THIS!]
options		UCONSOLE		#Allow users to grab the console
options		FAILSAFE		#Be conservative
options		USERCONFIG		#boot -c editor
options		KTRACE			#ktrace(1) syscall trace support
options		SYSVSHM			#SYSV-style shared memory
options		SYSVMSG			#SYSV-style message queues
options		SYSVSEM			#SYSV-style semaphores

config		kernel	root on wd0

controller	isa0
controller	pci0

# IDE controller and disks
controller	wdc0	at isa? flags 0xa0ffa0ff port "IO_WD1" bio irq 14
disk		wd0	at wdc0 drive 0

# atkbdc0 controls both the keyboard and the PS/2 mouse
controller	atkbdc0	at isa? port IO_KBD tty
device		atkbd0	at isa? tty irq 1
device		psm0	at isa? tty irq 12

device		vga0	at isa? port ? conflicts

# splash screen/screen saver
pseudo-device	splash

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa? tty

# Floating point support - do not disable.
device		npx0	at isa? port IO_NPX irq 13

# Power management support (see LINT for more options)
device		apm0	at isa? flags 0x31	# Advanced Power Management

# Serial (COM) ports
device		sio0	at isa? port "IO_COM1" flags 0x10 tty irq 4
device		sio1	at isa? port "IO_COM2" tty irq 3

# Parallel port
device		ppc0	at isa? port? flags 0x40 net irq 7
controller	ppbus0		# Parallel port bus (required)
device		lpt0	at ppbus?	# Printer
device		plip0	at ppbus?	# TCP/IP over parallel
device		ppi0	at ppbus?	# Parallel port interface device

options		BRIDGE

# ISA Ethernet NICs.
device		ed0	at isa? port 0x340 net irq 5 iomem 0xd8000

# PCI Ethernet NICs.
device		pn0

# Pseudo devices - the number indicates how many units to allocated.
pseudo-device	loop		# Network loopback
pseudo-device	ether		# Ethernet support
pseudo-device	tun	2	# User-PPP
pseudo-device	pty	16	# Pseudo-ttys (telnet etc)
pseudo-device	gzip		# Exec gzipped a.out's
pseudo-device	vn
pseudo-device	bpfilter 4	#Berkeley packet filter

-- 
Tony Frank
tfrank@eric.net.au
http://www.freebsd.org/
FreeBSD: The Power to Serve!