Date: Thu, 8 Feb 1996 19:16:26 -0500
From: Esa Ahola <esa@mindspring.com>
To: FreeBSD-gnats-submit@freebsd.org
Subject: kern/1008: Daily crash while writing network backups to local tape
Message-ID: <199602090016.TAA14805@firebrick.mindspring.com>
Resent-Message-ID: <199602090020.QAA11743@freefall.freebsd.org>
>Number: 1008
>Category: kern
>Synopsis: Daily crash while writing network backups to local tape
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Feb 8 16:20:01 PST 1996
>Last-Modified:
>Originator: Esa Ahola <esa@mindspring.com>
>Organization: MindSpring Enterprises, Inc.
>Release: FreeBSD 2.1-STABLE i386 (sup date 1/29)
>Environment:

Differences from 2.1R:

--- GENERIC Wed Oct 25 13:29:51 1995
+++ FIREBRICK Thu Feb 8 17:31:53 1996
@@ -11,2 +11,2 @@
-ident GENERIC
-maxusers 10
+ident FIREBRICK
+maxusers 254
@@ -22 +22 @@
-options "SCSI_DELAY=15" #Be pessimistic about Joe SCSI device
+options "SCSI_DELAY=5" #Be pessimistic about Joe SCSI device
@@ -24,0 +25,6 @@
+options COMCONSOLE #prefer serial console to video console
+options KTRACE #kernel tracing
+options "CHILD_MAX=256"
+options "OPEN_MAX=512"
+options "MAXMEM=131072"
+options "NMBCLUSTERS=4096"
@@ -30 +36 @@
-config kernel root on wd0
+config kernel root on sd0
@@ -40,7 +46,7 @@
-controller wdc0 at isa? port "IO_WD1" bio irq 14 vector wdintr
-disk wd0 at wdc0 drive 0
-disk wd1 at wdc0 drive 1
-
-controller wdc1 at isa? port "IO_WD2" bio irq 15 vector wdintr
-disk wd2 at wdc1 drive 0
-disk wd3 at wdc1 drive 1
+# controller wdc0 at isa? port "IO_WD1" bio irq 14 vector wdintr
+# disk wd0 at wdc0 drive 0
+# disk wd1 at wdc0 drive 1
+
+# controller wdc1 at isa? port "IO_WD2" bio irq 15 vector wdintr
+# disk wd2 at wdc1 drive 0
+# disk wd3 at wdc1 drive 1
@@ -122 +128 @@
-pseudo-device pty 16
+pseudo-device pty 64
@@ -123,0 +130 @@
+pseudo-device bpfilter 4

Hardware:
- ASUS P55TP4XE P133
- ASUS SC-2000 SCSI (2)
- ZNYX fast ethernet

dmesg:

FreeBSD 2.1-STABLE #0: Wed Jan 31 01:49:20 EST 1996
root@firebrick.mindspring.com:/usr/src-stable/sys/compile/FIREBRICK
CPU: 133-MHz Pentium 735\\90 or 815\\100 (Pentium-class CPU)
Origin = "GenuineIntel" Id = 0x52b Stepping=11
Features=0x1bf<FPU,VME,PSE,MCE,CX8,APIC>
real memory = 134217728 (131072K bytes)
avail memory = 127660032 (124668K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82437 (Triton)> rev 2 on pci0:0
chip1 <Intel 82371 (Triton)> rev 2 on pci0:7
ncr0 <ncr 53c810 scsi> rev 2 int a irq 12 on pci0:10
ncr0 waiting for scsi devices to settle
(ncr0:0:0): "SEAGATE ST32550N 0014" type 0 fixed SCSI 2
sd0(ncr0:0:0): Direct-Access
sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
2047MB (4194058 512 byte sectors)
(ncr0:1:0): "SEAGATE ST32550N 0014" type 0 fixed SCSI 2
sd1(ncr0:1:0): Direct-Access
sd1(ncr0:1:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
2047MB (4194058 512 byte sectors)
(ncr0:4:0): "DEC DLT2000 8202" type 1 removable SCSI 2
st0(ncr0:4:0): Sequential-Access
st0(ncr0:4:0): 200ns (5 Mb/sec) offset 8.
density code 0x19, drive empty
ncr1 <ncr 53c810 scsi> rev 2 int a irq 10 on pci0:11
ncr1 waiting for scsi devices to settle
(ncr1:0:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
sd2(ncr1:0:0): Direct-Access
sd2(ncr1:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
4095MB (8388315 512 byte sectors)
(ncr1:1:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
sd3(ncr1:1:0): Direct-Access
sd3(ncr1:1:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
4095MB (8388315 512 byte sectors)
(ncr1:2:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
sd4(ncr1:2:0): Direct-Access
sd4(ncr1:2:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
4095MB (8388315 512 byte sectors)
(ncr1:3:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
sd5(ncr1:3:0): Direct-Access
sd5(ncr1:3:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
4095MB (8388315 512 byte sectors)
de0 <Digital DC21140 Fast Ethernet> rev 17 int a irq 11 on pci0:12
de0: ZNYX ZX34X DC21140 [10-100Mb/s] pass 1.1 Ethernet address 00:c0:95:f8:05:d8
de0: enabling 100baseTX UTP port
Probing for devices on the ISA bus:
scprobe: keyboard RESET failed fe
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 not found at 0x280
ed1 not found at 0x300
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
sio2 not found at 0x3e8
sio3 not found at 0x2e8
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
lpt1 not found at 0xffffffff
lpt2 not found at 0xffffffff
mse0: wrong signature ff
mse0 not found at 0x23c
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: NEC 72065B
fd0: 1.44MB 3.5in
uha0 not found at 0x330
aha0 not found at 0x330
aic0 not found at 0x340
nca0 not found at 0x1f88
nca1 not found at 0x350
sea0 not found
wt0 not found at 0x300
mcd0: timeout getting status
mcd0 not found at 0x300
mcd1: timeout getting status
mcd1 not found at 0x340
matcdc0 not found at 0x230
scd0 not found at 0x230
ie0 not found at 0x360
ep0 not found at 0x300
ix0 not found at 0x300
le0: no board found at 0x300
le0 not found at 0x300
lnc0 not found at 0x280
lnc1 not found at 0x300
ze0 not found at 0x300
zp0 not found at 0x300
npx0 on motherboard
npx0: INT 16 interface

>Description:

A lightly-loaded news server (but full newsfeed) crashes most nights while running backups from remote machines to local DLT tape drive. Crashes have only occurred while doing I/O to tape. Newsfeed activity or newsreader load doesn't seem to matter.

crash.4:

IdlePTD 208000
current pcb at 1f58c0
panic: page fault
#0 boot (howto=256) at ../../i386/i386/machdep.c:894
894 dumppcb.pcb_ptd = rcr3();
(kgdb) bt
#0 boot (howto=256) at ../../i386/i386/machdep.c:894
#1 0xf01134c3 in panic (fmt=0xf01a2ecc "page fault") at ../../kern/subr_prf.c:124
#2 0xf01a39ce in trap_fatal (frame=0xefbffd04) at ../../i386/i386/trap.c:746
#3 0xf01a3540 in trap_pfault (frame=0xefbffd04, usermode=0) at ../../i386/i386/trap.c:668
#4 0xf01a31df in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -230151168, tf_esi = -221137920, tf_ebp = -272630412, tf_isp = -266940001, tf_ebx = -1073610748, tf_edx = -266940004, tf_ecx = -1073543014, tf_eax = 1, tf_trapno = 12, tf_err = 2, tf_eip = -266940001, tf_cs = 8, tf_eflags = 66178, tf_esp = -267151551, tf_ss = -230151168}) at ../../i386/i386/trap.c:308
#5 0xf019937d in calltrap ()
#6 0xf016d19f in tulip_addr_filter (sc=0xf2482c00) at ../../pci/if_de.c:1847
#7 0xf01455d6 in ip_output (m0=0xf2d1b400, opt=0x0, ro=0xf2d5e9ac, flags=0, imo=0x0) at ../../netinet/ip_output.c:324
#8 0xf01494ee in tcp_output (tp=0xf2abe400) at ../../netinet/tcp_output.c:668
#9 0xf014a2e2 in tcp_usrreq (so=0xf2a1ba00, req=8, m=0x0, nam=0x0, control=0x0) at ../../netinet/tcp_usrreq.c:272
#10 0xf01207c7 in soreceive (so=0xf2a1ba00, paddr=0x0, uio=0xefbfff2c, mp0=0x0, controlp=0x0, flagsp=0x0) at ../../kern/uipc_socket.c:786
#11 0xf01158a9 in soo_read (fp=0xf2e29880, uio=0xefbfff2c, cred=0xf1c75000) at ../../kern/sys_socket.c:63
#12 0xf01146e7 in read (p=0xf2a74200, uap=0xefbfff94, retval=0xefbfff8c) at ../../kern/sys_generic.c:112
#13 0xf01a3c9b in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 9, tf_esi = -272639912, tf_ebp = -272640008, tf_isp = -272629788, tf_ebx = 5, tf_edx = 0, tf_ecx = 5, tf_eax = 3, tf_trapno = 514, tf_err = 514, tf_eip = 134865573, tf_cs = 31, tf_eflags = 514, tf_esp = -272656416, tf_ss = 39}) at ../../i386/i386/trap.c:906
#14 0xf01993cb in Xsyscall ()
#15 0x9aa3 in ?? ()
#16 0x4139 in ?? ()
#17 0x3e9b in ?? ()
#18 0x3896 in ?? ()
#19 0x3142 in ?? ()
#20 0x2d22 in ?? ()
#21 0x10d3 in ?? ()

crash.5:

Copyright 1994 Free Software Foundation, Inc...
IdlePTD 208000
current pcb at 1f58c0
panic: page fault
#0 boot (howto=256) at ../../i386/i386/machdep.c:894
894 dumppcb.pcb_ptd = rcr3();
(kgdb) bt
#0 boot (howto=256) at ../../i386/i386/machdep.c:894
#1 0xf01134c3 in panic (fmt=0xf01a2ecc "page fault") at ../../kern/subr_prf.c:124
#2 0xf01a39ce in trap_fatal (frame=0xefbffcd8) at ../../i386/i386/trap.c:746
#3 0xf01a3540 in trap_pfault (frame=0xefbffcd8, usermode=0) at ../../i386/i386/trap.c:668
#4 0xf01a31df in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -230151168, tf_esi = -238595840, tf_ebp = -272630488, tf_isp = 1963065219, tf_ebx = -230151168, tf_edx = -266932336, tf_ecx = 1017, tf_eax = 1963065219, tf_trapno = 12, tf_err = 0, tf_eip = 1963065219, tf_cs = 8, tf_eflags = 66050, tf_esp = -266902197, tf_ss = -230151168}) at ../../i386/i386/trap.c:308
#5 0xf019937d in calltrap ()
#6 0x7501ff83 in ?? ()
#7 0xf016f2c6 in ncr_complete (np=0xf2482c00, cp=0xf2d35680) at ../../pci/ncr.c:4317
#8 0xf01455d6 in ip_output (m0=0xf2d35680, opt=0x0, ro=0xf28ff82c, flags=0, imo=0x0) at ../../netinet/ip_output.c:324
#9 0xf01494ee in tcp_output (tp=0xf289c100) at ../../netinet/tcp_output.c:668
#10 0xf014a2e2 in tcp_usrreq (so=0xf2884300, req=8, m=0x0, nam=0x0, control=0x0) at ../../netinet/tcp_usrreq.c:272
#11 0xf01207c7 in soreceive (so=0xf2884300, paddr=0x0, uio=0xefbfff2c, mp0=0x0, controlp=0x0, flagsp=0x0) at ../../kern/uipc_socket.c:786
#12 0xf01158a9 in soo_read (fp=0xf2e12dc0, uio=0xefbfff2c, cred=0xf1c75000) at ../../kern/sys_socket.c:63
#13 0xf01146e7 in read (p=0xf2b25000, uap=0xefbfff94, retval=0xefbfff8c) at ../../kern/sys_generic.c:112
#14 0xf01a3c9b in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 9, tf_esi = -272639912, tf_ebp = -272640008, tf_isp = -272629788, tf_ebx = 5, tf_edx = 0, tf_ecx = 5, tf_eax = 3, tf_trapno = 514, tf_err = 514, tf_eip = 134865573, tf_cs = 31, tf_eflags = 514, tf_esp = -272656416, tf_ss = 39}) at ../../i386/i386/trap.c:906
#15 0xf01993cb in Xsyscall ()
#16 0x9aa3 in ?? ()
#17 0x4139 in ?? ()
#18 0x3e9b in ?? ()
#19 0x3896 in ?? ()
#20 0x3142 in ?? ()
#21 0x2d22 in ?? ()
#22 0x10d3 in ?? ()

A colleague commented:

> Notice that both have this in common:
>
> #7 0xf01455d6 in ip_output (m0=0xf2d1b400, opt=0x0, ro=0xf2d5e9ac, flags=0,
>     imo=0x0) at ../../netinet/ip_output.c:324
>
> Both are followed by a completely unrelated procedure call
> (ncr_complete and tulip_addr_filter). Either they're interrupt
> handlers or random jumps. tulip_addr_filter is called in two places:
> to add or delete multicast addresses to the DEC21140 board's filter
> list, and when resetting the board (which happens when initializing
> it, when recovering from certain errors in the interrupt handler, when
> changing the physical port, etc.). I would expect to see a function
> on the call stack before it though, because tulip_addr_filter doesn't
> seem to be an interrupt handler itself.
>
> Here's the relevant part of ip_output.c:
>
> sendit:
>         /*
>          * If small enough for interface, can just send directly.
>          */
>         if ((u_short)ip->ip_len <= ifp->if_mtu) {
>                 ip->ip_len = htons((u_short)ip->ip_len);
>                 ip->ip_off = htons((u_short)ip->ip_off);
>                 ip->ip_sum = 0;
>                 ip->ip_sum = in_cksum(m, hlen);
> 324:            error = (*ifp->if_output)(ifp, m,
>                         (struct sockaddr *)dst, ro->ro_rt);
>                 goto done;
>         }
>
> Perhaps ifp->if_output has been corrupted somehow?

>How-To-Repeat:

Run backups. :-\

>Fix:

None.

>Audit-Trail:
>Unformatted:
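
kgdb prints the trap-frame registers in the two dumps as signed decimals, which makes them hard to compare with the hex addresses in the backtraces. The following is a minimal sketch, not part of the original report, that reinterprets tf_eip as an unsigned 32-bit value and decodes the low bits of the i386 page-fault error code in tf_err. The helper names (to_u32, decode_pf_err) are made up for illustration; the register values are copied from frame #4 of each dump.

```python
# Illustrative helpers only; to_u32 and decode_pf_err are hypothetical names,
# not part of kgdb or any FreeBSD tool.

def to_u32(value):
    """Reinterpret a signed 32-bit register value (as printed by kgdb) as unsigned."""
    return value & 0xFFFFFFFF


def decode_pf_err(err):
    """Decode the low three bits of the i386 page-fault error code."""
    return {
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "supervisor",
    }


# tf_eip and tf_err taken from frame #4 (trap) of each dump in the report.
frames = {
    "crash.4": {"tf_eip": -266940001, "tf_err": 2},
    "crash.5": {"tf_eip": 1963065219, "tf_err": 0},
}

for name, regs in frames.items():
    print(name, hex(to_u32(regs["tf_eip"])), decode_pf_err(regs["tf_err"]))
```

Under this reading, the faulting tf_eip in crash.4 is 0xf016d19f, the address kgdb attributes to tulip_addr_filter in frame #6, and in crash.5 it is 0x7501ff83, the unresolved frame #6, which lies well below the 0xf01... kernel addresses seen in every other frame. That is at least consistent with the suggestion that control went through a bad pointer, although it does not show which pointer was damaged.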