Date: Mon, 20 Jan 2003 10:18:01 -0700
From: "Russell L. Carter" <rcarter@pinyon.org>
To: freebsd-current@freebsd.org
Subject: Re: STABLE->CURRENT rl fails
Message-ID: <20030120171801.DDECDA@pinyon.org>
In-Reply-To: Message from Robert Watson <rwatson@freebsd.org> of "Sun, 19 Jan 2003 20:37:09 EST." <Pine.NEB.3.96L.1030119202919.67385I-100000@fledge.watson.org>

Ok, I've been playing with this some more. A constant ping with
little other traffic works fine, as does interactively logging in
to the system over ssh. Any "large" transfer by scp, or rsync,
first wedges the rl0 interface. I can down it and then up it and
repeat. After three times or so, the system wedges hard, no response
to either ctl+alt+esc or ctl+alt+del, and I have to power cycle to
get up again. I've built the world several times and a half dozen
kernels, so everything else seems to work fine, and rl0 never
hiccups on -stable.

I usually see a "rl0: watchdog timeout" first, then the oversize
frame errors start showing up.

This is Friday's -current.

After the last reboot, during the bgfsck so I guess it's not
related to the rl problem, the system panicked and the trace was
(copied from the screen)

Memory modified after free 0xc3ee0000(65532)
panic: Most recently used by bus
Debugger("panic")
Stopped at Debugger+0x54: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c04e0701,c057bc40,c04f6c1d,d5e92bc4,1 at Debugger+0x54
panic(c04f6c1d,c04d00d6,fffc,c3e7e774,0) at panic+0xab
[at this point I'll stop typing in the numbers to save my sanity,
lemme know if any are interesting, I'm keeping this trace up for
a while]
mtrash_ctor(,,,,) at mtrash_ctor+0x5d
mtrash_fini(,,,,) at mtrash_fini+0x20
zone_drain,,,, at +0x239
zone_foreach(,,,,) at +0x45
uma_reclaim(,,,,) at +0x17
vm_pageout_scan(,,,,) at 0xb9
vm_pageout(,,,,) at 0x262
fork_exit(,,) at 0xc4
fork_trampoline() at 0x1a
---trap 0x1,eip = 0, esp = 0xd5e92d7c, ebp = 0 ---

Still seeing the lock reversal complaint during boot.

Setting hint.acpi.0.disabled="1" in device.hints has no effect
on the problem.

I'm happy to try just about anything to get better info,
for instance, is there something to look for if I dropped
into the debugger after the first time rl0 wedged, but
before the whole system wedged tight?

Way at the bottom is the original dmesg.

Best,
Russell

:
: On Fri, 17 Jan 2003, Russell L. Carter wrote:
:
: > rl0: discard oversize frame (ether type fbf7 flags 3 len 2992 > max 1514)
: > rl0: discard oversize frame (ether type fbf7 flags 3 len 2992 > max 1514)
: > rl0: discard oversize frame (ether type 2e3d flags 3 len 55442 > max 1514)
: > rl0: discard oversize frame (ether type 904 flags 3 len 36106 > max 1514)
: >
: > Fatal trap 12: page fault while in kernel mode
: > fault virtual address = 0x46
: > fault code = supervisor read, page not present
: > instruction pointer = 0x8:0xc02f19c0
: > stack pointer = 0x10:0xd9344ca4
: > frame pointer = 0x10:0xd9344cbc
: > code segment = base 0x0, limit 0xfffff, type 0x1b
: > = DPL 0, pres 1, def32 1, gran 1
: > processor eflags = interrupt enabled, resume, IOPL = 0
: > current process = 832 (reboot)
: > trap number = 12
: > panic: page fault
:
: I'm probably no good on the if_rl and ACPI issues, but I can give this one
: a try. This panic is a NULL pointer dereference, apparently in the
: shutdown. If this is reproduceable, here's what would be most helpful to
: debug it: take a copy of the GENERIC kernel config, and make sure it
: contains debugging symbols. I.e., the kernel configuration contains
: "makeoptions DEBUG=-g". Configure kernel dumps using the dumpdev option
: in /etc/rc.conf -- typically people use their swap device. You may
: already have debugging symbols for your kernel if you're using GENERIC
: from -current. When the crash occurs, a dump will be performed, and then
: when you boot up next, it will be saved in /var/crash (assuming you have
: room -- if it's smaller than your system memory, symlink to /usr/crash and
: create an appropriate target directory). Finally, run:
:
: gdb /usr/obj/.../kernel.debug /var/crash/vmcore.0
:
: (replace the first path with the path to your kernel target build
: directory)
: (replace the second path with the most recent kernel dump)
:
: Type in "backtrace" to generate a trace, and respond to this e-mail with
: the trace.
:
: Another popular debugging option is to compile your kernel with "options
: DDB" which will allow you access to the live kernel debugger, which can be
: used to generate traces on a panic. However, that's most useful if you
: have a serial console, and can copy/paste the results into an e-mail.
: There should be a fair amount of information on kernel debugging in the
: handbook if you need guidance on the details on how to do the above.
:
: Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
: robert@fledge.watson.org Network Associates Laboratories
:
:
:
: To Unsubscribe: send mail to majordomo@FreeBSD.org
: with "unsubscribe freebsd-current" in the body of the message

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Fri Jan 17 08:55:47 MST 2003
root@chomsky.hq.pinyon.org:/usr/obj/usr/src-current/src/sys/GENERIC
Preloaded elf kernel "/boot/kernel/kernel" at 0xc06c3000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc06c30a8.
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 1991920132 Hz
CPU: Intel(R) Pentium(R) 4 CPU 2.00GHz (1991.92-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf24 Stepping = 4
Features=0x3febf9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,
PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
real memory = 520093696 (496 MB)
avail memory = 497917952 (474 MB)
Initializing GEOMetry subsystem
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <PTLTD RSDT > on motherboard
ACPI-0625: *** Info: GPE Block0 defined as GPE0 to GPE15
ACPI-0625: *** Info: GPE Block1 defined as GPE16 to GPE31
Using $PIR table, 7 entries at 0xc00fdf50
Timecounter "ACPI-fast" frequency 3579545 Hz
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
acpi_cpu0: <CPU> on acpi0
acpi_tz0: <thermal zone> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <SIS Generic host to PCI bridge> mem 0xe8000000-0xe8ffffff at device 0.0 on pci0
pcib1: <PCIBIOS PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 2.0 on pci0
isa0: <ISA bus> on isab0
ohci0: <SiS 5571 USB controller> mem 0xe9000000-0xe9000fff irq 11 at device 2.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <SiS 5571 USB controller> on ohci0
usb0: USB revision 1.0
uhub0: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ums0: Mitsumi Mitsumi Quick Scroll Mouse (USB), rev 1.00/1.05, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.
ohci1: <SiS 5571 USB controller> mem 0xe9001000-0xe9001fff irq 10 at device 2.3 on pci0
usb1: OHCI version 1.0, legacy support
usb1: <SiS 5571 USB controller> on ohci1
usb1: USB revision 1.0
uhub1: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
atapci0: <SiS 5591 ATA100 controller> port 0x1000-0x100f at device 2.5 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <multimedia, audio> at device 2.7 (no driver attached)
rl0: <RealTek 8139 10/100BaseTX> port 0x1400-0x14ff mem 0x6004800-0x60048ff irq 11 at device 10.0 on pci0
rl0: Realtek 8139B detected. Warning, this may be unstable in autoselect mode
rl0: Ethernet address: 00:90:f5:12:59:3b
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: <serial bus, FireWire> at device 11.0 (no driver attached)
cbb0: <TI1410 PCI-CardBus Bridge> mem 0x88000000-0x88000fff irq 5 at device 12.0 on pci0
cardbus0: <CardBus bus> on cbb0
pccard0: <16-bit PCCard bus> on cbb0
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
acpi_acad0: <AC adapter> on acpi0
acpi_cmbat0: <Control method Battery> on acpi0
acpi_lid0: <Control Method Lid Switch> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
acpi_ec0: <embedded controller> port 0x66,0x62 on acpi0
orm0: <Option ROMs> at iomem 0xdc000-0xdffff,0xc0000-0xcbfff on isa0
pmtimer0 on isa0
fdc0: cannot reserve I/O port range (6 ports)
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 10.000 msec
acpi_cpu: CPU throttling available, 8 steps from 100% to 12.5%
ad0: 28615MB <TOSHIBA MK3017GAP> [58140/16/63] at ata0-master UDMA100
acd0: CD-RW <DW-28E> at ata1-master PIO4
Mounting root from ufs:/dev/ad0s1a
lock order reversal
 1st 0xc4013068 process lock (process lock) @ /usr/src-current/src/sys/kern/kern_descrip.c:2100
 2nd 0xc4030134 filedesc structure (filedesc structure) @ /usr/src-current/src/sys/kern/kern_descrip.c:2107

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
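
Editorial sketch, not part of the original message: the "discard oversize frame" console lines quoted in this thread report lengths far above the 1514-byte maximum, and a small script can pull those fields out of a saved console log for comparison across runs. The function name parse_oversize_frames and the SAMPLE text are hypothetical; the pattern is written only against the four lines quoted here and is an assumption about the message layout, not a statement about the driver source. The ether type and flags fields are kept as printed rather than interpreted.

```python
import re

# Pattern written against the console lines quoted in this thread, e.g.
# "rl0: discard oversize frame (ether type fbf7 flags 3 len 2992 > max 1514)"
# Assumption: other kernels may print this message differently.
OVERSIZE_RE = re.compile(
    r"(?P<ifname>\w+): discard oversize frame "
    r"\(ether type (?P<ether_type>[0-9a-fA-F]+) flags (?P<flags>[0-9a-fA-F]+) "
    r"len (?P<length>\d+) > max (?P<max_len>\d+)\)"
)


def parse_oversize_frames(text):
    """Return one dict per oversize-frame message found in text."""
    frames = []
    for match in OVERSIZE_RE.finditer(text):
        frames.append({
            "ifname": match.group("ifname"),
            # Kept as printed strings; no interpretation is attempted.
            "ether_type": match.group("ether_type"),
            "flags": match.group("flags"),
            "length": int(match.group("length")),
            "max_len": int(match.group("max_len")),
        })
    return frames


# The four messages quoted earlier in the thread.
SAMPLE = """\
rl0: discard oversize frame (ether type fbf7 flags 3 len 2992 > max 1514)
rl0: discard oversize frame (ether type fbf7 flags 3 len 2992 > max 1514)
rl0: discard oversize frame (ether type 2e3d flags 3 len 55442 > max 1514)
rl0: discard oversize frame (ether type 904 flags 3 len 36106 > max 1514)
"""

if __name__ == "__main__":
    for frame in parse_oversize_frames(SAMPLE):
        excess = frame["length"] - frame["max_len"]
        print(frame["ifname"], frame["ether_type"], frame["length"], excess)
```

Because the pattern is searched for anywhere in a line, it should also match the same messages when they carry a quoting prefix such as ": > ".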