From owner-freebsd-current Mon Jan 20 9:18: 9 2003
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9310D37B401
	for ; Mon, 20 Jan 2003 09:18:03 -0800 (PST)
Received: from pinyon.org (quine.pinyon.org [65.101.5.249])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6B13043F1E
	for ; Mon, 20 Jan 2003 09:18:02 -0800 (PST)
	(envelope-from rcarter@pinyon.org)
Received: from quine.pinyon.org (localhost [127.0.0.1])
	by pinyon.org (Postfix) with ESMTP id DDECDA
	for ; Mon, 20 Jan 2003 10:18:01 -0700 (MST)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: freebsd-current@freebsd.org
Subject: Re: STABLE->CURRENT rl fails
In-Reply-To: Message from Robert Watson of "Sun, 19 Jan 2003 20:37:09 EST."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 20 Jan 2003 10:18:01 -0700
From: "Russell L. Carter"
Message-Id: <20030120171801.DDECDA@pinyon.org>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

Ok, I've been playing with this some more.

A constant ping with little other traffic works fine, as does
interactively logging in to the system over ssh. Any "large"
transfer by scp, or rsync, first wedges the rl0 interface. I can
down it and then up it and repeat. After three times or so, the
system wedges hard, no response to either ctl+alt+esc or
ctl+alt+del, and I have to power cycle to get up again.

I've built the world several times and a half dozen kernels, so
everything else seems to work fine, and rl0 never hiccups on
-stable. I usually see a "rl0: watchdog timeout" first, then the
oversize frame errors start showing up. This is Friday's -current.

After the last reboot, during the bgfsck so I guess it's not
related to the rl problem, the system panicked and the trace was
(copied from the screen):

Memory modified after free 0xc3ee0000(65532)
panic: Most recently used by bus

Debugger("panic")
Stopped at Debugger+0x54: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c04e0701,c057bc40,c04f6c1d,d5e92bc4,1 at Debugger+0x54
panic(c04f6c1d,c04d00d6,fffc,c3e7e774,0) at panic+0xab
[at this point I'll stop typing in the numbers to save my sanity,
lemme know if any are interesting, I'm keeping this trace up
for a while]
mtrash_ctor(,,,,) at mtrash_ctor+0x5d
mtrash_fini(,,,,) at mtrash_fini+0x20
zone_drain,,,, at +0x239
zone_foreach(,,,,) at +0x45
uma_reclaim(,,,,) at +0x17
vm_pageout_scan(,,,,) at 0xb9
vm_pageout(,,,,) at 0x262
fork_exit(,,) at 0xc4
fork_trampoline() at 0x1a
---trap 0x1,eip = 0, esp = 0xd5e92d7c, ebp = 0 ---

Still seeing the lock reversal complaint during boot.

Setting hint.acpi.0.disabled="1" in device.hints has no
effect on the problem.

I'm happy to try just about anything to get better info,
for instance, is there something to look for if I dropped
into the debugger after the first time rl0 wedged, but before
the whole system wedged tight?

Way at the bottom is the original dmesg.

Best,
Russell

:
: On Fri, 17 Jan 2003, Russell L. Carter wrote:
:
: > rl0: discard oversize frame (ether type fbf7 flags 3 len 2992 > max 1514)
: > rl0: discard oversize frame (ether type fbf7 flags 3 len 2992 > max 1514)
: > rl0: discard oversize frame (ether type 2e3d flags 3 len 55442 > max 1514)
: > rl0: discard oversize frame (ether type 904 flags 3 len 36106 > max 1514)
: >
: > Fatal trap 12: page fault while in kernel mode
: > fault virtual address   = 0x46
: > fault code              = supervisor read, page not present
: > instruction pointer     = 0x8:0xc02f19c0
: > stack pointer           = 0x10:0xd9344ca4
: > frame pointer           = 0x10:0xd9344cbc
: > code segment            = base 0x0, limit 0xfffff, type 0x1b
: >                         = DPL 0, pres 1, def32 1, gran 1
: > processor eflags        = interrupt enabled, resume, IOPL = 0
: > current process         = 832 (reboot)
: > trap number             = 12
: > panic: page fault
:
: I'm probably no good on the if_rl and ACPI issues, but I can give this one
: a try.  This panic is a NULL pointer dereference, apparently in the
: shutdown.  If this is reproduceable, here's what would be most helpful to
: debug it: take a copy of the GENERIC kernel config, and make sure it
: contains debugging symbols.  I.e., the kernel configuration contains
: "makeoptions DEBUG=-g".  Configure kernel dumps using the dumpdev option
: in /etc/rc.conf -- typically people use their swap device.  You may
: already have debugging symbols for your kernel if you're using GENERIC
: from -current.  When the crash occurs, a dump will be performed, and then
: when you boot up next, it will be saved in /var/crash (assuming you have
: room -- if it's smaller than your system memory, symlink to /usr/crash and
: create an appropriate target directory).  Finally, run:
:
:     gdb /usr/obj/.../kernel.debug /var/crash/vmcore.0
:
: (replace the first path with the path to your kernel target build
: directory)
: (replace the second path with the most recent kernel dump)
:
: Type in "backtrace" to generate a trace, and respond to this e-mail with
: the trace.
:
: Another popular debugging option is to compile your kernel with "options
: DDB" which will allow you access to the live kernel debugger, which can be
: used to generate traces on a panic.  However, that's most useful if you
: have a serial console, and can copy/paste the results into an e-mail.
: There should be a fair amount of information on kernel debugging in the
: handbook if you need guidance on the details on how to do the above.
:
: Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
: robert@fledge.watson.org      Network Associates Laboratories
:
:
:
: To Unsubscribe: send mail to majordomo@FreeBSD.org
: with "unsubscribe freebsd-current" in the body of the message

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Fri Jan 17 08:55:47 MST 2003
    root@chomsky.hq.pinyon.org:/usr/obj/usr/src-current/src/sys/GENERIC
Preloaded elf kernel "/boot/kernel/kernel" at 0xc06c3000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc06c30a8.
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 1991920132 Hz
CPU: Intel(R) Pentium(R) 4 CPU 2.00GHz (1991.92-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf24  Stepping = 4
  Features=0x3febf9ff
real memory  = 520093696 (496 MB)
avail memory = 497917952 (474 MB)
Initializing GEOMetry subsystem
Pentium Pro MTRR support enabled
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
ACPI-0625: *** Info: GPE Block0 defined as GPE0 to GPE15
ACPI-0625: *** Info: GPE Block1 defined as GPE16 to GPE31
Using $PIR table, 7 entries at 0xc00fdf50
Timecounter "ACPI-fast" frequency 3579545 Hz
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
acpi_cpu0: on acpi0
acpi_tz0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xe8000000-0xe8ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
isab0: at device 2.0 on pci0
isa0: on isab0
ohci0: mem 0xe9000000-0xe9000fff irq 11 at device 2.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ums0: Mitsumi Mitsumi Quick Scroll Mouse (USB), rev 1.00/1.05, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.
ohci1: mem 0xe9001000-0xe9001fff irq 10 at device 2.3 on pci0
usb1: OHCI version 1.0, legacy support
usb1: on ohci1
usb1: USB revision 1.0
uhub1: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
atapci0: port 0x1000-0x100f at device 2.5 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: at device 2.7 (no driver attached)
rl0: port 0x1400-0x14ff mem 0x6004800-0x60048ff irq 11 at device 10.0 on pci0
rl0: Realtek 8139B detected. Warning, this may be unstable in autoselect mode
rl0: Ethernet address: 00:90:f5:12:59:3b
miibus0: on rl0
rlphy0: on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: at device 11.0 (no driver attached)
cbb0: mem 0x88000000-0x88000fff irq 5 at device 12.0 on pci0
cardbus0: on cbb0
pccard0: <16-bit PCCard bus> on cbb0
acpi_button0: on acpi0
acpi_button1: on acpi0
acpi_acad0: on acpi0
acpi_cmbat0: on acpi0
acpi_lid0: on acpi0
atkbdc0: port 0x64,0x60 irq 1 on acpi0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
acpi_ec0: port 0x66,0x62 on acpi0
orm0: