Date: Tue, 20 May 2003 11:39:11 -0700 (PDT)
From: Julian Elischer
To: "Daniel C. Sobral"
cc: Julian Elischer
cc: CURRENT
cc: Robert Watson
Subject: Re: /dev/null and KSE panic 100% reproducible

On Tue, 20 May 2003, Daniel C. Sobral wrote:

> Or so it seems. If I do a make install in
> /usr/ports/emulators/linux_base, panic happens. Alas, since my first
> panic yesterday was KSE and was during portupgrade and I have no
> linux_base presently installed, I suspect this is what caused that first
> panic.
>
> Julian, since I now have a _reproducible_ KSE panic... what do you want
> me to do? :-)
>
> New backtrace, for your delight and enjoyment.
>
> GNU gdb 5.2.1 (FreeBSD)
> Copyright 2002 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-undermydesk-freebsd"...
> panic: KSE not on run queue
> panic messages:
> ---
> panic: No strategy on dev null responsible for buffer 0xc78084f8

This is odd. It is very hard to work out which panic is occurring first:
is it the /dev/null panic or the KSE panic? Can you get a serial console
so we can be sure about this?

The stack trace you showed is in the context of a clock interrupt (or at
least SOME interrupt). One possibility is that the clock interrupt is
recalculating priorities, but somehow it's happening while the system is
already messing up its scheduling data.

Someone put the following code into kern/kern_switch.c:

	/*
	 * Only allow non system threads to run in panic
	 * if they are the one we are tracing.  (I think.. [JRE])
	 */
	if (panicstr && ((td->td_proc->p_flag & P_SYSTEM) == 0 &&
	    (td->td_flags & TDF_INPANIC) == 0))
		goto retry;

If we are in a panic, this has the effect of throwing away threads that it
has already taken off the run queue. Later, anything that encounters these
threads will assume they are still on the run queue, and will panic because
they are not.

Try changing it to:

	if (panicstr && ((td->td_proc->p_flag & P_SYSTEM) == 0 &&
	    (td->td_flags & TDF_INPANIC) == 0)) {
		/* note that it is no longer on the run queue */
		TD_SET_CAN_RUN(td);
		goto retry;
	}

If that fails, you may try TD_SET_SUSPENDED(td) instead, but I think
TD_SET_CAN_RUN(td) is the better choice.

>
>
> syncing disks, buffers remaining... 2230 2230 2230 2229 2229 2229 2229 2229 panic: KSE not on run queue
> Uptime: 50m29s
> Dumping 255 MB
> ata0: resetting devices ..
> done
> 16 32 48 64 80 96 112 128 144 160 176[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] 192 208 224 240Copyright (c) 1992-2003 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD 5.1-BETA #30: Mon May 19 21:40:33 BRT 2003
> root@dcs:/usr/obj/usr/src/sys/DCS
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc04c0000.
> Preloaded elf module "/boot/kernel/snd_cmi.ko" at 0xc04c0228.
> Preloaded elf module "/boot/kernel/snd_pcm.ko" at 0xc04c02d4.
> Preloaded elf module "/boot/kernel/mac_biba.ko" at 0xc04c0380.
> Preloaded elf module "/boot/kernel/mac_mls.ko" at 0xc04c0430.
> Preloaded elf module "/boot/kernel/acpi.ko" at 0xc04c04dc.
> Timecounter "i8254" frequency 1193182 Hz
> Timecounter "TSC" frequency 1007051981 Hz
> CPU: Intel(R) Celeron(TM) CPU 1000MHz (1007.05-MHz 686-class CPU)
> Origin = "GenuineIntel" Id = 0x6b1 Stepping = 1
> Features=0x383f9ff
> real memory = 268353536 (255 MB)
> avail memory = 255438848 (243 MB)
> Security policy loaded: TrustedBSD MAC/Biba (mac_biba)
> Security policy loaded: TrustedBSD MAC/MLS (mac_mls)
> Pentium Pro MTRR support enabled
> VESA: v3.0, 4096k memory, flags:0x1, mode table:0xc03ca722 (1000022)
> VESA: NVidia
> npx0: on motherboard
> npx0: INT 16 interface
> acpi0: on motherboard
> pcibios: BIOS version 2.10
> Using $PIR table, 9 entries at 0xc00f12d0
> acpi0: power button is handled as a fixed feature programming model.
> Timecounter "ACPI-fast" frequency 3579545 Hz
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
> acpi_cpu0: on acpi0
> acpi_button0: on acpi0
> pcib0: port 0xcf8-0xcff on acpi0
> pci0: on pcib0
> agp0: mem 0xf8000000-0xfbffffff at device 0.0 on pci0
> pcib1: at device 1.0 on pci0
> pcib1: could not get PCI interrupt routing table for \\_SB_.PCI0.AGP_ - AE_NOT_FOUND
> pci1: on pcib1
> pci1: at device 0.0 (no driver attached)
> isab0: at device 4.0 on pci0
> isa0: on isab0
> atapci0: port 0xd800-0xd80f at device 4.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci0
> ata1: at 0x170 irq 15 on atapci0
> uhci0: port 0xd400-0xd41f irq 5 at device 4.2 on pci0
> usb0: on uhci0
> usb0: USB revision 1.0
> uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> uhub0: port error, restarting port 1
> uhub0: port error, giving up port 1
> ugen0: AKS eToken R2 2242, rev 1.00/1.00, addr 2
> uhub0: port error, restarting port 2
> uhub0: port error, giving up port 2
> uhci1: port 0xd000-0xd01f irq 5 at device 4.3 on pci0
> usb1: on uhci1
> usb1: USB revision 1.0
> uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 2 ports with 2 removable, self powered
> uhub1: port error, restarting port 1
> uhub1: port error, giving up port 1
> uhub1: port error, restarting port 2
> uhub1: port error, giving up port 2
> pcm0: port 0xb800-0xb8ff at device 5.0 on pci0
> pcib0: slot 5 INTA is routed to irq 5
> fxp0: port 0xb400-0xb43f mem 0xf3000000-0xf30fffff,0xf3800000-0xf3800fff irq 10 at device 9.0 on pci0
> fxp0: Ethernet address 00:02:b3:ae:0d:ea
> miibus0: on fxp0
> inphy0: on miibus0
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> fdc0: port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> ppc0 port 0x378-0x37f irq 7 on acpi0
> ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
> ppbus0: on ppc0
> lpt0: on ppbus0
> lpt0: Interrupt-driven port
> ppi0: on ppbus0
> sio0 port 0x3f8-0x3ff irq 4 on acpi0
> sio0: type 16550A
> sio1 port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> atkbdc0: port 0x64,0x60 irq 1 on acpi0
> atkbd0: flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> psm0: irq 12 on atkbdc0
> psm0: model Generic PS/2 mouse, device ID 0
> orm0: