Date: Sat, 8 May 2004 01:54:58 +0800 (KRAST) From: Eugene Grosbein <eugen@grosbein.pp.ru> To: FreeBSD-gnats-submit@FreeBSD.org Cc: sos@FreeBSD.org Subject: kern/66359: repeatable kernel panic during ata(4) device probe Message-ID: <200405071754.i47Hswu9000625@grosbein.pp.ru> Resent-Message-ID: <200405071800.i47I0ds5035742@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 66359 >Category: kern >Synopsis: repeatable kernel panic during ata(4) device probe >Confidential: no >Severity: serious >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Fri May 07 11:00:39 PDT 2004 >Closed-Date: >Last-Modified: >Originator: Eugene Grosbein >Release: FreeBSD 4.10-PRERELEASE i386 >Organization: Svyaz Service JSC >Environment: System: FreeBSD grosbein.pp.ru 4.10-PRERELEASE FreeBSD 4.10-PRERELEASE #12: Sat May 8 00:53:46 KRAST 2004 eu@grosbein.pp.ru:/usr/local/obj/usr/local/src/sys/DADV i386 Same for 4.9-RELEASE Same for 4.8-RELEASE >Description: I've found a way to make kernel panic 'reliably' during device probe stage. It works for 4.8, 4.8 and 4.10-PRE. I didn't tested 5-CURRENT as I do not have spare CURRENT box. Today my Maxtor 2B020H1 HDD failed - now it is detected by BIOS as Maxtor ATHENA. Size and parameters are detected correctly but the drive aborts all read requests. It seems that its internal service information is lost and the drive switched into special 'safe technological' mode itself and now should be repaired in a lab. I will not consider data recovery here. Better look how ata(4) behaves with the drive. Here is boot log: >How-To-Repeat: Take bad hard disk drive that refuses to read its first cylinder and boot ata(4)-enabled system. >Fix: Unknown for me. >Release-Note: >Audit-Trail: >Unformatted: >> FreeBSD/i386 BOOT Default: 1:ad(1,a)/boot/loader boot: -h Console: serial port BIOS drive A: is disk0 BIOS drive C: is disk1 BIOS drive D: is disk2 BIOS 639kB/523200kB available memory FreeBSD/i386 bootstrap loader, Revision 0.8 (eu@grosbein.pp.ru, Sun Apr 25 01:02:09 KRAST 2004) Loading /boot/defaults/loader.conf /kernel text=0x1eda60 data=0x41b74+0x23940 syms=[0x4+0x332e0+0x4+0x3a3e4] Hit [Enter] to boot immediately, or any other key for command prompt. Booting [kernel] in 1 second... Booting [kernel]... Copyright (c) 1992-2004 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 4.10-PRERELEASE #12: Sat May 8 00:53:46 KRAST 2004 eu@grosbein.pp.ru:/usr/local/obj/usr/local/src/sys/DADV Timecounter "i8254" frequency 1193167 Hz CPU: Intel Celeron (902.03-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x68a Stepping = 10 Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE> real memory = 536805376 (524224K bytes) config> flags atkbd 0x0 config> quit sio0: gdb debugging port avail memory = 518496256 (506344K bytes) Preloaded elf kernel "kernel" at 0xc03c3000. Preloaded userconfig_script "/boot/kernel.conf" at 0xc03c309c. VESA: v2.0, 65536k memory, flags:0x1, mode table:0xc034cac2 (1000022) VESA: ATI RADEON 9200 Pentium Pro MTRR support enabled Using $PIR table, 8 entries at 0xc00fdef0 apm0: <APM BIOS> on motherboard apm0: found APM BIOS v1.2, connected at v1.2 npx0: <math processor> on motherboard npx0: INT 16 interface pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard pci0: <PCI bus> on pcib0 agp0: <Intel 82443BX (440 BX) host to PCI bridge> mem 0xe8000000-0xebffffff at device 0.0 on pci0 pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0 pci1: <PCI bus> on pcib1 pci1: <ATI model 5961 graphics accelerator> at 0.0 irq 11 pci1: <ATI model 5941 graphics accelerator> at 0.1 isab0: <Intel 82371AB PCI to ISA bridge> at device 7.0 on pci0 isa0: <ISA bus> on isab0 atapci0: <Intel PIIX4 ATA33 controller> port 0xf000-0xf00f at device 7.1 on pci0 ata0: at 0x1f0 irq 14 on atapci0 ata1: at 0x170 irq 15 on atapci0 uhci0: <Intel 82371AB/EB (PIIX4) USB controller> port 0xe000-0xe01f irq 10 at device 7.2 on pci0 usb0: <Intel 82371AB/EB (PIIX4) USB controller> on uhci0 usb0: USB revision 1.0 uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 2 ports with 2 removable, self powered uscanner0: Hewlett-Packard HP ScanJet 2200C, rev 1.10/1.00, addr 2 uhid0: American Power Conversion Back-UPS 500 FW: 6.5.I USB FW: c1, rev 1.10/1.00, addr 3, iclass 3/0 intpm0: <Intel 82371AB Power management controller> port 0x5000-0x500f irq 9 at device 7.3 on pci0 intpm0: I/O mapped 5000 intpm0: intr IRQ 9 enabled revision 0 smbus0: <System Management Bus> on intsmb0 smb0: <SMBus general purpose I/O> on smbus0 intpm0: PM I/O mapped 4000 fxp0: <Intel 82557 Pro/100 Ethernet> port 0xe400-0xe41f mem 0xef000000-0xef0fffff,0xef100000-0xef100fff irq 9 at device 16.0 on pci0 fxp0: Ethernet address 00:a0:c9:89:95:1f inphy0: <i82555 10/100 media interface> on miibus0 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto orm0: <Option ROM> at iomem 0xc0000-0xccfff on isa0 pmtimer0 on isa0 fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: FIFO enabled, 8 bytes threshold fd0: <1440-KB 3.5" drive> on fdc0 drive 0 atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 psm0: <PS/2 Mouse> irq 12 on atkbdc0 psm0: model NetMouse/NetScroll Optical, device ID 0 vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 sc0: <System console> on isa0 sc0: VGA <24 virtual consoles, flags=0x0> sio0 at port 0x3f8-0x3ff irq 4 flags 0x90 on isa0 sio0: type 16550A, console sio1 at port 0x2f8-0x2ff irq 3 on isa0 sio1: type 16550A ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/16 bytes threshold lpt0: <Printer> on ppbus0 lpt0: Interrupt-driven port ppi0: <Parallel I/O> on ppbus0 pcm0: <Yamaha OPL-SAx> at port 0x220-0x22f,0x530-0x537,0x388-0x38f,0x330-0x331,0x370-0x371 irq 5 drq 0,1 on isa0 DUMMYNET initialized (011031) ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited IPsec: Initialized Security Association Processing. ad0: 6149MB <WDC AC26400B> [13328/15/63] at ata0-master UDMA33 ad1: READ command timeout tag=0 serv=0 - resetting ata0: resetting devices .. done ad1: 19541MB <Maxtor ATHENA> [39703/16/63] at ata0-slave UDMA33 ad1: READ command timeout tag=0 serv=0 - resetting ata0: resetting devices .. done ad1: READ command timeout tag=0 serv=0 - resetting ata0: resetting devices .. done ad1: READ command timeout tag=0 serv=0 - resetting ad1: trying fallback to PIO mode ata0: resetting devices .. done Fatal trap 12: page fault while in kernel mode fault virtual address = 0x80100004 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0157f97 stack pointer = 0x10:0xc02f0924 frame pointer = 0x10:0xc02f0924 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = Idle interrupt mask = bio trap number = 12 panic: page fault syncing disks... done Uptime: 40s Automatic reboot in 15 seconds - press a key on the console to abort Rebooting... That was using next settings: hw.ata.ata_dma: 1 hw.ata.wc: 0 hw.ata.tags: 0 hw.ata.atapi_dma: 0 If I use hw.ata.ata_dma=0 then it boots right: ad0: 6149MB <WDC AC26400B> [13328/15/63] at ata0-master PIO4 ad1: hard error reading fsbn 40020561 of 0-3 (ad1 bn 40020561; cn 39702 tn 15 sn 0) status=51 error=04 ad1: 19541MB <Maxtor ATHENA> [39703/16/63] at ata0-slave PIO4 And proceedes till multiuser. It is not possible to get crashdump in this situation so I used another machine and GDB over serial line. Here comes debug session script. It correspondes to the moment after 'Automatic reboot in 15 seconds' message. As you can see, fault virtual address 0x80100004 is the (junk) value of request->bp->b_dev in ad_interrupt(). And by the way, is it OK to call splx(s) and return without calling ATA_UNLOCK_CH(ch) in the ata-all.c:ata_start() inside '#if NATADISK > 0' block? Script started on Sat May 8 01:02:20 2004 GNU gdb 4.18 (FreeBSD) Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/local/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs Deprecated bfd_read called at /usr/local/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf (kgdb) target remote /dev/cuaa0 Remote debugging using /dev/cuaa0 Debugger (msg=0xc02df569 "manual escape to debugger") at /usr/local/src/sys/i386/i386/db_interface.c:319 319 in_Debugger = 0; (kgdb) c Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. Debugger (msg=0xc02df569 "manual escape to debugger") at /usr/local/src/sys/i386/i386/db_interface.c:319 319 in_Debugger = 0; (kgdb) set radix 16 Input and output radices now set to decimal 16, hex 10, octal 20. (kgdb) bt full #0 Debugger (msg=0xc02df569 "manual escape to debugger") at /usr/local/src/sys/i386/i386/db_interface.c:319 msg = 0x26 <Address 0x26 out of bounds> in_Debugger = 0x1 #1 0xc026e56e in scgetc (sc=0xc034cd40, flags=0x1) at /usr/local/src/sys/dev/syscons/syscons.c:3198 scp = (scr_stat *) 0xc0346f80 tp = (struct tty *) 0x26 c = 0x86 this_scr = 0x1 f = 0xffffffff i = 0x86 #2 0xc026c182 in sccngetch (flags=0x0) at /usr/local/src/sys/dev/syscons/syscons.c:1565 fkey = {str = '\000' <repeats 15 times>, len = 0x0} fkeycp = 0x0 scp = (scr_stat *) 0xc0346f80 p = (u_char *) 0xc02e324f "" cur_mode = 0x1 s = 0xffffffff c = 0x100 #3 0xc026c002 in sccngetc (dev=0xc0332ea0) at /usr/local/src/sys/dev/syscons/syscons.c:1488 No locals. ---Type <return> to continue, or q <return> to quit--- #4 0xc017e85f in cngetc () at /usr/local/src/sys/kern/tty_cons.c:433 c = 0xc02e324f #5 0xc0164875 in shutdown_panic (junk=0x0, howto=0x100) at /usr/local/src/sys/kern/kern_shutdown.c:435 loop = 0x87 #6 0xc01647cd in boot (howto=0x100) at /usr/local/src/sys/kern/kern_shutdown.c:378 _el = (struct eventhandler_list *) 0x26 _ep = (struct eventhandler_entry *) 0xc1050e60 howto = 0x100 #7 0xc0164c11 in panic (fmt=0xc02e5c0c "%s") at /usr/local/src/sys/kern/kern_shutdown.c:656 fmt = 0xc02e5c0c "%s" bootopt = 0x100 buf = "page fault", '\000' <repeats 245 times> #8 0xc0280dc0 in trap_fatal (frame=0xc02f08fc, eva=0x80100004) at /usr/local/src/sys/i386/i386/trap.c:974 frame = (struct trapframe *) 0xc02f08fc eva = 0x26 code = 0x10 type = 0xc ss = 0x10 esp = 0x26 softseg = {ssd_base = 0x0, ssd_limit = 0xfffff, ssd_type = 0x1b, ---Type <return> to continue, or q <return> to quit--- ssd_dpl = 0x0, ssd_p = 0x1, ssd_xx = 0xf, ssd_xx1 = 0x2, ssd_def32 = 0x1, ssd_gran = 0x1} #9 0xc0280a55 in trap_pfault (frame=0xc02f08fc, usermode=0x0, eva=0x80100004) at /usr/local/src/sys/i386/i386/trap.c:867 va = 0x80100000 vm = (struct vmspace *) 0x26 map = 0xc rv = 0x20 ftype = 0x0 p = (struct proc *) 0x0 #10 0xc02805e7 in trap (frame={tf_fs = 0xc1e00010, tf_es = 0x10, tf_ds = 0x80010, tf_edi = 0xc1e1f200, tf_esi = 0xc1e27000, tf_ebp = 0xc02f093c, tf_isp = 0xc02f0928, tf_ebx = 0x80100000, tf_edx = 0xc1e70a40, tf_ecx = 0xc1e04b00, tf_eax = 0x80100000, tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xc0157f97, tf_cs = 0x8, tf_eflags = 0x90293, tf_esp = 0xc02f095c, tf_ss = 0xc02076ad}) at /usr/local/src/sys/i386/i386/trap.c:466 p = (struct proc *) 0x0 sticks = 0x1 i = 0x0 ucode = 0x0 type = 0xc code = 0x20 eva = 0x80100004 ---Type <return> to continue, or q <return> to quit--- #11 0xc0157f97 in minor (x=0x80100000) at /usr/local/src/sys/kern/kern_conf.c:198 x = 0x26 #12 0xc02076ad in diskerr (bp=0xc1e1f200, what=0xc02a0ccb "hard error", pri=0xffffffff, blkdone=0x262aa51, lp=0xc1e2716c) at /usr/local/src/sys/sys/disklabel.h:459 dev = 0x68c440 bp = (struct buf *) 0xc1e1f200 slice = 0x68c440 part = 0x0 partname = "\000 " sname = 0xc1e27000 "pKàÁ\001" sn = 0x68c440 #13 0xc01360eb in ad_interrupt (request=0xc1e70a40) at /usr/local/src/sys/dev/ata/ata-disk.c:609 adp = (struct ad_softc *) 0xc1e27000 dma_stat = 0xc02a0cbb #14 0xc012ca1d in ata_intr (data=0xc1e04b00) at /usr/local/src/sys/dev/ata/ata-all.c:607 ch = (struct ata_channel *) 0xc1e04b00 #15 0xc01cda0d in ipfw_tick (unused=0x0) at /usr/local/src/sys/netinet/ip_fw2.c:2722 i = 0x400000 s = 0xcc4bb500 ---Type <return> to continue, or q <return> to quit--- q = (ipfw_dyn_rule *) 0xc01cda0c #16 0xc0275353 in doreti_swi () No symbol table info available. (kgdb) frame e #14 0xc012ca1d in ata_intr (data=0xc1e04b00) at /usr/local/src/sys/dev/ata/ata-all.c:607 607 if (!ch->running || ad_interrupt(ch->running) == ATA_OP_CONTINUES) (kgdb) l 602 603 /* find & call the responsible driver to process this interrupt */ 604 switch (ch->active) { 605 #if NATADISK > 0 606 case ATA_ACTIVE_ATA: 607 if (!ch->running || ad_interrupt(ch->running) == ATA_OP_CONTINUES) 608 return; 609 break; 610 #endif 611 #if DEV_ATAPIALL (kgdb) frame d #13 0xc01360eb in ad_interrupt (request=0xc1e70a40) at /usr/local/src/sys/dev/ata/ata-disk.c:609 609 diskerr(request->bp, (adp->device->channel->error & ATA_E_ICRC) ? (kgdb) l 604 /* did any real errors happen ? */ 605 if ((adp->device->channel->status & ATA_S_ERROR) || 606 (request->flags & ADR_F_DMA_USED && dma_stat & ATA_BMSTAT_ERROR)) { 607 adp->device->channel->error = 608 ATA_INB(adp->device->channel->r_io, ATA_ERROR); 609 diskerr(request->bp, (adp->device->channel->error & ATA_E_ICRC) ? 610 "UDMA ICRC error" : "hard error", LOG_PRINTF, 611 request->blockaddr + (request->donecount / DEV_BSIZE), 612 &adp->disk.d_label); 613 (kgdb) p request->bp->b_dev $2 = 0x80100000
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200405071754.i47Hswu9000625>