Date: 26 Jun 2003 19:49:52 -0000
From: Nathan Gardner <nathan@inwa.net>
To: FreeBSD-gnats-submit@FreeBSD.org
Cc: support@inwa.net
Subject: i386/54033: Disk lockup.
Message-ID: <20030626194952.90047.qmail@eclipse.inwa.net>
Resent-Message-ID: <200307021720.h62HK00E069266@freefall.freebsd.org>

>Number:         54033
>Category:       i386
>Synopsis:       Disk lockup.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jul 02 10:20:00 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Nathan Gardner
>Release:        FreeBSD 4.7-RELEASE i386
>Organization:   InWa.net
>Environment:
System: FreeBSD eclipse 4.7-RELEASE FreeBSD 4.7-RELEASE #1: Tue Jun 24 16:57:46 PDT 2003 nathan@host.inwa.net:/usr/src/sys/compile/eclipse.new i386

host# atacontrol list
ATA channel 0:
    Master:  ad0 <Maxtor 6Y120L0/YAR41VW0> ATA/ATAPI rev 7
    Slave:   no device present
ATA channel 1:
    Master:  acd0 <PLEXTOR CD-R PX-320A/1.01> ATA/ATAPI rev 0
    Slave:   no device present
ATA channel 2:
    Master:  ad4 <IC35L080AVVA07-0/VA4OA52A> ATA/ATAPI rev 5
    Slave:   ad5 <IC35L080AVVA07-0/VA4OA52A> ATA/ATAPI rev 5
ATA channel 3:
    Master:  no device present
    Slave:   no device present
host#
>Description:
After about two weeks, processes accessing the single hard disk (/dev/ad0) hang. ps reports the process state as D (uninterruptible disk wait). The processes cannot be killed, and do not finish. If the process is system critical, the system crashes; if it is not, the system needs to be rebooted before the processes will go away and the disk will become accessible again.

When the system comes back up, the drive seems to work fine, and I have not found anything in my logs to show what the cause of the problem might be. I had been using the drive for swap and backup (via tar), but after several crashes, I moved swap off the drive. Now it just does backups, which do not cause the whole system to crash.

The current drive is the second hard drive I have tried. The first one was also a Maxtor drive (although it was an 80GB drive whereas the new one is 120GB). I assumed that this error was hardware related, so I swapped it out, and brought it home for testing.
The manufacturer's tests show that there is nothing wrong with the drive. I have not, as yet, been able to try the system with a drive from a different manufacturer.

Because the processes hang in disk writes, I haven't been able to figure out what is causing the crash with any more certainty. Once one process hangs, any other processes that try to access the drive do the same.
>How-To-Repeat:
Use Maxtor drive regularly for a couple of weeks for backups. Watch hang, reboot.

Contents of /var/run/dmesg.boot
--begin--
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 4.7-RELEASE #1: Tue Jun 24 16:57:46 PDT 2003
    nathan@host.inwa.net:/usr/src/sys/compile/eclipse.new
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (1399.33-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x6b1 Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 1073676288 (1048512K bytes)
avail memory = 1040371712 (1015988K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x00178011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0395000.
ccd0-3: Concatenated disk drivers
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 8 entries at 0xc00fdc60
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #0 intpin 11 -> irq 2
pci0: <PCI bus> on pcib0
pcib1: <PCI to PCI bridge (vendor=1106 device=b091)> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <ATI Mach64-GM graphics accelerator> at 0.0
atapci0: <Promise ATA100 controller> port 0xb000-0xb03f,0xac00-0xac03,0xa800-0xa807,0xa400-0xa403,0xa000-0xa007 mem 0xf8100000-0xf811ffff irq 2 at device 12.0 on pci0
ata2: at 0xa000 on atapci0
ata3: at 0xa800 on atapci0
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xb400-0xb43f mem 0xf8000000-0xf80fffff,0xf8120000-0xf8120fff irq 5 at device 13.0 on pci0
fxp0: Ethernet address 00:30:48:41:53:be
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: <PCI to ISA bridge (vendor=1106 device=3074)> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <VIA 8233 ATA100 controller> port 0xb800-0xb80f at device 17.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata1: at 0x170 irq 15 on atapci1
pci0: <unknown card> (vendor=0x1106, dev=0x3065) at 18.0 irq 10
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xcc000-0xd3fff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3b0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
DUMMYNET initialized (011031)
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, logging limited to 100 packets/entry by default
BRIDGE 020214 loaded
SMP: AP CPU #1 Launched!
ad0: 117246MB <Maxtor 6Y120L0> [238216/16/63] at ata0-master UDMA100
ar0: 78533MB <ATA RAID1 array> [10011/255/63] status: READY subdisks:
 0 READY ad4: 78533MB <IC35L080AVVA07-0> [159560/16/63] at ata2-master UDMA100
 1 READY ad5: 78533MB <IC35L080AVVA07-0> [159560/16/63] at ata2-slave UDMA100
acd0: CD-RW <PLEXTOR CD-R PX-320A> at ata1-master PIO4
Mounting root from ufs:/dev/ar0s1a
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) retrying
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) retrying
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) retrying
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) falling back to PIO mode
--end--

The last four lines here are suspicious, but it looks like the system handles them and goes on working for a couple of weeks. Does anyone know what this problem is?

Thank you,
Nathan
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
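
For anyone who wants to check their own boot log for the same symptom, here is a minimal Python sketch that tallies the "UDMA ICRC error" messages quoted in the dmesg output above and notes which devices fell back to PIO mode. It is not part of the original report. The function name `summarize_icrc` and the regular expression are illustrative assumptions; the pattern is modeled only on the four messages shown here, and other ATA driver versions may word them differently.

```python
import re
from collections import Counter

# Pattern modeled only on the four messages quoted in this report; other
# ATA driver versions may word these messages differently (assumption).
ICRC_RE = re.compile(
    r"\b(?P<dev>ad\d+\w*): UDMA ICRC error reading fsbn \d+"
    r".*?\)\s+(?P<action>retrying|falling back to PIO mode)"
)


def summarize_icrc(dmesg_text):
    """Return (error count per device, devices that fell back to PIO)."""
    errors = Counter()
    pio_fallback = set()
    for match in ICRC_RE.finditer(dmesg_text):
        dev = match.group("dev")
        errors[dev] += 1
        if match.group("action") == "falling back to PIO mode":
            pio_fallback.add(dev)
    return errors, pio_fallback


if __name__ == "__main__":
    # Sample copied from the boot messages in this report.
    msg = ("ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 "
           "(ad0s1 bn 4205759; cn 261 tn 203 sn 5) ")
    sample = (msg + "retrying\n") * 3 + msg + "falling back to PIO mode\n"
    errors, pio_fallback = summarize_icrc(sample)
    for dev, count in errors.items():
        note = " (fell back to PIO)" if dev in pio_fallback else ""
        print(f"{dev}: {count} ICRC errors{note}")
```

To run it against a real log, read the contents of /var/run/dmesg.boot into a string and pass that string to `summarize_icrc` in place of the sample.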