Date:      26 Jun 2003 19:49:52 -0000
From:      Nathan Gardner <nathan@inwa.net>
To:        FreeBSD-gnats-submit@FreeBSD.org
Cc:        support@inwa.net
Subject:   i386/54033: Disk lockup.
Message-ID:  <20030626194952.90047.qmail@eclipse.inwa.net>
Resent-Message-ID: <200307021720.h62HK00E069266@freefall.freebsd.org>


>Number:         54033
>Category:       i386
>Synopsis:       Disk lockup.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jul 02 10:20:00 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Nathan Gardner
>Release:        FreeBSD 4.7-RELEASE i386
>Organization:
InWa.net
>Environment:
System: FreeBSD eclipse 4.7-RELEASE FreeBSD 4.7-RELEASE #1: Tue Jun 24 16:57:46 PDT 2003 nathan@host.inwa.net:/usr/src/sys/compile/eclipse.new i386

host# atacontrol list
ATA channel 0:
    Master:  ad0 <Maxtor 6Y120L0/YAR41VW0> ATA/ATAPI rev 7
    Slave:       no device present
ATA channel 1:
    Master: acd0 <PLEXTOR CD-R PX-320A/1.01> ATA/ATAPI rev 0
    Slave:       no device present
ATA channel 2:
    Master:  ad4 <IC35L080AVVA07-0/VA4OA52A> ATA/ATAPI rev 5
    Slave:   ad5 <IC35L080AVVA07-0/VA4OA52A> ATA/ATAPI rev 5
ATA channel 3:
    Master:      no device present
    Slave:       no device present
host#
>Description:
	After about two weeks, processes accessing the single non-RAID hard disk (/dev/ad0) hang. ps reports their state as D (uninterruptible disk wait). The processes cannot be killed and never finish. If a hung process is system-critical, the whole system crashes; if it is not, the system must be rebooted before the processes go away and the disk becomes accessible again. When the system comes back up, the drive seems to work fine, and I have not found anything in my logs that points to the cause of the problem.
	I had been using the drive for swap and for backups (via tar), but after several crashes I moved swap off the drive. Now it is used only for backups, and a hang there no longer crashes the whole system.
	The current drive is the second hard drive I have tried. The first was also a Maxtor (an 80GB drive, whereas the new one is 120GB). I assumed the error was hardware-related, so I swapped the drive out and took it home for testing. The manufacturer's diagnostics show nothing wrong with it. I have not yet been able to try the system with a drive from a different manufacturer.
	Because the processes hang in disk writes, I have not been able to narrow down the cause any further. Once one process hangs, every other process that tries to access the drive hangs as well.
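To capture more detail the next time the drive hangs, it may help to record which processes are stuck and what they are sleeping on. The following is a minimal sketch, assuming the system's ps(1) accepts `-ax -o pid,state,wchan,command` (the `state` and `wchan` keywords) and prints a one-line header. The function name `stuck_in_disk_wait` is hypothetical. It reads ps output on stdin and prints only the rows whose state begins with D.

```sh
# Hypothetical helper: filter `ps -ax -o pid,state,wchan,command` output
# (read from stdin) down to processes in D (uninterruptible disk wait).
# Assumes column 2 is the state field and line 1 is the ps header.
stuck_in_disk_wait() {
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Example use on a live system:
#   ps -ax -o pid,state,wchan,command | stuck_in_disk_wait
```

The WCHAN column shows the kernel wait channel each stuck process is sleeping on, which would be a useful detail to add to this report.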

>How-To-Repeat:
	Use the Maxtor drive regularly for backups for a couple of weeks. Processes accessing it eventually hang, and only a reboot recovers.

Contents of /var/run/dmesg.boot
--begin--
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.7-RELEASE #1: Tue Jun 24 16:57:46 PDT 2003
    nathan@host.inwa.net:/usr/src/sys/compile/eclipse.new
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (1399.33-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6b1  Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 1073676288 (1048512K bytes)
avail memory = 1040371712 (1015988K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x00178011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0395000.
ccd0-3: Concatenated disk drivers
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 8 entries at 0xc00fdc60
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #0 intpin 11 -> irq 2
pci0: <PCI bus> on pcib0
pcib1: <PCI to PCI bridge (vendor=1106 device=b091)> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <ATI Mach64-GM graphics accelerator> at 0.0
atapci0: <Promise ATA100 controller> port 0xb000-0xb03f,0xac00-0xac03,0xa800-0xa807,0xa400-0xa403,0xa000-0xa007 mem 0xf8100000-0xf811ffff irq 2 at device 12.0 on pci0
ata2: at 0xa000 on atapci0
ata3: at 0xa800 on atapci0
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xb400-0xb43f mem 0xf8000000-0xf80fffff,0xf8120000-0xf8120fff irq 5 at device 13.0 on pci0
fxp0: Ethernet address 00:30:48:41:53:be
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: <PCI to ISA bridge (vendor=1106 device=3074)> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <VIA 8233 ATA100 controller> port 0xb800-0xb80f at device 17.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata1: at 0x170 irq 15 on atapci1
pci0: <unknown card> (vendor=0x1106, dev=0x3065) at 18.0 irq 10
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xcc000-0xd3fff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3b0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
DUMMYNET initialized (011031)
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, logging limited to 100 packets/entry by default
BRIDGE 020214 loaded
SMP: AP CPU #1 Launched!
ad0: 117246MB <Maxtor 6Y120L0> [238216/16/63] at ata0-master UDMA100
ar0: 78533MB <ATA RAID1 array> [10011/255/63] status: READY subdisks:
 0 READY ad4: 78533MB <IC35L080AVVA07-0> [159560/16/63] at ata2-master UDMA100
 1 READY ad5: 78533MB <IC35L080AVVA07-0> [159560/16/63] at ata2-slave UDMA100
acd0: CD-RW <PLEXTOR CD-R PX-320A> at ata1-master PIO4
Mounting root from ufs:/dev/ar0s1a
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) retrying
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) retrying
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) retrying
ad0s1a: UDMA ICRC error reading fsbn 4205759 of 5696-5719 (ad0s1 bn 4205759; cn 261 tn 203 sn 5) falling back to PIO mode
--end--

The last four lines are suspicious, but the system appears to handle the error (the driver falls back to PIO mode) and keeps working for a couple of weeks.
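To check whether these ICRC retries recur after later boots, the messages can be tallied per device. The following is a minimal sketch, assuming the messages have the same shape as the four lines above (device name with a trailing colon as the first field). The function name `icrc_summary` is hypothetical. It prints the number of ICRC messages per device and whether a PIO fallback was logged.

```sh
# Hypothetical helper: read dmesg-style text on stdin, count
# "UDMA ICRC error" lines per device, and note any PIO fallback.
icrc_summary() {
    awk '
        /UDMA ICRC error/ {
            dev = $1
            sub(/:$/, "", dev)            # "ad0s1a:" -> "ad0s1a"
            count[dev]++
            if (/falling back to PIO mode/) pio[dev] = 1
        }
        END {
            for (d in count) {
                p = (d in pio) ? "yes" : "no"
                printf "%s icrc=%d pio=%s\n", d, count[d], p
            }
        }'
}

# Example use:
#   icrc_summary < /var/run/dmesg.boot
```

Applied to the dmesg.boot excerpt above, this reports `ad0s1a icrc=4 pio=yes`.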

Does anyone know what this problem is?

Thank you, Nathan
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:


