Date: Sun, 9 Jan 2011 16:41:43 +0100
From: Tom Vijlbrief <tom.vijlbrief@xs4all.nl>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Panic 8.2 PRERELEASE WRITE_DMA48
Message-ID: <AANLkTin3FHcsdMtA9OYaA2wrUx%2BfpyEsTThdRmS8sXA5@mail.gmail.com>
In-Reply-To: <20110109122243.GA37530@icarus.home.lan>
References: <AANLkTi=iaq1Lx521oUF2BSB4-2wi9Ys2fTLzz4kLaLVo@mail.gmail.com> <20110109122243.GA37530@icarus.home.lan>
I've run many fscks on /usr in single-user mode because I had soft updates
inconsistencies; there were no DMA errors during those repairs.

smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-PRERELEASE i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 DT series
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJ9BQC02902
Firmware Version: 1AA01113
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Sun Jan  9 16:40:24 2011 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (11811) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 198) minutes.
Conveyance self-test routine
recommended polling time:        (  21) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   078   078   011    Pre-fail  Always       -       7580
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       399
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10097
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2375
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       392
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   052   000    Old_age   Always       -       43 (Min/Max 42/45)
194 Temperature_Celsius     0x0022   056   050   000    Old_age   Always       -       44 (Min/Max 42/46)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       20728126
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2361         -
# 2  Short offline       Completed without error       00%      2205         -
# 3  Short offline       Completed without error       00%      2138         -
# 4  Extended offline    Completed without error       00%      2109         -
# 5  Short offline       Completed without error       00%      2105         -
# 6  Short offline       Completed without error       00%      2092         -
# 7  Short offline       Completed without error       00%      2083         -
# 8  Short offline       Completed without error       00%      2057         -
# 9  Extended offline    Completed without error       00%      2037         -
#10  Short offline       Completed without error       00%      2033         -
#11  Short offline       Completed without error       00%      2009         -
#12  Short offline       Completed without error       00%      1974         -
#13  Short offline       Completed without error       00%      1941         -
#14  Extended offline    Completed without error       00%      1920         -
#15  Short offline       Completed without error       00%      1916         -
#16  Short offline       Completed without error       00%      1868         -
#17  Short offline       Completed without error       00%      1810         -
#18  Short offline       Completed without error       00%      1655         -
#19  Short offline       Completed without error       00%      1638         -
#20  Extended offline    Completed without error       00%      1596         -
#21  Short offline       Completed without error       00%      1591         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

dmesg was in the attachment of the original mail but I'll paste it here:

Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-PRERELEASE #0: Thu Dec 30 12:21:06 CET 2010
    root@swanbsd.v7f.eu:/usr/obj/usr/src/sys/GENERIC i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.53GHz (2558.54-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf24  Family = f  Model = 2  Stepping = 4
  Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
real memory  = 1610612736 (1536 MB)
avail memory = 1555316736 (1483 MB)
ACPI APIC Table: <ASUS P4B533 >
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <ASUS P4B533> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 5ff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xf2000000-0xf2ffffff,0xf4000000-0xf7ffffff,0xf3800000-0xf387ffff irq 16 at device 0.0 on pci1
nvidia0: <GeForce4 Ti 4200> on vgapci0
vgapci0: child nvidia0 requested pci_enable_busmaster
vgapci0: child nvidia0 requested pci_enable_io
nvidia0: [GIANT-LOCKED]
nvidia0: [ITHREAD]
uhci0: <Intel 82801DB (ICH4) USB controller USB-A> port 0xd800-0xd81f irq 16 at device 29.0 on pci0
uhci0: [ITHREAD]
uhci0: LegSup = 0x2f00
usbus0: <Intel 82801DB (ICH4) USB controller USB-A> on uhci0
uhci1: <Intel 82801DB (ICH4) USB controller USB-B> port 0xd400-0xd41f irq 19 at device 29.1 on pci0
uhci1: [ITHREAD]
uhci1: LegSup = 0x2f00
usbus1: <Intel 82801DB (ICH4) USB controller USB-B> on uhci1
uhci2: <Intel 82801DB (ICH4) USB controller USB-C> port 0xd000-0xd01f irq 18 at device 29.2 on pci0
uhci2: [ITHREAD]
uhci2: LegSup = 0x2f00
usbus2: <Intel 82801DB (ICH4) USB controller USB-C> on uhci2
ehci0: <Intel 82801DB/L/M (ICH4) USB 2.0 controller> mem 0xf1800000-0xf18003ff irq 23 at device 29.7 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: <Intel 82801DB/L/M (ICH4) USB 2.0 controller> on ehci0
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
rl0: <RealTek 8139 10/100BaseTX> port 0xb800-0xb8ff mem 0xf1000000-0xf10000ff irq 22 at device 10.0 on pci2
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> PHY 0 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:50:bf:e1:3b:35
rl0: [ITHREAD]
atapci0: <SiI 3512 SATA150 controller> port 0xb400-0xb407,0xb000-0xb003,0xa800-0xa807,0xa400-0xa403,0xa000-0xa00f mem 0xf0800000-0xf08001ff irq 23 at device 11.0 on pci2
atapci0: [ITHREAD]
ata2: <ATA channel 0> on atapci0
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci0
ata3: [ITHREAD]
pcm0: <Creative EMU10K1> port 0x9800-0x981f irq 20 at device 12.0 on pci2
pcm0: <SigmaTel STAC9708/11 AC97 Codec>
pcm0: [ITHREAD]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH4 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f irq 18 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci1
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci1
ata1: [ITHREAD]
atrtc0: <AT realtime clock> port 0x70-0x73 irq 8 on acpi0
fdc0: <floppy drive controller> port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: <Parallel port> port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppc0: [ITHREAD]
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
plip0: [ITHREAD]
lpt0: <Printer> on ppbus0
lpt0: [ITHREAD]
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart1: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xd0000-0xd47ff pnpid ORM0000 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
p4tcc0: <CPU Frequency Thermal Control> on cpu0
Timecounter "TSC" frequency 2558535720 Hz quality 800
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <Intel> at usbus3
uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ad0: 76319MB <WDC WD800BB-00CAA1 17.07W17> at ata0-master UDMA100
acd0: DVDROM <HL-DT-STDVD-ROM GDR8161B/0100> at ata0-slave UDMA33
ata1: DMA limited to UDMA33, controller found non-ATA66 cable
acd1: DVDR <HL-DT-STDVD-RAM GH22NP20/1.02> at ata1-master UDMA33
ad4: 953869MB <SAMSUNG HD103UJ 1AA01113> at ata2-master UDMA100 SATA 1.5Gb/s
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
Root mount waiting for: usbus3
Root mount waiting for: usbus3
uhub3: 6 ports with 6 removable, self powered

2011/1/9 Jeremy Chadwick <freebsd@jdc.parodius.com>:
> On Sun, Jan 09, 2011 at 12:33:10PM +0100, Tom Vijlbrief wrote:
>> The last half year I've been
>> installing FreeBSD on several machines.
>>
>> I installed it on my main desktop system a few weeks ago which
>> normally runs Linux, but I get this panic under heavy disk I/O.
>>
>> It even happened during the initial sysinstall, although I also have
>> completed several buildworlds without problems.
>>
>> I can trigger it easily by accessing /usr (UFS) and a linux ext
>> partition simultaneously, e.g. by copying
>> large files to the /usr partition.
>>
>> Just bought a serial cable to enable the serial console of the various
>> FreeBSD installations, which is of good use for this problem, because
>> a crash dump is not written.
>>
>> Full boot output in the attachment
>>
>> Sun Jan  9 10:11:17 CET 2011
>> unknown: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=274799820
>> ata2: timeout waiting to issue command
>> ata2: error issuing WRITE_DMA48 command
>> g_vfs_done():ad4s2f[WRITE(offset=28915105792, length=131072)]error = 6
>> /usr: got error 6 while accessing filesystem
>> panic: softdep_deallocate_dependencies: unrecovered I/O error
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xc08e0f77 at kdb_backtrace+0x47
>> #1 0xc08b2037 at panic+0x117
>> #2 0xc0ae2ecd at softdep_deallocate_dependencies+0x3d
>> #3 0xc0925590 at brelse+0x90
>> #4 0xc092829a at bufdone_finish+0x3fa
>> #5 0xc092830d at bufdone+0x4d
>> #6 0xc092bdf9 at cluster_callback+0x89
>> #7 0xc09282f7 at bufdone+0x37
>> #8 0xc0850ad5 at g_vfs_done+0x85
>> #9 0xc09224d9 at biodone+0xb9
>> #10 0xc084da69 at g_io_schedule_up+0x79
>> #11 0xc084e0a8 at g_up_procbody+0x68
>> #12 0xc0886fc1 at fork_exit+0x91
>> #13 0xc0bcc144 at fork_trampoline+0x8
>> Uptime: 2h56m27s
>> Physical memory: 1515 MB
>> Dumping 177 MB:ata2: timeout waiting to issue command
>> ata2: error issuing WRITE_DMA command
>>
>> ** DUMP FAILED (ERROR 5) **
>> Automatic reboot in 15 seconds - press a key on the console to abort
>> Rebooting...
>
> Can you please provide output from the following commands (after
> installing ports/sysutils/smartmontools, which should be version 5.40 or
> later (in case you haven't updated your ports tree)):
>
> $ dmesg
> $ smartctl -a /dev/ad4
>
> The SMART output should act as a verifier as to whether or not you
> really do have a bad block on your disk (which is what READ/WRITE_DMA48
> can sometimes indicate).
>
> You may also want to boot the machine in single user mode and do a
> manual "fsck /dev/ad4s2f".  It's been proven in the past that
> background_fsck doesn't manage to address all issues.
>
> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |
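
The check Jeremy describes (using the SMART output to verify whether the
disk really has a bad block) can be sketched in a few lines of Python.
This is only an illustration and not part of the original thread: the
function names are hypothetical, and the choice of the four attributes
treated as bad-block indicators is an assumption, not something
smartmontools defines. The sample rows are copied from the smartctl
output posted in this message.

```python
import re

# Attributes assumed (for this sketch only) to indicate bad blocks.
BAD_BLOCK_ATTRS = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)

# One row of the "Vendor Specific SMART Attributes" table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(smartctl_text):
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    values = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if match:
            values[match.group(2)] = int(match.group(3))
    return values


def nonzero_bad_block_attrs(smartctl_text):
    """Return the assumed bad-block attributes whose raw value is not 0."""
    values = raw_values(smartctl_text)
    return {
        name: values[name]
        for name in BAD_BLOCK_ATTRS
        if values.get(name, 0) != 0
    }


# Rows taken from the smartctl -a /dev/ad4 output in this message.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
"""

if __name__ == "__main__":
    print(nonzero_bad_block_attrs(SAMPLE))
```

For the values posted above, all four of those attributes are 0, so the
result is an empty dictionary; the only non-zero error counter among the
sample rows is UDMA_CRC_Error_Count, with a raw value of 1.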