From: George Kontostanos
To: Jeremy Chadwick
Cc: FreeBSD Stable
Subject: Re: ata: SIGNATURE: ffffffff
Date: Thu, 23 Jun 2011 14:57:32 +0300
In-Reply-To: <20110623005832.GA26118@icarus.home.lan>
References: <20110623005832.GA26118@icarus.home.lan>
List-Id: Production branch of FreeBSD source code

Hi Jeremy,

I am using smartmontools, and in fact the first drive I replaced did
show some errors. The next two were zeroed out and thoroughly tested
using the WD tools. No errors were reported by either smartmontools or
the WD tools. I was also glad that the shop I bought them from replaced
them immediately, no questions asked. They said that they have been
having a lot of issues with WD drives lately.

I will probably try to get a different brand of controller, especially
after seeing the relevant PR.

Thanks,

hp# smartctl -a /dev/ad4
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD7500AALX-009BA0
Serial Number:    WD-WCATR5711398
LU WWN Device Id: 5 0014ee 25ad8ccf5
Firmware Version: 15.01H15
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jun 23 14:46:12 2011 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (13260) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 155) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   188   178   021    Pre-fail  Always       -       3558
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       21
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   110   107   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

On Thu, Jun 23, 2011 at 3:58 AM, Jeremy Chadwick wrote:

> On Wed, Jun 22, 2011 at 05:52:39PM +0300, George Kontostanos wrote:
> > This is the 3rd disk I have replaced in a 3-disk RAIDZ1 pool, and I
> > am really starting to believe that the problem is somewhere else. The
> > disks are attached to a Promise PDC40718 SATA300 controller. I have
> > been running this setup since 8.0-RELEASE with no issues until a few
> > months ago, after 8.2-RELEASE; I am now at 8.2-STABLE.
> >
> > Symptoms:
> >
> > Jun 22 17:08:53 hp kernel: ata2: timeout waiting to issue command
> > Jun 22 17:08:53 hp kernel: ata2: error issuing SETFEATURES ENABLE WCACHE command
> > Jun 22 17:09:33 hp kernel: ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
> > Jun 22 17:09:33 hp kernel: ad4: WARNING - WRITE_DMA48 requeued due to channel reset LBA=321558741
> > Jun 22 17:09:34 hp kernel: ata2: SIGNATURE: 00000101
> > Jun 22 17:09:34 hp kernel: ad4: WARNING - WRITE_DMA48 requeued due to channel reset LBA=321558869
> > Jun 22 17:09:34 hp kernel: ata2: FAILURE - already active DMA on this device
> > Jun 22 17:09:34 hp kernel: ata2: setting up DMA failed
> > Jun 22 17:09:34 hp kernel: ata2: FAILURE - already active DMA on this device
> > Jun 22 17:09:34 hp kernel: ata2: setting up DMA failed
> >
> > After a while the disk gets detached from the pool. Always the same
> > disk. Right now I am in the process of resilvering:
> >
> >   pool: tank
> >  state: ONLINE
> > status: One or more devices is currently being resilvered.
> >         The pool will continue to function, possibly in a degraded state.
> > action: Wait for the resilver to complete.
> >   scan: resilver in progress since Wed Jun 22 17:09:40 2011
> >     189G scanned out of 578G at 88.8M/s, 1h14m to go
> >     62.9G resilvered, 32.63% done
> > config:
> >
> >         NAME              STATE     READ WRITE CKSUM
> >         tank              ONLINE       0     0     0
> >           raidz1-0        ONLINE       0     0     0
> >             label/zdisk1  ONLINE       0     0     0
> >             label/zdisk2  ONLINE       0     0     0
> >             label/zdisk3  ONLINE       0     0     0  (resilvering)
> >
> > But those errors have started to appear again. Again, this is the
> > 3rd disk replaced! Full dmesg attached.
> >
> > --
> > George Kontostanos
> > aisecure.net
> >
> > Copyright (c) 1992-2011 The FreeBSD Project.
> > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> >         The Regents of the University of California. All rights reserved.
> > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > FreeBSD 8.2-STABLE #0: Mon Jun 6 19:00:19 EEST 2011
> >     gkontos@hp.aicom.loc:/usr/obj/usr/src/sys/ML110G3 amd64
> > Timecounter "i8254" frequency 1193182 Hz quality 0
> > CPU: Intel(R) Pentium(R) D CPU 3.20GHz (3200.13-MHz K8-class CPU)
> >   Origin = "GenuineIntel" Id = 0xf64 Family = f Model = 6 Stepping = 4
> >   Features=0xbfebfbff
> >   Features2=0xe4bd
> >   AMD Features=0x20100800
> >   AMD Features2=0x1
> >   TSC: P-state invariant
> > real memory = 4294967296 (4096 MB)
> > avail memory = 4106780672 (3916 MB)
> > ACPI APIC Table:
> > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> > FreeBSD/SMP: 1 package(s) x 2 core(s)
> >  cpu0 (BSP): APIC ID: 0
> >  cpu1 (AP): APIC ID: 1
> > ioapic0: Changing APIC ID to 2
> > ioapic0 irqs 0-23 on motherboard
> > kbd1 at kbdmux0
> > acpi0: on motherboard
> > acpi0: [ITHREAD]
> > acpi0: Power Button (fixed)
> > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> > cpu0: on acpi0
> > cpu1: on acpi0
> > pcib0: port 0xcf8-0xcff on acpi0
> > pci0: on pcib0
> > pcib1: irq 16 at device 28.0 on pci0
> > pci1: on pcib1
> > pcib2: irq 17 at device 28.5 on pci0
> > pci7: on pcib2
> > bge0: 0x004101> mem 0xfeaf0000-0xfeafffff irq 17 at device 0.0 on pci7
> > bge0: CHIP ID 0x00004101; ASIC REV 0x04; CHIP REV 0x41; PCI-E
> > miibus0: on bge0
> > brgphy0: PHY 1 on miibus0
> > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > bge0: Ethernet address: 00:13:21:cc:39:35
> > bge0: [ITHREAD]
> > uhci0: port 0xdc00-0xdc1f irq 23 at device 29.0 on pci0
> > uhci0: [ITHREAD]
> > uhci0: LegSup = 0x2f00
> > usbus0: on uhci0
> > uhci1: port 0xd880-0xd89f irq 19 at device 29.1 on pci0
> > uhci1: [ITHREAD]
> > uhci1: LegSup = 0x2f00
> > usbus1: on uhci1
> > uhci2: port 0xd800-0xd81f irq 18 at device 29.2 on pci0
> > uhci2: [ITHREAD]
> > uhci2: LegSup = 0x2f00
> > usbus2: on uhci2
> > ehci0: mem 0xfe9ffc00-0xfe9fffff irq 23 at device 29.7 on pci0
> > ehci0: [ITHREAD]
> > usbus3: EHCI version 1.0
> > usbus3: on ehci0
> > pcib3: at device 30.0 on pci0
> > pci8: on pcib3
> > atapci0: port 0xec00-0xec7f,0xe800-0xe8ff mem 0xfebff000-0xfebfffff,0xfebc0000-0xfebdffff irq 16 at device 0.0 on pci8
> > atapci0: [ITHREAD]
> > atapci0: [ITHREAD]
> > ata2: on atapci0
> > ata2: SIGNATURE: 00000101
> > ata2: [ITHREAD]
> > ata3: on atapci0
> > ata3: SIGNATURE: 00000101
> > ata3: [ITHREAD]
> > ata4: on atapci0
> > ata4: [ITHREAD]
> > ata5: on atapci0
> > ata5: SIGNATURE: 00000101
> > ata5: [ITHREAD]
> > vgapci0: port 0xe000-0xe0ff mem 0xe8000000-0xefffffff,0xfebb0000-0xfebbffff irq 16 at device 2.0 on pci8
> > isab0: at device 31.0 on pci0
> > isa0: on isab0
> > atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
> > ata0: on atapci1
> > ata0: [ITHREAD]
> > ahci0: port 0xd480-0xd487,0xd400-0xd403,0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc0f mem 0xfe9ff800-0xfe9ffbff irq 19 at device 31.2 on pci0
> > ahci0: [ITHREAD]
> > ahci0: AHCI v1.10 with 4 3Gbps ports, Port Multiplier not supported
> > ahcich0: at channel 0 on ahci0
> > ahcich0: [ITHREAD]
> > ahcich1: at channel 1 on ahci0
> > ahcich1: [ITHREAD]
> > ahcich2: at channel 2 on ahci0
> > ahcich2: [ITHREAD]
> > ahcich3: at channel 3 on ahci0
> > ahcich3: [ITHREAD]
> > acpi_button0: on acpi0
> > atrtc0: port 0x70-0x71 irq 8 on acpi0
> > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> > uart0: [FILTER]
> > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
> > uart1: [FILTER]
> > ppc0: port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
> > ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
> > ppc0: FIFO with 16/16/8 bytes threshold
> > ppc0: [ITHREAD]
> > ppbus0: on ppc0
> > plip0: on ppbus0
> > plip0: [ITHREAD]
> > lpt0: on ppbus0
> > lpt0: [ITHREAD]
> > lpt0: Interrupt-driven port
> > ppi0: on ppbus0
> > orm0: at iomem 0xc0000-0xc8fff,0xc9000-0xcdfff,0xcf800-0xd47ff,0xd4800-0xd57ff on isa0
> > sc0: at flags 0x100 on isa0
> > sc0: VGA <16 virtual consoles, flags=0x300>
> > vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> > atkbdc0: at port 0x60,0x64 on isa0
> > atkbd0: irq 1 on atkbdc0
> > kbd0 at atkbd0
> > atkbd0: [GIANT-LOCKED]
> > atkbd0: [ITHREAD]
> > est0: on cpu0
> > est: CPU supports Enhanced Speedstep, but is not recognized.
> > est: cpu_vendor GenuineIntel, msr 102400001024
> > device_attach: est0 attach returned 6
> > p4tcc0: on cpu0
> > est1: on cpu1
> > est: CPU supports Enhanced Speedstep, but is not recognized.
> > est: cpu_vendor GenuineIntel, msr 102400001024
> > device_attach: est1 attach returned 6
> > p4tcc1: on cpu1
> > ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
> >             to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
> > ZFS filesystem version 5
> > ZFS storage pool version 28
> > Timecounters tick every 1.000 msec
> > usbus0: 12Mbps Full Speed USB v1.0
> > usbus1: 12Mbps Full Speed USB v1.0
> > usbus2: 12Mbps Full Speed USB v1.0
> > usbus3: 480Mbps High Speed USB v2.0
> > ugen0.1: at usbus0
> > uhub0: on usbus0
> > ugen1.1: at usbus1
> > uhub1: on usbus1
> > ugen2.1: at usbus2
> > uhub2: on usbus2
> > ugen3.1: at usbus3
> > uhub3: on usbus3
> > ad4: 715404MB at ata2-master UDMA100 SATA 3Gb/s
> > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> > ada0: ATA-7 SATA 1.x device
> > ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> > ada0: Command Queueing enabled
> > ada0: 238475MB (488397168 512 byte sectors: 16H 63S/T 16383C)
> > ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> > ada1: ATA-7 SATA 1.x device
> > ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> > ada1: Command Queueing enabled
> > ada1: 238475MB (488397168 512 byte sectors: 16H 63S/T 16383C)
> > ad6: 610480MB at ata3-master UDMA100 SATA 3Gb/s
> > ad10: 610480MB at ata5-master UDMA100 SATA 3Gb/s
> > SMP: AP CPU #1 Launched!
> > Root mount waiting for: usbus3 usbus2 usbus1 usbus0
> > uhub0: 2 ports with 2 removable, self powered
> > uhub1: 2 ports with 2 removable, self powered
> > uhub2: 2 ports with 2 removable, self powered
> > Root mount waiting for: usbus3
> > uhub3: 6 ports with 6 removable, self powered
> > Root mount waiting for: usbus3
> > ugen3.2: at usbus3
> > umass0: on usbus3
> > umass0: SCSI over Bulk-Only; quirks = 0x0000
> > ugen0.2: at usbus0
> > umass0:4:0:-1: Attached to scbus4
> > Trying to mount root from zfs:zroot
> > da0 at umass-sim0 bus 0 scbus4 target 0 lun 0
> > da0: Fixed Direct Access SCSI-4 device
> > da0: 40.000MB/s transfers
> > da0: 610480MB (1250263726 512 byte sectors: 255H 63S/T 77825C)
> > bge0: link state changed to UP
> > S
> > log_sysevent: type 19 is not implemented
> > ata2: SIGNATURE: ffffffff
> > ata2: timeout waiting to issue command
> > ata2: error issuing SETFEATURES SET TRANSFER MODE command
> > ata2: timeout waiting to issue command
> > ata2: error issuing SETFEATURES ENABLE RCACHE command
> > ata2: timeout waiting to issue command
> > ata2: error issuing SETFEATURES ENABLE WCACHE command
> > ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
> > ad4: WARNING - WRITE_DMA48 requeued due to channel reset LBA=321558741
> > ata2: SIGNATURE: 00000101
> > ad4: WARNING - WRITE_DMA48 requeued due to channel reset LBA=321558869
> > ata2: FAILURE - already active DMA on this device
> > ata2: setting up DMA failed
> > ata2: FAILURE - already active DMA on this device
> > ata2: setting up DMA failed
>
> George,
>
> Can you please install ports/sysutils/smartmontools (should be version
> 5.41; if you have an older version please upgrade) and provide output
> from the following command:
>
>   smartctl -a /dev/ad4
>
> With this I should be able to rule out weird disk problems. It's always
> good to start there.
>
> For those unable to parse the above topology, the system has two SATA
> controllers (the Promise uses ata(4), while the on-board ICH7 is in AHCI
> mode and is using ahci.ko (AHCI-to-CAM)):
>
> atapci0 = Promise PDC40718 (Promise SATA300 TX4)
>  --> ata2-master = ad4 = WDC WD7500AALX-009BA0
>  --> ata2-slave =
>  --> ata3-master = ad6 = WDC WD6401AALS-00J7B1
>  --> ata3-slave =
>  --> ata4-master =
>  --> ata4-slave =
>  --> ata5-master = ad10 = WDC WD6401AALS-00J7B1
>  --> ata5-slave =
>
> ahci0 = Intel ICH7 on-board in AHCI mode
>  --> ahcich0 = ada0 = ST3250410AS 3.AAA
>  --> ahcich1 = ada1 = ST3250410AS 3.AAA
>  --> ahcich2 =
>  --> ahcich3 =
>
> If you can't get this situation solved, I'd recommend spending $40
> (pocket change) to invest in a Silicon Image 3124 card. Your existing
> Promise controller is a PCI card (not PCIe or PCI-X), and I don't know
> if your motherboard has any PCIe or PCI-X slots, so I'm going to assume
> the 133MByte/sec limitation is acceptable to you. As such, that limits
> you to effectively this card:
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16816132017
>
> You do not have to use the RAID functionality of the card. FreeBSD
> supports this card using siis(4) and it does utilise CAM, so your disks
> would show up as adaX. The driver is actively supported/maintained.
>
> Avoid looking at cards which use the 3112, 3114, or 3512 chips.
>
> Hope this helps, or at least directs you in a path that lets you solve
> the problem through a little bit of money.
>
> --
> | Jeremy Chadwick                                   jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                   Mountain View, CA, US |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |
>

--
George Kontostanos
aisecure.net
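
For anyone following the thread who wants to repeat the "rule out the
disk first" check on their own drives, here is a minimal sketch in
Python. It assumes the attribute table layout printed by smartctl 5.41
earlier in this message. The function names (parse_attributes,
suspicious) and the choice of watched attribute IDs are illustrative
assumptions, not part of smartmontools. In the output posted above, the
raw values of all five watched attributes are 0.

```python
import re

# Attribute IDs often looked at first when suspecting media or cabling
# faults. This selection is an assumption made for illustration only.
WATCHED_IDS = {5, 196, 197, 198, 199}

# One row of the "Vendor Specific SMART Attributes" table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {id: (name, value, worst, thresh, raw)} from smartctl -a text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (
                m.group(2),
                int(m.group(3)),
                int(m.group(4)),
                int(m.group(5)),
                int(m.group(9)),
            )
    return attrs


def suspicious(attrs):
    """Names of watched attributes with a nonzero raw value, plus any
    attribute whose normalized value has fallen to its threshold."""
    hits = []
    for attr_id, (name, value, _worst, thresh, raw) in sorted(attrs.items()):
        if attr_id in WATCHED_IDS and raw != 0:
            hits.append(name)
        elif thresh > 0 and value <= thresh:
            hits.append(name)
    return hits


# Three rows copied from the smartctl output in this message.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))
```

In practice the text would come from a saved `smartctl -a /dev/ad4` run
rather than the SAMPLE string. An empty result only says that these
counters look clean; it does not by itself prove the disk is healthy,
and it says nothing about the controller.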