From: Pete French
To: freebsd-stable@freebsd.org
Date: Tue, 10 Jan 2012 10:56:25 +0000
Subject: Odd zpool problem - always one disc offline, maybe controller related ?

I upgraded my system to -stable on January 6th, and since then I have
noticed a very odd problem. I have a zpool with 4 drives in it, and one
of them is always 'OFFLINE' - if I put it online and it starts
resilvering then another one immediately goes offline. It's the same two
drives alternating as well - very perplexing. I have checked all the
cabling (they are eSATA drives), and it is all pushed home solid. It
looks from dmesg like the drive is disconnecting and reconnecting
briefly, but that's triggering it being dropped out of the zpool.

I must admit that though I noticed this on the 6th, I can't tell you
whether it was working on the version I was running previously, as I
don't check the zpool on that machine as often as I should. Am
recompiling an earlier version now though to see. Details of what
happens are below:

-pete.

------

[pete@skerry ~]$ zpool status
  pool: cube
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 6.41G in 2h27m with 0 errors on Mon Jan 9 23:23:27 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        cube                      DEGRADED     0     0     0
          mirror-0                ONLINE       0     0     0
            ada2                  ONLINE       0     0     0
            ada3                  ONLINE       0     0     0
          mirror-1                DEGRADED     0     0     0
            ada1                  ONLINE       0     0     0
            8890308235385361660   REMOVED      0     0     0  was /dev/ada0

errors: No known data errors
[pete@skerry ~]$ su
Password:
skerry# zpool online ada0
missing device name
usage:
        online [-e] ...
skerry# zpool online cube ada0
skerry# zpool status
  pool: cube
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jan 10 09:03:58 2012
        1.02G scanned out of 1.42T at 80.6M/s, 5h8m to go
        492M resilvered, 0.07% done
config:

        NAME                      STATE     READ WRITE CKSUM
        cube                      DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            ada2                  ONLINE       0     0     0
            6739201713000599902   REMOVED      0     0     0  was /dev/ada3
          mirror-1                ONLINE       0     0     0
            ada1                  ONLINE       0     0     0
            ada0                  ONLINE       0     0     0  (resilvering)

errors: No known data errors
skerry#

...and from dmesg at the point I did that:

(ada3:siisch3:0:0:0): lost device
(ada3:siisch3:0:0:0): removing device entry
ada3 at siisch3 bus 0 scbus3 target 0 lun 0
ada3: ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

here is the boot dmesg:

Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-STABLE #0: Fri Jan 6 12:41:32 GMT 2012
    pete@skerry.drayhouse:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (2992.52-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17 Stepping = 6
  Features=0xbfebfbff
  Features2=0x8e3fd
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory = 4299161600 (4100 MB)
avail memory = 4024582144 (3838 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 1
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, dff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xf808-0xf80b on acpi0
cpu0: on acpi0
cpu1: on acpi0
acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: irq 16 at device 1.0 on pci0
pci1: on pcib1
pcib2: at device 0.0 on pci1
pci2: on pcib2
siis0: port 0x3100-0x310f mem 0xf0308000-0xf030807f,0xf0300000-0xf0307fff irq 16 at device 0.0 on pci2
siis0: [ITHREAD]
siisch0: at channel 0 on siis0
siisch0: [ITHREAD]
siisch1: at channel 1 on siis0
siisch1: [ITHREAD]
siisch2: at channel 2 on siis0
siisch2: [ITHREAD]
siisch3: at channel 3 on siis0
siisch3: [ITHREAD]
vgapci0: port 0x4240-0x4247 mem 0xf0100000-0xf017ffff,0xe0000000-0xefffffff,0xf0000000-0xf00fffff irq 16 at device 2.0 on pci0
agp0: on vgapci0
agp0: aperture size is 256M, detected 6140k stolen memory
pci0: at device 3.0 (no driver attached)
em0: port 0x4100-0x411f mem 0xf0180000-0xf019ffff,0xf01a4000-0xf01a4fff irq 19 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:1f:29:d3:51:be
uhci0: port 0x4120-0x413f irq 20 at device 26.0 on pci0
uhci0: [ITHREAD]
usbus0: on uhci0
uhci1: port 0x4140-0x415f irq 21 at device 26.1 on pci0
uhci1: [ITHREAD]
usbus1: on uhci1
uhci2: port 0x4160-0x417f irq 22 at device 26.2 on pci0
uhci2: [ITHREAD]
usbus2: on uhci2
ehci0: mem 0xf01a5000-0xf01a53ff irq 22 at device 26.7 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: on ehci0
pci0: at device 27.0 (no driver attached)
pcib3: irq 20 at device 28.0 on pci0
pci32: on pcib3
siis1: port 0x1100-0x117f mem 0xf0404000-0xf040407f,0xf0400000-0xf0403fff irq 16 at device 0.0 on pci32
siis1: [ITHREAD]
siisch4: at channel 0 on siis1
siisch4: [ITHREAD]
siisch5: at channel 1 on siis1
siisch5: [ITHREAD]
pcib4: irq 21 at device 28.1 on pci0
pci48: on pcib4
uhci3: port 0x4180-0x419f irq 20 at device 29.0 on pci0
uhci3: [ITHREAD]
usbus4: on uhci3
uhci4: port 0x41a0-0x41bf irq 21 at device 29.1 on pci0
uhci4: [ITHREAD]
usbus5: on uhci4
uhci5: port 0x41c0-0x41df irq 22 at device 29.2 on pci0
uhci5: [ITHREAD]
usbus6: on uhci5
ehci1: mem 0xf01a5400-0xf01a57ff irq 20 at device 29.7 on pci0
ehci1: [ITHREAD]
usbus7: EHCI version 1.0
usbus7: on ehci1
pcib5: at device 30.0 on pci0
pci7: on pcib5
em1: port 0x2100-0x213f mem 0xf0200000-0xf021ffff irq 20 at device 4.0 on pci7
em1: [FILTER]
em1: Ethernet address: 00:07:e9:10:d8:86
em2: port 0x2140-0x217f mem 0xf0220000-0xf023ffff irq 21 at device 4.1 on pci7
em2: [FILTER]
em2: Ethernet address: 00:07:e9:10:d8:87
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x4200-0x420f,0x4210-0x421f irq 18 at device 31.2 on pci0
ata0: at channel 0 on atapci0
ata0: [ITHREAD]
ata1: at channel 1 on atapci0
ata1: [ITHREAD]
atapci1: port 0x4258-0x425f,0x4270-0x4273,0x4260-0x4267,0x4274-0x4277,0x4220-0x422f,0x4230-0x423f irq 18 at device 31.5 on pci0
atapci1: [ITHREAD]
ata2: at channel 0 on atapci1
ata2: [ITHREAD]
ata3: at channel 1 on atapci1
ata3: [ITHREAD]
acpi_button0: on acpi0
atrtc0: port 0x70-0x71 irq 8 on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
acpi_hpet1: iomem 0xfed00000-0xfed003ff on acpi0
device_attach: acpi_hpet1 attach returned 12
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
est0: on cpu0
p4tcc0: on cpu0
est1: on cpu1
p4tcc1: on cpu1
(noperiph:siisch0:0:-1:-1): rescan already queued
(noperiph:siisch1:0:-1:-1): rescan already queued
(noperiph:siisch2:0:-1:-1): rescan already queued
(noperiph:siisch3:0:-1:-1): rescan already queued
(noperiph:siisch4:0:-1:-1): rescan already queued
ZFS filesystem version 5
ZFS storage pool version 28
Timecounters tick every 1.000 msec
vboxdrv: fAsync=0 offMin=0x168 offMax=0x40b
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 12Mbps Full Speed USB v1.0
usbus6: 12Mbps Full Speed USB v1.0
usbus7: 480Mbps High Speed USB v2.0
ugen0.1: at usbus0
uhub0: on usbus0
ugen1.1: at usbus1
uhub1: on usbus1
ugen2.1: at usbus2
uhub2: on usbus2
ugen3.1: at usbus3
uhub3: on usbus3
ugen4.1: at usbus4
uhub4: on usbus4
ugen5.1: at usbus5
uhub5: on usbus5
ugen6.1: at usbus6
uhub6: on usbus6
ugen7.1: at usbus7
uhub7: on usbus7
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub4: 2 ports with 2 removable, self powered
uhub5: 2 ports with 2 removable, self powered
uhub6: 2 ports with 2 removable, self powered
acd0: DVDR at ata1-master UDMA100 SATA 1.5Gb/s
uhub3: 6 ports with 6 removable, self powered
uhub7: 6 ports with 6 removable, self powered
ugen7.2: at usbus7
umass0: on usbus7
umass0: SCSI over Bulk-Only; quirks = 0x4000
umass0:7:0:-1: Attached to scbus7
acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 sks=0x40 0x00 0x01
(probe1:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe1:umass-sim0:0:0:0): CAM status: SCSI Status Error
(probe1:umass-sim0:0:0:0): SCSI status: Check Condition
(probe1:umass-sim0:0:0:0): SCSI sense: NOT READY asc:3a,0 (Medium not present)
ugen1.2: at usbus1
ukbd0: on usbus1
kbd2 at ukbd0
uhid0: on usbus1
(probe0:umass-sim0:0:0:1): TEST UNIT READY. CDB: 0 20 0 0 0 0
(probe0:umass-sim0:0:0:1): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:1): SCSI status: Check Condition
(probe0:umass-sim0:0:0:1): SCSI sense: NOT READY asc:3a,0 (Medium not present)
(probe0:umass-sim0:0:0:2): TEST UNIT READY. CDB: 0 40 0 0 0 0
(probe0:umass-sim0:0:0:2): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:2): SCSI status: Check Condition
(probe0:umass-sim0:0:0:2): SCSI sense: NOT READY asc:3a,0 (Medium not present)
(probe0:umass-sim0:0:0:3): TEST UNIT READY. CDB: 0 60 0 0 0 0
(probe0:umass-sim0:0:0:3): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:3): SCSI status: Check Condition
(probe0:umass-sim0:0:0:3): SCSI sense: NOT READY asc:3a,0 (Medium not present)
ada0 at siisch0 bus 0 scbus0 target 0 lun 0
ada0: ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1 at siisch1 bus 0 scbus1 target 0 lun 0
ada1: ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2 at siisch2 bus 0 scbus2 target 0 lun 0
ada2: ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at siisch3 bus 0 scbus3 target 0 lun 0
ada3: ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada4 at siisch4 bus 0 scbus4 target 0 lun 0
ada4: ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)
ada4: Command Queueing enabled
ada4: 30533MB (62533296 512 byte sectors: 16H 63S/T 16383C)
da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
da0: Removable Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
cd0 at ata1 bus 0 scbus6 target 0 lun 0
cd0: Removable CD-ROM SCSI-0 device
cd0: 100.000MB/s transfers
cd0: cd present [3008 x 2048 byte records]
da1 at umass-sim0 bus 0 scbus7 target 0 lun 1
da1: Removable Direct Access SCSI-0 device
da1: 40.000MB/s transfers
da1: Attempt to query device size failed: NOT READY, Medium not present
SMP: AP CPU #1 Launched!
da2 at umass-sim0 bus 0 scbus7 target 0 lun 2
da2: Removable Direct Access SCSI-0 device
da2: 40.000MB/s transfers
da2: Attempt to query device size failed: NOT READY, Medium not present
da3 at umass-sim0 bus 0 scbus7 target 0 lun 3
da3: Removable Direct Access SCSI-0 device
da3: 40.000MB/s transfers
da3: Attempt to query device size failed: NOT READY, Medium not present
Trying to mount root from ufs:/dev/gpt/skerry-root
Setting hostuuid: 0071dfa5-eaab-11df-88e2-02dc1053ff3a.
Setting hostid: 0xe54799ad.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
Starting file system checks:
/dev/gpt/skerry-root: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/skerry-root: clean, 7792879 free (65439 frags, 965930 blocks, 0.6% fragmentation)
Mounting local file systems:
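
In case it helps anyone compare notes: a rough Python sketch for tallying how often each drive drops off its channel, based only on the "lost device" lines quoted above. The function name count_lost_devices is made up for illustration, and the pattern assumes the messages always look like "(ada3:siisch3:0:0:0): lost device"; other drivers or releases may word this differently.

```python
import re
from collections import Counter

# Assumed message shape, taken from the dmesg excerpt in this post:
#   (ada3:siisch3:0:0:0): lost device
LOST_RE = re.compile(r"\((ada\d+):(siisch\d+):\d+:\d+:\d+\): lost device")


def count_lost_devices(dmesg_text):
    """Count 'lost device' messages per (disk, channel) pair.

    dmesg_text is the raw kernel message buffer as one string.
    Returns a Counter keyed by tuples such as ("ada3", "siisch3").
    """
    return Counter(LOST_RE.findall(dmesg_text))
```

Feeding it the saved output of dmesg and printing the result of most_common() should show whether the drops really are confined to the same two channels, which might help separate a controller problem from a cabling one.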