From owner-freebsd-current@FreeBSD.ORG Mon Mar 15 10:31:08 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AFDB016A4CE for ; Mon, 15 Mar 2004 10:31:08 -0800 (PST)
Received: from thunderbird.etv.net (thunderbird.etv.net [208.14.190.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 41D2743D53 for ; Mon, 15 Mar 2004 10:31:07 -0800 (PST) (envelope-from lists@efinley.com)
Received: from [205.161.203.50] (helo=science1) by thunderbird.etv.net with smtp (Exim 4.30; FreeBSD) id 1B2wrg-0001lb-Nr for freebsd-current@freebsd.org; Mon, 15 Mar 2004 11:31:04 -0700
Message-ID: <019001c40abb$ac2a9350$32cba1cd@science1>
From: "Elliot Finley"
To:
Date: Mon, 15 Mar 2004 11:31:03 -0700
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Subject: reliable disk FAILURE
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Elliot Finley
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 15 Mar 2004 18:31:08 -0000

When doing a disk-to-disk backup using dump/restore, I reliably get a disk failure. I don't think it's the disk because it happens on six different machines. All six machines use SATA drives and have an ASUS P4P800 motherboard. None of the six machines had any problems until after the last security patch to 5.2.1. After the patch, they all fail. If I remember correctly, the last security patch only touched some TCP files, so the disk failures don't make any sense to me.

The commands causing the failure, the console output, and dmesg are below.
This is on a test machine that I can take down or modify at any time, so if there is anything further that I can do to help debug this - please let me know.

sequence of commands issued to cause failure
----------------------------------------------
Executing command: /bin/dd if=/dev/zero of=/dev/ad14 bs=1k count=1
Executing command: /sbin/fdisk -BI ad14
Executing command: /sbin/bsdlabel -w -B ad14s1 auto
Executing command: /sbin/bsdlabel ad14s1 > /tmp/backup.disk.label
Executing command: /bin/echo 'a: 2097152 0 4.2BSD' >> /tmp/backup.disk.label
Executing command: /bin/echo 'b: 4194304 * swap' >> /tmp/backup.disk.label
Executing command: /bin/echo 'd: 125829120 * 4.2BSD' >> /tmp/backup.disk.label
Executing command: /bin/echo 'e: * * 4.2BSD' >> /tmp/backup.disk.label
Executing command: /sbin/bsdlabel -R -B ad14s1 /tmp/backup.disk.label
Executing command: /sbin/newfs -U /dev/ad14s1a
Executing command: /sbin/newfs -U /dev/ad14s1d
Executing command: /sbin/newfs -U /dev/ad14s1e
Executing command: /sbin/mount -rw /dev/ad14s1a /mnt
Executing command: /sbin/dump -0Lf - / | (cd /mnt; /sbin/restore -rf -)
Executing command: /sbin/umount /mnt
Executing command: /sbin/mount -rw /dev/ad14s1d /mnt
Executing command: /sbin/dump -0Lf - /usr | (cd /mnt; /sbin/restore -rf -)
DUMP: Date of this level 0 dump: Mon Mar 15 10:32:02 2004
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping snapshot of /dev/ad12s1d (/usr) to standard output
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 1621128 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
warning: ./.snap: File exists

(dump/restore dies here this time; it doesn't die in the same place every time. The failure causes the following output on the console.)

console output
---------------
ad12: TIMEOUT - READ_DMA retrying (2 retries left) LBA=28166139
ad12: timeout sending command=c8
ad12: error issuing DMA command
GEOM: create disk ad12 dp=0xc6ded160
ad12: 76319MB [155061/16/63] at ata6-master UDMA100
ad12: FAILURE - SETFEATURES SET TRANSFER MODE timed out

dmesg
------
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.2.1-RELEASE-p1 #5: Fri Mar 5 17:54:52 MST 2004
root@oregon.etv.net:/usr/obj/usr/src/sys/GENERIC
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a35000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a3521c.
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.60GHz (2598.76-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
Features=0xbfebfbff
Hyperthreading: 2 logical CPUs
real memory = 1072889856 (1023 MB)
avail memory = 1032749056 (984 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
ioapic0 irqs 0-23 on motherboard
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 14 entries at 0xc00f5410
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_cpu0: on acpi0
acpi_cpu1: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xf8000000-0xfbffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pcib1: could not get PCI interrupt routing table for \\_SB_.PCI0.P0P1 - AE_NOT_FOUND
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
uhci0: port 0xef00-0xef1f irq 16 at device 29.0 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xef20-0xef3f irq 19 at device 29.1 on pci0
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0xef40-0xef5f irq 18 at device 29.2 on pci0
usb2: on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: port 0xef80-0xef9f irq 16 at device 29.3 on pci0
usb3: on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: at device 29.7 (no driver attached)
pcib2: at device 30.0 on pci0
pci2: on pcib2
skc0: <3Com 3C940 Gigabit Ethernet> port 0xd800-0xd8ff mem 0xfeafc000-0xfeafffff irq 22 at device 5.0 on pci2
skc0: 3Com Gigabit LOM (3C940)
sk0: on skc0
sk0: Ethernet address: 00:0c:6e:54:4b:25
miibus0: on sk0
e1000phy0: on miibus0
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
atapci0: port 0xdc00-0xdc7f,0xdfa0-0xdfaf,0xdf00-0xdf3f mem 0xfeac0000-0xfeadffff,0xfeafb000-0xfeafbfff irq 21 at device 9.0 on pci2
atapci0: [MPSAFE]
ata2: at 0xfeafb000 on atapci0
ata2: [MPSAFE]
ata3: at 0xfeafb000 on atapci0
ata3: [MPSAFE]
ata4: at 0xfeafb000 on atapci0
ata4: [MPSAFE]
ata5: at 0xfeafb000 on atapci0
ata5: [MPSAFE]
fxp0: port 0xde80-0xdebf mem 0xfeaa0000-0xfeabffff,0xfeafa000-0xfeafafff irq 23 at device 11.0 on pci2
fxp0: Ethernet address 00:02:b3:d1:f7:ad
miibus1: on fxp0
inphy0: on miibus1
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: at device 31.0 on pci0
isa0: on isab0
atapci1: port 0xfc00-0xfc0f,0-0x3,0-0x7,0-0x3,0-0x7 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci1
ata1: [MPSAFE]
atapci2: port 0xef60-0xef6f,0xefa8-0xefab,0xefa0-0xefa7,0xefac-0xefaf,0xefe0-0xefe7 irq 18 at device 31.2 on pci0
atapci2: [MPSAFE]
ata6: at 0xefe0 on atapci2
ata6: [MPSAFE]
ata7: at 0xefa0 on atapci2
ata7: [MPSAFE]
pci0: at device 31.3 (no driver attached)
pci0: at device 31.5 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x64,0x60 irq 1 on acpi0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2e8-0x2ef irq 3 on acpi0
sio1: type 16550A
ppc0 port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
orm0: