From: Jeff Kletsky
Date: Wed, 05 Nov 2008 10:15:56 -0800
To: freebsd-geom@freebsd.org
Subject: g_vfs_done() read errors, apparently off end of drive
Message-ID: <4911E2DB.3080405@wagsky.com>
List-Id: GEOM-specific discussions and implementations

I'm puzzled by a series of GEOM read errors whose offsets (both before
and after changing physical media) appear to be past the end of the
drive.

The machine in question was brought into service in early September
with my notes indicating:

# Used 160 GB 2.5″ Hitachi on Primary IDE.
# Build off FreeBSD 7.0 CD.
# Use 40 GB for partition for now.
  * / - 512 MB
  * swap - 2048 MB
  * /var - 10 GB
  * /tmp - 10 GB
  * /usr - 17914 MB (left)

The machine is an old box that has been very reliable (with relatively
low power consumption by today's standards), fitted with a brand-new
Hitachi drive. It runs my (jailed) webserver and mail relay as well as
ISC-dhcpd.
The jail-specific file systems are under /var/db and the read-only
portions are in /usr/jails/basejail (ezjail default).

A few weeks ago, I started getting apparent read errors logged to
/var/log/messages at around 3 AM:

Oct 21 03:02:05 port16 kernel: g_vfs_done():ad0s1f[READ(offset=154543128576, length=16384)]error = 5
Oct 24 03:01:36 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 25 03:01:38 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 25 04:15:30 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 30 03:03:06 port16 kernel: g_vfs_done():ad0s1f[READ(offset=137393258496, length=16384)]error = 5
Nov 1 03:01:16 port16 kernel: g_vfs_done():ad0s1f[READ(offset=142595162112, length=16384)]error = 5
Nov 3 03:02:53 port16 kernel: g_vfs_done():ad0s1f[READ(offset=137199403008, length=16384)]error = 5
Nov 5 03:01:35 port16 kernel: g_vfs_done():ad0s1f[READ(offset=140475858944, length=16384)]error = 5

I took notice of them and arranged an RMA for the 160 GB drive.
Yesterday, November 4th, I formatted an old "10G" drive and used
dump/restore to copy over the root, /var, and /usr partitions. The
machine came up nicely, but then threw another 3 AM read error.

I'm especially puzzled because 140475858944, if that is in bytes, would
be an offset of 140,475,858,944 bytes, or ~140 GB, on a drive that has
only "10G" of addressable storage. Equally puzzling, my notes indicate
that the partition in use before Nov 5th was only 40 GB in size, so
that was not a "possible" offset for the error to appear at either.
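The "~140 GB" reading can be double-checked mechanically. The following is only a rough sketch: the regular expression and the names LOG, PATTERN and parse are made up for this illustration and are not part of any FreeBSD tool. It also assumes, as the post does, that the offset field is a byte count.

```python
import re

# The eight messages quoted above, copied from /var/log/messages.
LOG = """\
Oct 21 03:02:05 port16 kernel: g_vfs_done():ad0s1f[READ(offset=154543128576, length=16384)]error = 5
Oct 24 03:01:36 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 25 03:01:38 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 25 04:15:30 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 30 03:03:06 port16 kernel: g_vfs_done():ad0s1f[READ(offset=137393258496, length=16384)]error = 5
Nov 1 03:01:16 port16 kernel: g_vfs_done():ad0s1f[READ(offset=142595162112, length=16384)]error = 5
Nov 3 03:02:53 port16 kernel: g_vfs_done():ad0s1f[READ(offset=137199403008, length=16384)]error = 5
Nov 5 03:01:35 port16 kernel: g_vfs_done():ad0s1f[READ(offset=140475858944, length=16384)]error = 5
"""

# Hypothetical pattern, written only to match the message shape shown above.
PATTERN = re.compile(
    r"^(?P<when>\w{3}\s+\d+ [\d:]{8}) \S+ kernel: "
    r"g_vfs_done\(\):(?P<dev>\w+)\[READ\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = (?P<errno>\d+)$"
)


def parse(log):
    """Return (timestamp, device, offset, length) for each matching line."""
    entries = []
    for line in log.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            entries.append(
                (m["when"], m["dev"], int(m["offset"]), int(m["length"]))
            )
    return entries


entries = parse(LOG)
for when, dev, offset, length in entries:
    # Show the offset both in decimal GB and in binary GiB.
    print(f"{when}  {dev}  {offset:>15,d} bytes"
          f"  = {offset / 1e9:6.1f} GB ({offset / 2**30:6.1f} GiB)")
```

Read as bytes, every logged offset comes out above 137 GB, whichever unit convention is used.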
Here's daily-run output from Oct 25th, confirming that there isn't
anything up there at the 140GB mark:

Filesystem     Size    Used   Avail  Capacity  Mounted on
/dev/ad0s1a    496M    129M    327M     28%    /
devfs          1.0K    1.0K      0B    100%    /dev
/dev/ad0s1e    9.7G     20K    8.9G      0%    /tmp
/dev/ad0s1f     17G    3.2G     12G     21%    /usr
/dev/ad0s1d    9.7G    1.1G    7.8G     12%    /var

As well as last night's on the smaller drive:

Filesystem     Size    Used   Avail  Capacity  Mounted on
/dev/ad0s1a    434M    129M    270M     32%    /
devfs          1.0K    1.0K      0B    100%    /dev
/dev/ad0s1e    484M     14K    445M      0%    /tmp
/dev/ad0s1f    4.3G    3.2G    790M     81%    /usr
/dev/ad0s1d    2.9G    1.1G    1.6G     41%    /var

fsck reports good on all partitions.

Any suggestions on how to track this down and resolve it?

TIA,

Jeff

Current dmesg.boot:
-------------------
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE-p5 #0: Wed Oct 1 10:10:12 UTC 2008
root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (733.13-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x686 Stepping = 6
Features=0x383f9ff
real memory = 805224448 (767 MB)
avail memory = 774057984 (738 MB)
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Oct 1 2008 10:09:48)
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 2ff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: on acpi0
acpi_throttle0: on cpu0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: on hostb0
agp0: aperture size is 256M
pcib1: at device 1.0 on pci0
pci1: on pcib1
pcib0: no PRT entry for 0.1.INTA
vgapci0: port 0xd800-0xd8ff mem 0xf0000000-0xf7ffffff,0xef000000-0xef07ffff irq 10 at device 0.0 on pci1
isab0: at device 4.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xb800-0xb80f at device 4.1 on pci0
ata0: on atapci0
ata0: [ITHREAD]
ata1: on atapci0
ata1: [ITHREAD]
uhci0: port 0xb400-0xb41f irq 11 at device 4.2 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xb000-0xb01f irq 11 at device 4.3 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: on usb1
uhub1: 2 ports with 2 removable, self powered
pci0: at device 5.0 (no driver attached)
em0: port 0xa400-0xa43f mem 0xee800000-0xee81ffff,0xee000000-0xee01ffff irq 11 at device 10.0 on pci0
em0: Ethernet address: 00:1b:21:1d:f4:ed
em0: [FILTER]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xcbfff pnpid ORM0000 on isa0
fdc0: No FDOUT register!
ppc0: at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: on ppc0
ppbus0: [ITHREAD]
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 733129479 Hz quality 800
Timecounters tick every 1.000 msec
hptrr: no controller detected.
ad0: 9773MB at ata0-master UDMA66
Trying to mount root from ufs:/dev/ad0s1a
em0: link state changed to UP

Hitachi drive line:
-------------------
ad0: 152627MB at ata0-master UDMA100
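
The two ad0 probe lines give the raw capacity of each drive, so the logged offsets can be checked against them directly. This is a minimal sketch under stated assumptions: "MB" in the probe lines is taken to mean 2**20 bytes, the "40 GB" slice from the build notes is taken as 40 GiB, and each offset is treated as a byte count. Every name in the code is invented for the illustration.

```python
MIB = 2**20  # assumption: "MB" in the ad0 probe lines means 2**20 bytes

# Capacities in MB, as reported by the two ad0 probe lines.
DRIVE_MB = {"10G": 9773, "Hitachi": 152627}

# (drive in use at the time, offset from the g_vfs_done() message).
# Only the Nov 5 error happened after the swap to the "10G" drive.
FAILED_READS = [
    ("Hitachi", 154543128576),
    ("Hitachi", 153192726528),
    ("Hitachi", 137393258496),
    ("Hitachi", 142595162112),
    ("Hitachi", 137199403008),
    ("10G", 140475858944),
]

# Assumption: the "40 GB" slice from the build notes, taken as 40 GiB.
SLICE_BYTES = 40 * 2**30


def capacity_bytes(drive):
    """Raw drive capacity in bytes, derived from the probe line."""
    return DRIVE_MB[drive] * MIB


def is_past_end(drive, offset):
    """True if a byte offset lies at or beyond the end of the raw drive."""
    return offset >= capacity_bytes(drive)


for drive, offset in FAILED_READS:
    verdict = ("past end of drive" if is_past_end(drive, offset)
               else "within raw drive")
    note = ", past 40 GB slice" if offset >= SLICE_BYTES else ""
    print(f"{drive:8s} {offset:>15,d} of {capacity_bytes(drive):>15,d}: "
          f"{verdict}{note}")
```

Under those assumptions, the Nov 5 offset is far beyond the end of the "10G" drive. The earlier offsets fall inside the Hitachi's raw capacity but well outside the 40 GB slice that was actually in use.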