From owner-freebsd-geom@FreeBSD.ORG  Wed Nov  5 19:21:50 2008
Message-ID: <4911E2DB.3080405@wagsky.com>
Date: Wed, 05 Nov 2008 10:15:56 -0800
From: Jeff Kletsky <jeff+freebsd@wagsky.com>
User-Agent: Thunderbird 2.0.0.17 (Macintosh/20080914)
MIME-Version: 1.0
To: freebsd-geom@freebsd.org
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: g_vfs_done() read errors, apparently off end of drive

I'm puzzled by a series of GEOM read errors, because the reported offset
(both before and after I replaced the physical drive) appears to be
past the end of the drive.

The machine in question was brought into service in early September
with my notes indicating:

# Used 160 GB 2.5″ Hitachi on Primary IDE.
# Build off FreeBSD 7.0 CD.
# Use 40 GB for partition for now.

* / — 512 MB
* swap — 2048 MB
* /var — 10 GB
* /tmp — 10 GB
* /usr — 17914 MB (left)

The machine is an old box that has been very reliable (and has
relatively low power consumption by today's standards), fitted with a
brand new Hitachi drive. It runs my (jailed) web server and mail relay,
as well as ISC dhcpd. The jail-specific file systems are under /var/db,
and the read-only portions are in /usr/jails/basejail (the ezjail
default).

A few weeks ago I started seeing read errors logged to
/var/log/messages, nearly all of them just after 3 AM, when periodic(8)
runs the daily jobs:

Oct 21 03:02:05 port16 kernel: g_vfs_done():ad0s1f[READ(offset=154543128576, length=16384)]error = 5
Oct 24 03:01:36 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 25 03:01:38 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 25 04:15:30 port16 kernel: g_vfs_done():ad0s1f[READ(offset=153192726528, length=16384)]error = 5
Oct 30 03:03:06 port16 kernel: g_vfs_done():ad0s1f[READ(offset=137393258496, length=16384)]error = 5
Nov 1 03:01:16 port16 kernel: g_vfs_done():ad0s1f[READ(offset=142595162112, length=16384)]error = 5
Nov 3 03:02:53 port16 kernel: g_vfs_done():ad0s1f[READ(offset=137199403008, length=16384)]error = 5
Nov 5 03:01:35 port16 kernel: g_vfs_done():ad0s1f[READ(offset=140475858944, length=16384)]error = 5
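
To compare these offsets with anything, it helps to pull the fields out
of each message. Here is a minimal Python sketch (the helper names are
hypothetical, not part of any FreeBSD tool) that parses the g_vfs_done()
format as it appears above and restates each offset in GiB, in 512-byte
sectors, and in 16 KiB blocks. The 16 KiB block size is an assumption:
every request has length=16384, which I believe is the default UFS2
block size in FreeBSD 7.0.

```python
import re
from typing import NamedTuple, Optional

# Matches the kernel message format seen in /var/log/messages, e.g.
# g_vfs_done():ad0s1f[READ(offset=140475858944, length=16384)]error = 5
G_VFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<provider>\S+?)\[(?P<op>[A-Z]+)\("
    r"offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<error>\d+)"
)

SECTOR = 512            # DEV_BSIZE, bytes per disk sector
ASSUMED_BSIZE = 16384   # assumption: UFS2 block size on this filesystem


class GVfsError(NamedTuple):
    provider: str
    op: str
    offset: int
    length: int
    error: int  # errno value; 5 is EIO


def parse_line(line: str) -> Optional[GVfsError]:
    """Return the parsed fields, or None if the line is not a g_vfs_done() error."""
    m = G_VFS_RE.search(line)
    if m is None:
        return None
    return GVfsError(m["provider"], m["op"], int(m["offset"]),
                     int(m["length"]), int(m["error"]))


def describe(e: GVfsError) -> str:
    """Restate the byte offset in GiB, 512-byte sectors and assumed 16 KiB blocks."""
    return (f"{e.provider} {e.op} offset={e.offset} "
            f"({e.offset / 2**30:.2f} GiB, sector {e.offset // SECTOR}, "
            f"block {e.offset // ASSUMED_BSIZE}) errno={e.error}")


if __name__ == "__main__":
    sample = ("Nov 5 03:01:35 port16 kernel: g_vfs_done():ad0s1f"
              "[READ(offset=140475858944, length=16384)]error = 5")
    print(describe(parse_line(sample)))
```

For the Nov 5 line this gives sector 274366912 and block 8573966. All
eight logged offsets happen to be exact multiples of 16384.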

I took notice of them and arranged an RMA for the 160 GB drive.
Yesterday, November 4th, I formatted an old "10G" drive and used
dump/restore to copy over the root, /var, and /usr partitions. The
machine came up nicely, but then threw another 3 AM read error.

I'm especially puzzled because 140475858944, if that is in bytes, is
an offset of 140,475,858,944, or about 140 GB, on a drive that has only
"10G" of addressable storage (and in a /usr partition of just 4.3G).
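
Putting numbers on this with a quick Python check: the replacement
drive reports 9773MB in dmesg.boot. Assuming the ad(4) "MB" figure is in
units of 2^20 bytes (my reading, not something I have verified), the
latest offset is more than 13 times the size of the entire drive. If, as
I understand GEOM, the offset is measured from the start of ad0s1f
rather than the start of the disk, the true position would be further
still.

```python
MiB = 2**20

# Offset from the Nov 5 error, logged after the switch to the Fujitsu drive.
offset = 140_475_858_944
# dmesg.boot: "ad0: 9773MB <FUJITSU MPF3102AT 0028>".
# Assumption: the ad(4) "MB" figure is in units of 2**20 bytes.
drive_bytes = 9773 * MiB

print(f"offset:     {offset:,} bytes (~{offset / 1e9:.1f} GB)")
print(f"drive size: {drive_bytes:,} bytes (~{drive_bytes / 1e9:.1f} GB)")
print(f"the offset is {offset / drive_bytes:.1f} times the whole drive")
```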

Equally puzzling, my notes indicate that the partition in use before
November 5th was only 40 GB, so none of the earlier offsets (roughly
137 to 155 GB) should have been possible either.
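
A quick check of the earlier offsets against the September layout: the
five partitions in my notes add up to 40954 MB, and /usr (ad0s1f) was
17914 MB. The sketch below assumes that "MB" means 2^20 bytes, that
"10 GB" was entered as 10240 MB, and (as I understand GEOM) that the
offset is relative to the start of ad0s1f, so the /usr size is the
relevant limit and the whole layout is a generous upper bound.

```python
MiB = 2**20

# Partition sizes from the September install notes, in MB.
# Assumptions: "MB" means 2**20 bytes, and "10 GB" was entered as 10240 MB.
layout_mb = {"/": 512, "swap": 2048, "/var": 10240, "/tmp": 10240, "/usr": 17914}

# Distinct offsets logged for ad0s1f while the Hitachi drive was installed.
hitachi_offsets = [154543128576, 153192726528, 137393258496,
                   142595162112, 137199403008]

usr_bytes = layout_mb["/usr"] * MiB
layout_bytes = sum(layout_mb.values()) * MiB

# Assumption: the offset is relative to the start of ad0s1f, so /usr is the
# real limit; the whole 40 GB layout is shown only as an upper bound.
for off in hitachi_offsets:
    print(f"{off:>17,} bytes = {off / usr_bytes:4.1f} x /usr, "
          f"{off / layout_bytes:4.1f} x the whole 40 GB layout")
```

Every offset is more than seven times the size of /usr and more than
three times the size of everything that was partitioned.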

Here's the daily run output from October 25th, confirming that nothing
is allocated anywhere near the 140 GB mark:

Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1a   496M   129M   327M      28%  /
devfs         1.0K   1.0K     0B     100%  /dev
/dev/ad0s1e   9.7G    20K   8.9G       0%  /tmp
/dev/ad0s1f    17G   3.2G    12G      21%  /usr
/dev/ad0s1d   9.7G   1.1G   7.8G      12%  /var

As well as last night's on the smaller drive:

Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1a   434M   129M   270M      32%  /
devfs         1.0K   1.0K     0B     100%  /dev
/dev/ad0s1e   484M    14K   445M       0%  /tmp
/dev/ad0s1f   4.3G   3.2G   790M      81%  /usr
/dev/ad0s1d   2.9G   1.1G   1.6G      41%  /var


fsck reports all partitions clean.

Any suggestions on how to track this down and resolve it?

TIA,

Jeff


Current dmesg.boot:
-------------------

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE-p5 #0: Wed Oct 1 10:10:12 UTC 2008
root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (733.13-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x686 Stepping = 6
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 805224448 (767 MB)
avail memory = 774057984 (738 MB)
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Oct 1 2008 10:09:48)
acpi0: <ASUS CUV4X_EA> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 2ff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 82C691 (Apollo Pro) host to PCI bridge> on hostb0
agp0: aperture size is 256M
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib0: no PRT entry for 0.1.INTA
vgapci0: <VGA-compatible display> port 0xd800-0xd8ff mem 0xf0000000-0xf7ffffff,0xef000000-0xef07ffff irq 10 at device 0.0 on pci1
isab0: <PCI-ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C686B UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xb800-0xb80f at device 4.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
uhci0: <VIA 83C572 USB controller> port 0xb400-0xb41f irq 11 at device 4.2 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xb000-0xb01f irq 11 at device 4.3 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
pci0: <multimedia, audio> at device 5.0 (no driver attached)
em0: <Intel(R) PRO/1000 Network Connection Version - 6.7.3> port 0xa400-0xa43f mem 0xee800000-0xee81ffff,0xee000000-0xee01ffff irq 11 at device 10.0 on pci0
em0: Ethernet address: 00:1b:21:1d:f4:ed
em0: [FILTER]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xc0000-0xcbfff pnpid ORM0000 on isa0
fdc0: No FDOUT register!
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 733129479 Hz quality 800
Timecounters tick every 1.000 msec
hptrr: no controller detected.
ad0: 9773MB <FUJITSU MPF3102AT 0028> at ata0-master UDMA66
Trying to mount root from ufs:/dev/ad0s1a
em0: link state changed to UP

Hitachi drive line:
-------------------
ad0: 152627MB <Hitachi HTS541616J9AT00 SB4OA70H> at ata0-master UDMA100
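
For the Hitachi, a rough Python check of whether the logged requests
could even have landed on the disk. It rests on three assumptions I
have not verified: the ad(4) "MB" figure is 2^20 bytes; the offset in a
g_vfs_done() message is relative to the start of ad0s1f; and ad0s1f sits
after /, swap, /var and /tmp as listed in my notes (the slice's own
starting offset is ignored, which would only push things further out).
Under those assumptions, every request would end past the end of the
160 GB drive as well.

```python
MiB = 2**20

# dmesg line: "ad0: 152627MB <Hitachi HTS541616J9AT00 SB4OA70H>".
# Assumption: the ad(4) "MB" figure is in units of 2**20 bytes.
hitachi_bytes = 152627 * MiB
request_len = 16384  # length= in every logged request

# Assumption: ad0s1f (/usr) follows /, swap, /var and /tmp on the disk, so it
# starts at least 512 + 2048 + 10240 + 10240 MB in (slice offset ignored).
usr_start = (512 + 2048 + 10240 + 10240) * MiB

# Distinct offsets logged while the Hitachi drive was installed.
hitachi_offsets = [154543128576, 153192726528, 137393258496,
                   142595162112, 137199403008]

for off in hitachi_offsets:
    absolute_end = usr_start + off + request_len
    status = "inside" if absolute_end <= hitachi_bytes else "past the end of"
    print(f"offset {off:,}: request would end at byte {absolute_end:,}, "
          f"{status} the {hitachi_bytes:,}-byte drive")
```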