From: Alban Hertroys
To: freebsd-questions@freebsd.org
Date: Sat, 9 Feb 2008 14:31:47 +0100
Subject: Bad sector on a gstripe

Hi all,

I'm having trouble locating a bad sector on a gstriped file system. Smartd has been nagging about this single bad sector for months now; there don't appear to be any new ones. It's about time I looked into this...

I got far enough to know the sector number within the partition involved. My attempts are detailed after the problem description.

I tried newfs-ing the file system; it's my /tmp, so there's nothing of relevance on it, but newfs-ing doesn't seem to have marked the sector bad. Is there anything wrong with:

  newfs -U -o time /dev/stripe/tmp

? I ran that from single-user mode after unmounting all file systems.

I tried opening the file system with fsdb, but it can't open the partition, only the striped file system. How do I determine which sector I'm dealing with on a striped fs? And how do I write to it to have it marked as a bad sector?

I'm not sure whether this error means my disk is at the end of its life. Smartd has been spamming me with this single error about the same sector for months now (every half hour!), and it's only the third error in the disk's SMART log. If I understand the smartmontools docs correctly, this could well be caused by the sector not having been written to all this time, which seems plausible to me; it's near the end of a mostly empty /tmp...

From the power-on lifetime it appears the disk is about two years old already, and it's been on pretty much 24/7. Maybe it is time to replace it (probably with a server-class model).

Time for some data.
The disk is a:

  Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
  Device Model:     ST3200822A
  Serial Number:    3LJ020SJ
  Firmware Version: 3.01

smartctl says:

  Error 3 occurred at disk power-on lifetime: 18356 hours (764 days + 20 hours)
    When the command that caused the error occurred, the device was active or idle.

    After command completion occurred, registers were:
    ER ST SC SN CL CH DH
    -- -- -- -- -- -- --
    40 51 00 30 ed 61 40  Error: UNC at LBA = 0x0061ed30 = 6417712

    Commands leading to the command that caused the error were:
    CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
    -- -- -- -- -- -- -- --  ----------------  --------------------
    25 00 20 1f ed 61 40 00      15:42:14.650  READ DMA EXT
    25 00 40 9f e6 61 40 00      15:42:14.419  READ DMA EXT
    25 00 40 df f1 61 40 00      15:42:14.293  READ DMA EXT
    25 00 40 5f e6 61 40 00      15:42:14.049  READ DMA EXT
    25 00 40 5f e9 61 40 00      15:42:13.795  READ DMA EXT

According to fdisk and bsdlabel that's on partition e of slice 1:

  # fdisk -s /dev/ad0
  /dev/ad0: 387621 cyl 16 hd 63 sec
  Part        Start        Size Type Flags
     1:          63   390716802 0xa5 0x80

So the bad sector is at 6417712 - 63 = 6417649 in /dev/ad0s1.

  # bsdlabel /dev/ad0s1
  # /dev/ad0s1:
  8 partitions:
  #        size     offset    fstype   [fsize bsize bps/cpg]
    a:    524288          0    4.2BSD     2048 16384 32776
    b:   4194304     524288      swap
    c: 390716802          0    unused        0     0         # "raw" part, don't edit
    d:   1048576    4718592    4.2BSD     2048 16384     8
    e:   1048576    5767168    4.2BSD     2048 16384     8
    f:  20971520    6815744    4.2BSD     2048 16384 28552
    g: 362929538   27787264    4.2BSD     2048 16384 28552

So the bad sector is 6417649 - 5767168 = 650481 in partition /dev/ad0s1e, at around 62% of its total size.

This is where I started to get lost... I set up partition ad0s1e to be used in /dev/stripe/tmp:

  # gstripe list tmp
  Geom name: tmp
  State: UP
  Status: Total=2, Online=2
  Type: AUTOMATIC
  Stripesize: 4096
  ID: 1982480573
  Providers:
  1. Name: stripe/tmp
     Mediasize: 1073733632 (1.0G)
     Sectorsize: 512
     Mode: r1w1e1
  Consumers:
  1. Name: ad0s1e
     Mediasize: 536870912 (512M)
     Sectorsize: 512
     Mode: r1w1e2
     Number: 0
  2. Name: ad1s1e
     Mediasize: 536870912 (512M)
     Sectorsize: 512
     Mode: r1w1e2
     Number: 1

I tried the following (using -r to prevent fsdb from marking my file systems dirty while I was testing):

  # fsdb -r /dev/ad0s1e
  ** /dev/ad0s1e (NO WRITE)
  Cannot find file system superblock

  LOOK FOR ALTERNATE SUPERBLOCKS? no

  fsdb: cannot set up file system `/dev/ad0s1e'
  Exit 1

and:

  # fsdb -r /dev/stripe/tmp
  ** /dev/stripe/tmp (NO WRITE)
  Examining file system `/dev/stripe/tmp'
  Last Mounted on /tmp
  current inode: directory
  I=2 MODE=40777 SIZE=512
          BTIME=Feb  9 12:01:18 2008 [0 nsec]
          MTIME=Feb  9 12:54:41 2008 [0 nsec]
          CTIME=Feb  9 12:54:41 2008 [0 nsec]
          ATIME=Feb  9 13:23:07 2008 [0 nsec]
  OWNER=root GRP=wheel LINKCNT=7 FLAGS=0 BLKCNT=4 GEN=7a46458d
  fsdb (inum: 2)>

I figured the findblk command would give me the inode of the problem area (although I think there won't be one if no file occupies that sector), but I'm dealing with sectors striped across two disks... I have no idea which "block number" would be appropriate. The disk containing the bad sector is apparently the first in the stripe; that much I gathered.

So, how to continue?

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
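
The offset arithmetic in this message, and one possible way to translate the result into a sector number on /dev/stripe/tmp, can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it assumes gstripe hands out Stripesize-sized chunks round-robin over the consumers in the order of their "Number" field, starting at the first sector of each consumer, with the GEOM metadata kept in the last sector of each consumer. The reported Mediasize is at least consistent with that layout: 2 * floor((536870912 - 512) / 4096) * 4096 = 1073733632. The function names are made up for illustration and are not part of any tool.

```python
SECTOR_SIZE = 512  # bytes, from "Sectorsize: 512" in the gstripe output


def disk_lba_to_partition_sector(lba, slice_start, partition_offset):
    """Sector within a bsdlabel partition for an absolute disk LBA.

    slice_start is the fdisk "Start" of the slice; partition_offset is the
    bsdlabel "offset" column, which is relative to the slice here.
    """
    return lba - slice_start - partition_offset


def consumer_to_stripe_sector(consumer_sector, consumer_index,
                              n_consumers, stripe_sectors):
    """Map a sector on one consumer to a sector on the striped provider.

    ASSUMPTION: chunks of stripe_sectors sectors are laid out round-robin
    across the consumers, beginning at sector 0 of each consumer.
    """
    row, within = divmod(consumer_sector, stripe_sectors)
    logical_chunk = row * n_consumers + consumer_index
    return logical_chunk * stripe_sectors + within


# Numbers taken from the smartctl, fdisk, bsdlabel and gstripe output above.
part_sector = disk_lba_to_partition_sector(6417712, 63, 5767168)
stripe_sector = consumer_to_stripe_sector(
    part_sector,
    consumer_index=0,              # ad0s1e is "Number: 0"
    n_consumers=2,
    stripe_sectors=4096 // SECTOR_SIZE,
)
print(part_sector, stripe_sector, stripe_sector * SECTOR_SIZE)
```

If the layout assumption holds, the last two values printed are the 512-byte sector number and the byte offset of the suspect sector as seen through /dev/stripe/tmp. Whether fsdb's findblk expects that unit is something to confirm in fsdb(8) before relying on it.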