From owner-freebsd-geom@FreeBSD.ORG Sun Aug 19 07:54:24 2007 Return-Path: Delivered-To: freebsd-geom@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B1AD16A417; Sun, 19 Aug 2007 07:54:24 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id BB1BF13C45D; Sun, 19 Aug 2007 07:54:23 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 2B3DE17382; Sun, 19 Aug 2007 07:26:17 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.1/8.14.1) with ESMTP id l7J7QGCB083157; Sun, 19 Aug 2007 07:26:16 GMT (envelope-from phk@critter.freebsd.dk) To: Graham From: "Poul-Henning Kamp" In-Reply-To: Your message of "Sat, 18 Aug 2007 21:20:11 GMT." <200708182120.l7ILKBvF046099@freefall.freebsd.org> Date: Sun, 19 Aug 2007 07:26:16 +0000 Message-ID: <83156.1187508376@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: sos@FreeBSD.org, freebsd-geom@FreeBSD.org Subject: Re: kern/115572: ata disk bug (was: [gbde] gbde partitions fail at 28bit/48bit LBA addressing boundary ) X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Aug 2007 07:54:24 -0000 In message <200708182120.l7ILKBvF046099@freefall.freebsd.org>, Graham writes: > 2. attempt (say).... > rabbit# dd if=/dev/zero of=/dev/ad4s1 oseek=2097151 count=1 bs=64k > and the result is.... 
> dd: /dev/ad4s1: Input/output error > 1+0 records in > 0+0 records out > 0 bytes transferred in 0.000325 secs (0 bytes/sec) > > (If dd is performed on the raw drive, /dev/ad4 then block boundary is > always a power of 2, and blocksize a smaller power of 2. That's always > ok. But we can't assume we use drives that way.) > > So a transfer which starts in the 28-bit zone, but extends over into > the 48-bit region, fails. Such transfers happen in the superblock of > certain size drives, and that plays havoc. The sector mapping of gbde > can do this, but soft-update gets screwed by this happening. It's not > actually to do with the crypto as I first suspected. This is a problem in the ata disk driver. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-geom@FreeBSD.ORG Sun Aug 19 09:28:27 2007 Return-Path: Delivered-To: freebsd-geom@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EFC3416A41A for ; Sun, 19 Aug 2007 09:28:27 +0000 (UTC) (envelope-from sos@deepcore.dk) Received: from spider.deepcore.dk (cpe.atm2-0-70484.0x50a6c9a6.abnxx16.customer.tele.dk [80.166.201.166]) by mx1.freebsd.org (Postfix) with ESMTP id 878B913C481 for ; Sun, 19 Aug 2007 09:28:27 +0000 (UTC) (envelope-from sos@deepcore.dk) Received: from ws.deepcore.dk (ws.deepcore.dk [194.192.25.137]) by spider.deepcore.dk (8.13.8/8.13.8) with ESMTP id l7J9SQux089903; Sun, 19 Aug 2007 11:28:26 +0200 (CEST) (envelope-from sos@deepcore.dk) Message-ID: <46C80D3A.4020507@deepcore.dk> Date: Sun, 19 Aug 2007 11:28:26 +0200 From: =?ISO-8859-1?Q?S=F8ren_Schmidt?= User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Poul-Henning Kamp References: <83156.1187508376@critter.freebsd.dk> In-Reply-To: <83156.1187508376@critter.freebsd.dk> Content-Type: 
text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
Cc: Graham , freebsd-geom@FreeBSD.ORG
Subject: Re: kern/115572: ata disk bug
X-List-Received-Date: Sun, 19 Aug 2007 09:28:28 -0000

Poul-Henning Kamp wrote:
> In message <200708182120.l7ILKBvF046099@freefall.freebsd.org>, Graham writes:
>
>> 2. attempt (say)....
>> rabbit# dd if=/dev/zero of=/dev/ad4s1 oseek=2097151 count=1 bs=64k
>> and the result is....
>> dd: /dev/ad4s1: Input/output error
>> 1+0 records in
>> 0+0 records out
>> 0 bytes transferred in 0.000325 secs (0 bytes/sec)
>>
>> (If dd is performed on the raw drive, /dev/ad4 then block boundary is
>> always a power of 2, and blocksize a smaller power of 2. That's always
>> ok. But we can't assume we use drives that way.)
>>
>> So a transfer which starts in the 28-bit zone, but extends over into
>> the 48-bit region, fails. Such transfers happen in the superblock of
>> certain size drives, and that plays havoc. The sector mapping of gbde
>> can do this, but soft-update gets screwed by this happening. It's not
>> actually to do with the crypto as I first suspected.
>>
>
> This is a problem in the ata disk driver.
>

Yeah, the crossover point from using 28bit to 48bit addressing is
flawed, below patch should fix that:

--- ata-all.c 23 Feb 2007 16:25:08 -0000 1.279
+++ ata-all.c 19 Aug 2007 09:25:58 -0000
@@ -738,7 +738,7 @@

     atadev->flags &= ~ATA_D_48BIT_ACTIVE;

-    if ((request->u.ata.lba >= ATA_MAX_28BIT_LBA ||
+    if (((request->u.ata.lba + request->u.ata.count) >= ATA_MAX_28BIT_LBA ||
          request->u.ata.count > 256) &&
         atadev->param.support.command2 & ATA_SUPPORT_ADDRESS48) {

-Søren

From owner-freebsd-geom@FreeBSD.ORG Sun Aug 19 17:11:05 2007
Return-Path:
Delivered-To: freebsd-geom@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C4B616A420; Sun, 19 Aug 2007 17:11:05 +0000 (UTC) (envelope-from nork@FreeBSD.org)
Received: from sakura.ninth-nine.com (sakura.ninth-nine.com [219.127.74.120]) by mx1.freebsd.org (Postfix) with ESMTP id BCC7013C48A; Sun, 19 Aug 2007 17:11:04 +0000 (UTC) (envelope-from nork@FreeBSD.org)
Received: from nadesico.ninth-nine.com (nadesico.ninth-nine.com [219.127.74.122]) by sakura.ninth-nine.com (8.14.1/8.14.1/NinthNine) with SMTP id l7JGsOF0024019; Mon, 20 Aug 2007 01:54:24 +0900 (JST) (envelope-from nork@FreeBSD.org)
Date: Mon, 20 Aug 2007 01:54:24 +0900
From: Norikatsu Shigemura
To: freebsd-geom@FreeBSD.org
Message-Id: <20070820015424.42677df2.nork@FreeBSD.org>
X-Mailer: Sylpheed 2.4.4 (GTK+ 2.10.14; i386-portbld-freebsd6.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (sakura.ninth-nine.com [219.127.74.121]); Mon, 20 Aug 2007 01:54:24 +0900 (JST)
Cc: freebsd-proliant@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Anyone, would you try to make GEOM_DDF1(4)?
X-List-Received-Date: Sun, 19 Aug 2007 17:11:05 -0000

Hi GEOM Developers.

I'm using an HP Proliant DL140G3 with Adaptec HostRAID (SATA). I couldn't
get the FreeBSD 6.2-R install CD to probe the RAID1 volume, so I researched
the DL140G3's RAID system (software RAID). As a result, I learned that
Adaptec HostRAID supports the DDF format[*1].

[*1] http://www.snia.org/standards/DDFv1_00.pdf

I tried to write geom_ddf1(4), but I couldn't; it's too hard for me :-(.
So, would anyone try to write geom_ddf1(4)? Of course, RAID1-only support
is OK! :D

I have real DDF1 data from two configured SATA disks (/dev/ad4 and /dev/ad6,
mirrored as RAID1), captured in the following way[*2][*3]. I also confirmed
that on unconfigured SATA disks the DDF1 anchor is filled entirely with
NUL (0x00).

[*2] http://people.freebsd.org/~nork/DDF1/ad4-last6k.img
     http://people.freebsd.org/~nork/DDF1/ad6-last6k.img
     *NOTE* Please reconstruct ad?.img from ad?-last6k.img

I wrote a test program[*4] and ran it against /dev/ad4. Please see the
output below[*5].

[*4] http://people.freebsd.org/~nork/DDF1/ddf1.h
     http://people.freebsd.org/~nork/DDF1/ddf1_test.c

[*3]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
In real system (RAID1 configured):
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# geom disk list ad4
        :
   Mediasize: 80026361856 (75G)
        :
# echo 80026361856/512-65536 | bc
156235952
# dd if=/dev/ad4 bs=512 skip=156235952 | ssh ... 'cat > ad4-last32m.img'
65536+0 records in
        :
# dd if=/dev/ad6 bs=512 skip=156235952 | ssh ... 'cat > ad6-last32m.img'
65536+0 records in
        :
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
In my system:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# hexdump -C ad4-last32m.img | head
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01ffe800  11 de 11 de 50 a0 0d fc  64 6f c1 3a 86 80 82 26  |....P...do.:...&|
~~~~~~~~
        :
# echo `printf %d/512 0x01ffe800` | bc
65524
# dd if=ad4-last32m.img of=ad4-last6k.img bs=512 skip=65524
12+0 records in
12+0 records out
6144 bytes transferred in 0.000361 secs (17021006 bytes/sec)
# dd if=ad6-last32m.img of=ad6-last6k.img bs=512 skip=65524
12+0 records in
12+0 records out
6144 bytes transferred in 0.000361 secs (17021006 bytes/sec)
# echo 156235952+65524 | bc
156301476
# dd if=ad4-last6k.img bs=512 seek=156301476 of=ad4.img
# dd if=ad6-last6k.img bs=512 seek=156301476 of=ad6.img
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

[*5]
------------------------------------
sizeof struct ddf1_ddf_header = 512
------------------------------------
Signature = 0xde11de11
CRC = 0x77802615
DDF_Header_GUID = 64 6f c1 3a 86 80 82 26 64 6f c1 3a b0 74 c1 3a 18 79 c1 3a ff ff ff ff
DDF_rev = 30 32 2e 30 30 2e 30 30
Sequence_Number = -1
Timestamp = -1
Open_Flag = 0xff
Foreign_Flag = 0xff
Disk_Grouping = 255
Primary_Header_LBA = 0x000000000950f8a4 (156301476)
Secondary_Header_LBA = 0xffffffffffffffff (18446744073709551615)
Header_Type = 0x0
Workspace_Length = 32768
Workspace_LBA = 0x00000000095078a4 (156268708)
Max_PD_Entries = 15
Max_VD_Entries = 4
Max_Partitions = 1
Configuration_Record_Length = 2
Max_Primary_Element_Entries = 65535
Controller_Data_Section = 1
Controller_Data_Length = 1
Physical_Disk_Records_Section = 2
Physical_Disk_Records_Length = 2
Virtual_Disk_Records_Section = 4
Virtual_Disk_Records_Length = 1
Configuration_Records_Section = 5
Configuration_Records_Length = 4
Physical_Disk_Data_Section = 9
Physical_Disk_Data_Length = 1
BBM_Log_Section =
-1
BBM_Log_Length = 0
Diagnostic_Space = -1
Diagnostic_Space_Length = 0
Vendor_Specific_Logs_Section = 10
Vendor_Specific_Logs_Section_Length = 1
------------------------------------
sizeof struct ddf1_controller_data = 512
------------------------------------
Signature = 0xad111111
CRC = 0x8be2d4cc
Controller_GUID = 64 6f c1 3a b0 74 c1 3a 18 79 c1 3a 7a 7d c1 3a 41 44 50 54 ff ff ff ff
Controller_Type_Vendor_ID = 0x8086
Controller_Type_Device_ID = 0x2682
Controller_Type_Sub_Vendor_ID = 0x0000
Controller_Type_Sub_Device_ID = 0x0000
------------------------------------
sizeof struct ddf1_phsical_disk_records = 64
------------------------------------
Signature = 0x22222222
CRC = 0x8da47ea5
Populated_PDEs = 2
Max_PDE_Supported = 15
------------------------------------
*sizeof struct ddf1_physical_disk_entries = 64
------------------------------------
PD_GUID = 20 20 20 20 20 20 20 20 20 20 20 20 39 4c 52 33 4c c6 95 5a ff ff ff ff
PD_Reference = 1839256256
PD_Type = 0x2
PD_State = 0x1
Configured_Size = 155987856
Path_Information = 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
------------------------------------
*sizeof struct ddf1_physical_disk_entries = 64
------------------------------------
PD_GUID = 20 20 20 20 20 20 20 20 20 20 20 20 39 4c 52 33 cc c7 95 5a ff ff ff ff
PD_Reference = 1839269174
PD_Type = 0x2
PD_State = 0x1
Configured_Size = 155987856
Path_Information = 01 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
------------------------------------
sizeof struct ddf1_virtual_disk_records = 64
------------------------------------
Signature = 0xdddddddd
CRC = 0xdae2a21a
Populated_VDEs = 1
Max_VDE_Supported = 4
------------------------------------
*sizeof struct ddf1_virtual_disk_entries = 64
------------------------------------
VD_GUID = 40 35 30 5a 86 80 82 26 20 20 20 20 20 20 20 20 e6 58 a0 6d 3a 35 4a 45
VD_Number = 0
VD_Type = 0xffffffff
VD_State = 0x0
Init_State = 0x2
VD_Name = DDF1-MIRROR
------------------------------------
sizeof struct ddf1_virtual_disk_configuration_record = 512
------------------------------------
Signature = 0xeeeeeeee
CRC = 0x82e01b58
VD_GUID = 40 35 30 5a 86 80 82 26 20 20 20 20 20 20 20 20 e6 58 a0 6d 3a 35 4a 45
Timestamp = 2
Sequence_Number = 2
Primary_Element_Count = 2
Strip_Size(Stripe_Size) = 7
Primary_RAID_Level = 1
RAID_Level_Qualifier = 0
Secondary_Element_Count = 1
Secondary_Element_Seq = 0
Secondary_RAID_Level = 255
Block_Count = 155987856
Size = 155987856
Cache_Policies_And_Parameters = 0x0000000000000000
BG_Rate = 16
------------------------------------
sizeof struct ddf1_physical_disk_data = 512
------------------------------------
Signature = 0x33333333
CRC = 0xe46196d5
PD_GUID = 20 20 20 20 20 20 20 20 20 20 20 20 39 4c 52 33 c6 49 61 5a ff ff ff ff
PD_Reference = 1839256256
Forced_Ref_Flag = 0xff
Forced_PD_GUID_Flag = 0x00

From owner-freebsd-geom@FreeBSD.ORG Sun Aug 19 20:10:06 2007
Return-Path:
Delivered-To: freebsd-geom@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ABA9516A421 for ; Sun, 19 Aug 2007 20:10:06 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7B8EB13C45D for ; Sun, 19 Aug 2007 20:10:06 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l7JKA6P9029291 for ; Sun, 19 Aug 2007 20:10:06 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l7JKA686029288; Sun, 19 Aug 2007 20:10:06 GMT (envelope-from gnats)
Date: Sun, 19 Aug 2007 20:10:06 GMT
Message-Id: <200708192010.l7JKA686029288@freefall.freebsd.org>
To: freebsd-geom@FreeBSD.org
From: Graham
Cc:
Subject: Re: kern/115572: ata disk bug
Reply-To: Graham
X-List-Received-Date: Sun, 19 Aug 2007 20:10:06 -0000

The following reply was made to PR kern/115572; it has been noted by GNATS.

From: Graham
To: "Søren" Schmidt
Cc: bug-followup@FreeBSD.org, Poul-Henning Kamp
Subject: Re: kern/115572: ata disk bug
Date: Sun, 19 Aug 2007 13:03:21 -0700 (PDT)

thanks Søren....

--- Søren Schmidt wrote:

> Poul-Henning Kamp wrote:
> > In message <200708182120.l7ILKBvF046099@freefall.freebsd.org>, Graham writes:
> >
> >> 2. attempt (say)....
> >> rabbit# dd if=/dev/zero of=/dev/ad4s1 oseek=2097151 count=1 bs=64k
> >> and the result is....
> >> dd: /dev/ad4s1: Input/output error
> >> 1+0 records in
> >> 0+0 records out
> >> 0 bytes transferred in 0.000325 secs (0 bytes/sec)
> >>
> >> (If dd is performed on the raw drive, /dev/ad4 then block boundary is
> >> always a power of 2, and blocksize a smaller power of 2. That's always
> >> ok. But we can't assume we use drives that way.)
> >>
> >> So a transfer which starts in the 28-bit zone, but extends over into
> >> the 48-bit region, fails. Such transfers happen in the superblock of
> >> certain size drives, and that plays havoc. The sector mapping of gbde
> >> can do this, but soft-update gets screwed by this happening. It's not
> >> actually to do with the crypto as I first suspected.
> >>
> >
> > This is a problem in the ata disk driver.
> >
> Yeah, the crossover point from using 28bit to 48bit addressing is
> flawed, below patch should fix that:
>
> --- ata-all.c 23 Feb 2007 16:25:08 -0000 1.279
> +++ ata-all.c 19 Aug 2007 09:25:58 -0000
> @@ -738,7 +738,7 @@
>
>      atadev->flags &= ~ATA_D_48BIT_ACTIVE;
>
> -    if ((request->u.ata.lba >= ATA_MAX_28BIT_LBA ||
> +    if (((request->u.ata.lba + request->u.ata.count) >= ATA_MAX_28BIT_LBA ||
>           request->u.ata.count > 256) &&
>          atadev->param.support.command2 & ATA_SUPPORT_ADDRESS48) {
>
> -Søren

I have compiled with this patch and done an (admittedly superficial)
test, and the bug is no longer visible.
I am quite happy to see this one marked as solved.

My apologies to P-H.K. for starting off on the wrong tack.

Do I need to flag anyone else for this to get taken up in the stock
kernel, or will this happen automatically now?

regards
Graham
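The effect of the one-line change can be checked outside the kernel. What follows is a minimal standalone sketch, not the driver code: the helper names needs_48bit_old, needs_48bit_new and first_lba are hypothetical, the value 268435455 (2^28 - 1) for ATA_MAX_28BIT_LBA is an assumption, and the ATA_SUPPORT_ADDRESS48 capability test from the patch is left out. It compares the pre-patch and patched conditions for a request like the failing dd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value (2^28 - 1); the thread does not show the driver's definition. */
#define ATA_MAX_28BIT_LBA 268435455ULL

/* Pre-patch condition: only the first sector of the request is examined. */
static bool needs_48bit_old(uint64_t lba, uint32_t count)
{
    return lba >= ATA_MAX_28BIT_LBA || count > 256;
}

/* Patched condition: a request that ends at or past the limit also qualifies. */
static bool needs_48bit_new(uint64_t lba, uint32_t count)
{
    return (lba + count) >= ATA_MAX_28BIT_LBA || count > 256;
}

/* First sector of a dd-style write: slice start plus oseek blocks of
 * block_bytes each, expressed in 512-byte sectors (hypothetical helper). */
static uint64_t first_lba(uint64_t slice_start, uint64_t oseek,
                          uint64_t block_bytes)
{
    return slice_start + oseek * (block_bytes / 512);
}
```

Assuming the slice starts at sector 63, first_lba(63, 2097151, 65536) gives LBA 268435391, and the 128-sector write ends at 268435518. The old condition keeps that request on the 28-bit path even though it runs past the limit, while the patched condition selects 48-bit addressing.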
From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 01:38:25 2007
Return-Path:
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADAEE16A500 for ; Mon, 20 Aug 2007 01:38:25 +0000 (UTC) (envelope-from hg@queue.to)
Received: from pickle.queue.to (pickle.queue.to [71.180.69.18]) by mx1.freebsd.org (Postfix) with ESMTP id 47C2B13C461 for ; Mon, 20 Aug 2007 01:38:23 +0000 (UTC) (envelope-from hg@queue.to)
Received: (qmail 69253 invoked from network); 19 Aug 2007 21:38:22 -0400
Received: from cally.queue.to (172.16.0.6) by pickle.queue.to with ESMTP; 19 Aug 2007 21:38:22 -0400
Message-ID: <46C8F08E.4090108@queue.to>
Date: Sun, 19 Aug 2007 21:38:22 -0400
From: Howard Goldstein
User-Agent: Thunderbird 2.0.0.6 (X11/20070802)
MIME-Version: 1.0
To: freebsd-geom@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: graid5, 3 consumers, unaligned access
X-List-Received-Date: Mon, 20 Aug 2007 01:38:25 -0000

Due to a bug in the twe driver I've noticed quite a few accesses to the
graid5 consumer that aren't on 512 byte boundaries. All three are
identical drives with identical geometries. The unaligned accesses
occur when restoring (dump -0af - /usr | restore -xf - ) as well as when
simply dd'ing to the provider.

Although the config below shows the consumers on the 3rd partition, it's
reproducible when the consumer is the entire drive (ad8, for example,
rather than ad8s3 as in this case; you'll also see the partition table
is set up so the starting offset here is on a number of common
boundaries).

Any thoughts?
I can make it happen at will by dd if=/dev/zero of=/graid5filesystem/tstfile bs=1m count=3000 at least a handful, probably thousands of times cally:~$ graid5 list Geom name: gr0 State: COMPLETE CALM Status: Total=3, Online=3 Type: AUTOMATIC Pending: (wqp 0 // 0) Stripesize: 131072 MemUse: 5341224 (msl 15) Newest: -1 ID: 1582456438 Providers: 1. Name: raid5/gr0 Mediasize: 620652658688 (578G) Sectorsize: 512 Mode: r2w2e3 Consumers: 1. Name: ad8s3 Mediasize: 310326460416 (289G) Sectorsize: 512 Mode: r3w3e4 DiskNo: 0 Error: No 2. Name: ad10s3 Mediasize: 310326460416 (289G) Sectorsize: 512 Mode: r3w3e4 DiskNo: 1 Error: No 3. Name: twed0s3 Mediasize: 310326460416 (289G) Sectorsize: 512 Mode: r3w3e4 DiskNo: 2 Error: No cally:~$ fdisk ad8 ******* Working on device /dev/ad8 ******* parameters extracted from in-core disklabel are: cylinders=620181 heads=16 sectors/track=63 (1008 blks/cyl) Figures below won't work with BIOS for partitions not in cyl 1 parameters to be used for BIOS calculations are: cylinders=620181 heads=16 sectors/track=63 (1008 blks/cyl) Media sector size is 512 Warning: BIOS sector numbering starts with sector 1 Information from DOS bootblock is: The data for partition 1 is: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD) start 63, size 514017 (250 Meg), flag 80 (active) beg: cyl 0/ head 1/ sector 1; end: cyl 509/ head 15/ sector 63 The data for partition 2 is: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD) start 516096, size 18506880 (9036 Meg), flag 0 beg: cyl 512/ head 0/ sector 1; end: cyl 439/ head 15/ sector 63 The data for partition 3 is: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD) start 19031040, size 606106368 (295950 Meg), flag 0 beg: cyl 448/ head 0/ sector 1; end: cyl 655/ head 15/ sector 63 The data for partition 4 is: cally$ bsdlabel raid5/gr0 # /dev/raid5/gr0: 8 partitions: # size offset fstype [fsize bsize bps/cpg] c: 1212212224 0 unused 0 0 # "raw" part, don't edit d: 73256400 0 4.2BSD 8192 65536 52392 f: 1138680832 73531392 4.2BSD 8192 65536 52392 
cally:~$ cally:~$ uname -a FreeBSD cally.queue.to 6.2-STABLE FreeBSD 6.2-STABLE #0: Tue Aug 14 14:20:30 EDT 2007 hg@cally.queue.to:/usr/obj/usr/src/sys/CALLY i386 cally:~$ cd /usr/src/sys/geom/raid5 cally:/usr/src/sys/geom/raid5$ ls -ltra total 120 -rw-r--r-- 1 10001 10001 5476 Jul 31 02:31 g_raid5.h -rw-r--r-- 1 10001 10001 91616 Jul 31 02:31 g_raid5.c drwxr-xr-x 2 10001 10001 512 Jul 31 02:31 ./ drwxr-xr-x 16 10001 10001 1024 Aug 10 18:44 ../ cally:/usr/src/sys/geom/raid5$ From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 01:41:48 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F04DD16A41A for ; Mon, 20 Aug 2007 01:41:48 +0000 (UTC) (envelope-from hg@queue.to) Received: from pickle.queue.to (pickle.queue.to [71.180.69.18]) by mx1.freebsd.org (Postfix) with ESMTP id 990D113C45D for ; Mon, 20 Aug 2007 01:41:48 +0000 (UTC) (envelope-from hg@queue.to) Received: (qmail 69429 invoked from network); 19 Aug 2007 21:41:47 -0400 Received: from cally.queue.to (172.16.0.6) by pickle.queue.to with ESMTP; 19 Aug 2007 21:41:47 -0400 Message-ID: <46C8F15B.6020300@queue.to> Date: Sun, 19 Aug 2007 21:41:47 -0400 From: Howard Goldstein User-Agent: Thunderbird 2.0.0.6 (X11/20070802) MIME-Version: 1.0 To: freebsd-geom@freebsd.org References: <46C8F08E.4090108@queue.to> In-Reply-To: <46C8F08E.4090108@queue.to> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: graid5, 3 consumers, unaligned access X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Aug 2007 01:41:49 -0000 Howard Goldstein wrote: > Due to a bug in the twe driver I've noticed quite a few accesses to the > graid5 consumer that aren't on 512 byte boundaries. 
These are all 3 > identical drives with identical geometries. The unaligned accesses Identical geometries as far as fdisk thought, but not geom. Could this be the problem? twed0 is the device that has an issue with unaligned access cally:/usr/src/sys/geom/raid5$ geom disk list Geom name: ad5 Providers: 1. Name: ad5 Mediasize: 163928604672 (153G) Sectorsize: 512 Mode: r1w1e3 fwsectors: 63 fwheads: 16 Geom name: ad8 Providers: 1. Name: ad8 Mediasize: 320072933376 (298G) Sectorsize: 512 Mode: r6w6e9 fwsectors: 63 fwheads: 16 Geom name: ad10 Providers: 1. Name: ad10 Mediasize: 320072933376 (298G) Sectorsize: 512 Mode: r6w6e9 fwsectors: 63 fwheads: 16 Geom name: twed0 Providers: 1. Name: twed0 Mediasize: 320072933376 (298G) Sectorsize: 512 Mode: r4w4e6 fwsectors: 63 fwheads: 255 From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 05:51:57 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C6B8F16A419 for ; Mon, 20 Aug 2007 05:51:57 +0000 (UTC) (envelope-from arne_woerner@yahoo.com) Received: from web30306.mail.mud.yahoo.com (web30306.mail.mud.yahoo.com [209.191.69.68]) by mx1.freebsd.org (Postfix) with SMTP id 76FBC13C4A6 for ; Mon, 20 Aug 2007 05:51:57 +0000 (UTC) (envelope-from arne_woerner@yahoo.com) Received: (qmail 49365 invoked by uid 60001); 20 Aug 2007 05:51:56 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding:Message-ID; b=r0Cyrhj/xBP6hYjlu+k0BRrIXrHNwTmPl9PYggwoy3uHGlyivd2tyHgvOiVpQG08GafRbEXahOheTBzU8aDY4W6cmzqt+zDehE4urbSlgCCxBJXCs77LMKt2aXRJO/apz3fU6KEfXely2mIbcmdo2UuSF7WXfvwDErzcceXiDXI=; X-YMail-OSG: srmaetQVM1mNl877HAMEgyQOKvW85kY5CfX0F9Vaj5CSa7ZiFY83OQI4lJ_NMbDvI1oExAQ6N4jHVU_YiDtYDVNzLIdlWNVYN6YwJs3k6DUyQ94ECK0SzPy3ig-- Received: from [84.141.125.172] by web30306.mail.mud.yahoo.com via HTTP; Sun, 19 Aug 
2007 22:51:56 PDT
Date: Sun, 19 Aug 2007 22:51:56 -0700 (PDT)
From: Arne "Wörner"
To: Howard Goldstein , freebsd-geom@freebsd.org
In-Reply-To: <46C8F08E.4090108@queue.to>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Message-ID: <829676.49356.qm@web30306.mail.mud.yahoo.com>
Cc:
Subject: Re: graid5, 3 consumers, unaligned access
X-List-Received-Date: Mon, 20 Aug 2007 05:51:57 -0000

--- Howard Goldstein wrote:
> Due to a bug in the twe driver I've noticed quite a few accesses to the
> graid5 consumer that aren't on 512 byte boundaries. These are all 3
>
How can you tell that the offsets/lengths are not integer multiples of the
sector size (512 here most likely)?

Maybe you are mixing it up with some block number (which is not necessarily
a multiple of the sector size)?

Can you give an example of such strange accesses?

-Arne

From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 06:25:04 2007
Return-Path:
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 20E3916A417 for ; Mon, 20 Aug 2007 06:25:04 +0000 (UTC) (envelope-from hg@queue.to)
Received: from pickle.queue.to (pickle.queue.to [71.180.69.18]) by mx1.freebsd.org (Postfix) with ESMTP id BD23713C458 for ; Mon, 20 Aug 2007 06:25:03 +0000 (UTC) (envelope-from hg@queue.to)
Received: (qmail 75057 invoked from network); 20 Aug 2007 02:25:02 -0400
Received: from cally.queue.to (172.16.0.6) by pickle.queue.to with ESMTP; 20 Aug 2007 02:25:02 -0400
Message-ID: <46C933BE.9090904@queue.to>
Date: Mon, 20 Aug 2007 02:25:02 -0400
From: Howard Goldstein
User-Agent: Thunderbird 2.0.0.6 (X11/20070802)
MIME-Version: 1.0
To: arne_woerner@yahoo.com
References: <829676.49356.qm@web30306.mail.mud.yahoo.com>
In-Reply-To: <829676.49356.qm@web30306.mail.mud.yahoo.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Cc: freebsd-geom@freebsd.org
Subject: Re: graid5, 3 consumers, unaligned access
X-List-Received-Date: Mon, 20 Aug 2007 06:25:04 -0000

Arne Wörner wrote:
> --- Howard Goldstein wrote:
>> Due to a bug in the twe driver I've noticed quite a few accesses to the
>> graid5 consumer that aren't on 512 byte boundaries. These are all 3
>>
> How can you tell that the offsets/lengths are not integer multiples of the
> sector size (512 here most likely)?

The twe driver has a design flaw that depends on malloc()ing bounce
buffers when it's handed data not aligned on 512 byte boundaries. When
malloc fails, the driver syslogs a unique error that only can come from
the part where it's acting on unaligned data. The question is why data
are being sent to this driver unaligned.

dd'ing to offset 0 at the beginning of three unpartitioned data disks
for 3000x1MB doesn't present any opportunities I can think of for things
to go out of alignment multiple times while writing the 3GB.

It also happens during ordinary desktop operations (nothing
extraordinary), and when using restore -xf - to put a filesystem onto
the raid array from a backup drive.

>
> Maybe you are mixing it up with some block number (which is not necessarily
> a multiple of the sector size)?
>
> Can you give an example of such strange accesses?

It was in the message you replied to: the dd invocation, which is the
easiest way to induce it. I can make it happen at will. The indication is

Aug 19 03:05:59 cally kernel: twe0: twe_map_request: malloc failed
Aug 19 03:06:48 cally kernel: twe0: twe_map_request: malloc failed
Aug 19 03:20:12 cally kernel: twe0: twe_map_request: malloc failed

Inspection of the twe driver (twe_freebsd.c) will show that message only
occurs when mallocing for the bounce buffer fails.
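Two different kinds of alignment are in play in this exchange: where a request lands on disk, and where its data buffer sits in memory. The sketch below tests each one separately. It is illustrative only and is not taken from twe_freebsd.c: the function names are hypothetical, and the alignment the driver actually requires is passed in as a parameter rather than assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when a byte offset on disk is a whole number of sectors. */
static bool offset_is_sector_aligned(uint64_t offset, uint32_t sector_size)
{
    return sector_size != 0 && offset % sector_size == 0;
}

/* True when a memory address is a multiple of `alignment`,
 * which must be a power of two. */
static bool buffer_is_aligned(const void *buf, size_t alignment)
{
    return ((uintptr_t)buf & (alignment - 1)) == 0;
}

/* Hypothetical driver-style decision: copy through a bounce buffer only
 * when the caller's memory address is misaligned. The on-disk offset
 * plays no part in this test. */
static bool needs_bounce_buffer(const void *buf, size_t alignment)
{
    return !buffer_is_aligned(buf, alignment);
}
```

A write can therefore be perfectly sector-aligned on disk, as the 3000 x 1 MB dd is, and still need a bounce buffer if the memory holding its data does not start on the boundary the controller expects.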
From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 06:40:45 2007
Return-Path:
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8188116A41B for ; Mon, 20 Aug 2007 06:40:45 +0000 (UTC) (envelope-from arne_woerner@yahoo.com)
Received: from web30315.mail.mud.yahoo.com (web30315.mail.mud.yahoo.com [209.191.69.77]) by mx1.freebsd.org (Postfix) with SMTP id 3065413C465 for ; Mon, 20 Aug 2007 06:40:45 +0000 (UTC) (envelope-from arne_woerner@yahoo.com)
Received: (qmail 41252 invoked by uid 60001); 20 Aug 2007 06:40:44 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding:Message-ID; b=dNq4BTOUlqlt1+nKA0RAqyqhbtOfMokZMmM7m9hyUPrve1oTFfzsiUn0H2DGJFJAiuGNZ+f0d+LevUpWo4zgBSVPn/UTMUpW0IXcfrxBPwjQ5wf9Z5ZsZg9p/rZCqMz+WyZj+tQ6Q4apHjhkewORscQKg/TmSp8IhMUW6dAyxw8=;
X-YMail-OSG: 61LwmbgVM1nyb3N.cSSWfC4Z3A__8rJtFoVXAJXpqqo35Mx..eY2ig7hQd.qX5nvydvmgH3j3s75m5aSrivlRZHIIWke_N4NlaQmFQ0qvLmSj7Jw0C3fds43UA--
Received: from [84.141.125.172] by web30315.mail.mud.yahoo.com via HTTP; Sun, 19 Aug 2007 23:40:44 PDT
Date: Sun, 19 Aug 2007 23:40:44 -0700 (PDT)
From: Arne "Wörner"
To: Howard Goldstein
In-Reply-To: <46C933BE.9090904@queue.to>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Message-ID: <541513.41122.qm@web30315.mail.mud.yahoo.com>
Cc: freebsd-geom@freebsd.org
Subject: Re: graid5, 3 consumers, unaligned access
X-List-Received-Date: Mon, 20 Aug 2007 06:40:45 -0000

--- Howard Goldstein wrote:
> The twe driver has a design flaw that depends on malloc()ing bounce
> buffers when it's handed data not aligned on 512 byte boundaries. When
> malloc fails, the driver syslogs a unique error that only can come from
>
I had a look at that file (twe...c) and found that the alignment is not
512 bytes but 64 bytes (in 6.2R), and that it applies to the virtual memory
address, not to the on-disk offset...

So it is not a GEOM problem...

Maybe you could try to reduce the graid5 write cache by setting .maxwql and
.maxmem to something smaller.

-Arne

From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 11:08:19 2007
Return-Path:
Delivered-To: freebsd-geom@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB84F16A475 for ; Mon, 20 Aug 2007 11:08:19 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 990A713C459 for ; Mon, 20 Aug 2007 11:08:19 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l7KB8JiC087410 for ; Mon, 20 Aug 2007 11:08:19 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l7KB8IQO087406 for freebsd-geom@FreeBSD.org; Mon, 20 Aug 2007 11:08:18 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 20 Aug 2007 11:08:18 GMT
Message-Id: <200708201108.l7KB8IQO087406@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-geom@FreeBSD.org
Cc:
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific
discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Aug 2007 11:08:19 -0000

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/73177   geom       kldload geom_* causes panic due to memory exhaustion
o kern/76538   geom       [gbde] nfs-write on gbde partition stalls and continue
o kern/83464   geom       [geom] [patch] Unhandled malloc failures within libgeo
o kern/84556   geom       [geom] GBDE-encrypted swap causes panic at shutdown
o kern/87544   geom       [gbde] mmaping large files on a gbde filesystem deadlo
o kern/89102   geom       [geom_vfs] [panic] panic when forced unmount FS from u
o bin/90093    geom       fdisk(8) incapable of altering in-core geometry
o kern/90582   geom       [geom_mirror] [panic] Restore cause panic string (ffs_
o kern/98034   geom       [geom] dereference of NULL pointer in acd_geom_detach
o kern/104389  geom       [geom] [patch] sys/geom/geom_dump.c doesn't encode XML
o kern/113419  geom       [geom] geom fox multipathing not failing back
o misc/113543  geom       [geom] [patch] geom(8) utilities don't work inside the
o kern/113957  geom       [gmirror] gmirror is intermittently reporting a degrad
o kern/115572  geom       [gbde] gbde partitions fail at 28bit/48bit LBA address

14 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/78131    geom       gbde "destroy" not working.
o kern/79251   geom       [2TB] newfs fails on 2.6TB gbde device
o kern/94632   geom       [geom] Kernel output resets input while GELI asks for
f kern/105390  geom       [geli] filesystem on a md backed by sparse file with s
o kern/107707  geom       [geom] [patch] add new class geom_xbox360 to slice up
p bin/110705   geom       gmirror control utility does not exit with correct exi
o kern/113790  geom       [patch] enable the Camellia block cipher on GEOM ELI (
o kern/113837  geom       [geom] unable to access 1024 sector size storage
o kern/113885  geom       [geom] [patch] improved gmirror balance algorithm
o kern/114532  geom       GEOM_MIRROR shows up in kldstat even if compiled in th
o kern/115547  geom       [geom] [patch] for GEOM Eli to get password from stdin

11 problems total.

From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 13:20:07 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5830016A41A for ; Mon, 20 Aug 2007 13:20:07 +0000 (UTC) (envelope-from hg@queue.to) Received: from pickle.queue.to (pickle.queue.to [71.180.69.18]) by mx1.freebsd.org (Postfix) with ESMTP id F319713C45A for ; Mon, 20 Aug 2007 13:20:06 +0000 (UTC) (envelope-from hg@queue.to) Received: (qmail 83138 invoked from network); 20 Aug 2007 09:20:06 -0400 Received: from cally.queue.to (172.16.0.6) by pickle.queue.to with ESMTP; 20 Aug 2007 09:20:06 -0400 Message-ID: <46C99506.2040106@queue.to> Date: Mon, 20 Aug 2007 09:20:06 -0400 From: Howard Goldstein User-Agent: Thunderbird 2.0.0.6 (X11/20070802) MIME-Version: 1.0 To: arne_woerner@yahoo.com References: <541513.41122.qm@web30315.mail.mud.yahoo.com> In-Reply-To: <541513.41122.qm@web30315.mail.mud.yahoo.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-geom@freebsd.org Subject: Re: graid5, 3 consumers, unaligned access X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and
implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Aug 2007 13:20:07 -0000

Arne Wörner wrote:
> --- Howard Goldstein wrote:
>> The twe driver has a design flaw that depends on malloc()ing bounce
>> buffers when it's handed data not aligned on 512 byte boundaries. When
>> malloc fails, the driver syslogs a unique error that only can come from
>>
> I had a look at that file (twe...c) and found that it is not 512 bytes but 64
> bytes (in 6.2R) and that it is about the virtual memory address and not about
> the on-disk-offset...

Ahh, you're believing the code comment :( (Rule #1: Never believe the code comments). The code comment is wrong: where it claims 64 bytes, the driver actually wants 512 bytes. You have to look at the #define and not the comment.

The problem for me is much worse since it happens all the time with that requirement. Can the device driver ever expect to see 512 byte aligned buffers from geom?

>
> So it is not a GEOM problem...
>
> Maybe u could try to reduce the graid5 write cache by setting .maxwql and
> .maxmem to something smaller.
OK From owner-freebsd-geom@FreeBSD.ORG Mon Aug 20 14:03:50 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEEC416A469 for ; Mon, 20 Aug 2007 14:03:50 +0000 (UTC) (envelope-from hg@queue.to) Received: from pickle.queue.to (pickle.queue.to [71.180.69.18]) by mx1.freebsd.org (Postfix) with ESMTP id 64CC113C4A8 for ; Mon, 20 Aug 2007 14:03:50 +0000 (UTC) (envelope-from hg@queue.to) Received: (qmail 84220 invoked from network); 20 Aug 2007 10:03:49 -0400 Received: from cally.queue.to (172.16.0.6) by pickle.queue.to with ESMTP; 20 Aug 2007 10:03:49 -0400 Message-ID: <46C99F45.300@queue.to> Date: Mon, 20 Aug 2007 10:03:49 -0400 From: Howard Goldstein User-Agent: Thunderbird 2.0.0.6 (X11/20070802) MIME-Version: 1.0 To: arne_woerner@yahoo.com References: <541513.41122.qm@web30315.mail.mud.yahoo.com> In-Reply-To: <541513.41122.qm@web30315.mail.mud.yahoo.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-geom@freebsd.org Subject: Good workaround Re: graid5, 3 consumers, unaligned access X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Aug 2007 14:03:50 -0000 Arne Wörner wrote: > --- Howard Goldstein wrote: >> The twe driver has a design flaw that depends on malloc()ing bounce >> buffers when it's handed data not aligned on 512 byte boundaries. When >> malloc fails, the driver syslogs a unique error that only can come from >> > I had a look at that file (twe...c) and found that it is not 512 bytes but 64 > bytes (in 6.2R) and that it is about the virtual memory address and not about > the on-disk-offset... > > So it is not a GEOM problem... 
By the way I agree, as in the original message, I indicated this is a poorly designed driver. GEOM and graid5 seem to work flawlessly and efficiently on the other consumers. > > Maybe u could try to reduce the graid5 write cache by setting .maxwql and > .maxmem to something smaller. Reducing the write queue length does stop the complaining and that's definitely good. I don't see any performance hit looking at throughput with a naive dd-based throughput test and that's a bit of a relief. vm: Do you think twe(4) can be modified to avoid the malloc and bcopy and instead use the vm interface to remap it so it's on the 512 byte offset the stupid card wants to see? I'm not familiar with the pmap or vm facilities and whether they can do this or not. From owner-freebsd-geom@FreeBSD.ORG Tue Aug 21 12:32:44 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0183016A469; Tue, 21 Aug 2007 12:32:44 +0000 (UTC) (envelope-from johan@stromnet.se) Received: from av12-2-sn2.hy.skanova.net (av12-2-sn2.hy.skanova.net [81.228.8.186]) by mx1.freebsd.org (Postfix) with ESMTP id 7F3FE13C4B3; Tue, 21 Aug 2007 12:32:43 +0000 (UTC) (envelope-from johan@stromnet.se) Received: by av12-2-sn2.hy.skanova.net (Postfix, from userid 502) id 59158382DC; Tue, 21 Aug 2007 14:15:25 +0200 (CEST) Received: from smtp4-2-sn2.hy.skanova.net (smtp4-2-sn2.hy.skanova.net [81.228.8.93]) by av12-2-sn2.hy.skanova.net (Postfix) with ESMTP id ACDAA382E0; Tue, 21 Aug 2007 14:15:24 +0200 (CEST) Received: from phomca.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by smtp4-2-sn2.hy.skanova.net (Postfix) with ESMTP id 8EB9037E4B; Tue, 21 Aug 2007 14:15:24 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by phomca.stromnet.se (Postfix) with ESMTP id 437EAB826; Tue, 21 Aug 2007 14:15:24 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from 
phomca.stromnet.se ([127.0.0.1]) by localhost (phomca.stromnet.se [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zn2q1mt5l5gb; Tue, 21 Aug 2007 14:15:18 +0200 (CEST) Received: from [172.28.1.102] (jstrom-mb.stromnet.se [172.28.1.102]) by phomca.stromnet.se (Postfix) with ESMTP id 07AA0B824; Tue, 21 Aug 2007 14:15:18 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v752.3) Content-Transfer-Encoding: quoted-printable Message-Id: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed To: freebsd-geom@freebsd.org, freebsd-stable@freebsd.org From: =?ISO-8859-1?Q?Johan_Str=F6m?= Date: Tue, 21 Aug 2007 14:15:08 +0200 X-Mailer: Apple Mail (2.752.3) Cc: Subject: Crashed gmirror, single disk marked SYNC and wont boot... X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Aug 2007 12:32:44 -0000

Hi

FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: Tue Feb 13 18:24:34 CET 2007 johan@elfi.stromnet.se:/usr/obj/usr/src/sys/ROUTER.POLLING i386

(ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ, IPSEC, also pfsync and carp)

This weekend I had a disk failing on me in a machine running gmirror gm0 with 2 providers (ad0 and ad6). The whole box froze with no screen output, and on hard reboot I got some LBA errors etc. from ad0; after a few reboots it got up and running though (I wasn't at the screen and had to do it by phone, so I couldn't really debug very well).
As soon as the box got up, I removed ad0 from the gmirror, so ad6 was the only provider. Today I got a new disk that would replace ad0..
Now remember, ad6 was the only disk in the mirror. I took the box down fine and replaced the disk. ad0 was now gone and instead I had ad4 (ad4+6 are SATA, ad0 was IDE).
Changed it so I booted off the old SATA disk.. Okay, there came the first problem; the boot loader gave me the usual options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1 I got the same prompt again.. F5 nothing at all.. Funny!... The system refused to load the loader (or whatever the 1-9 menu thingy is called), kernel or anything..
So I finally plugged the old ad0 disk into the machine to at least get it booted, thinking it would come up on the gmirror.. Nope..:

(took the new ad4 out here)
ad0: 38166MB at ata0-master UDMA100
ad6: 152627MB at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=4029378995).
GEOM_MIRROR: Device gm0: provider ad6 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
Trying to mount root from ufs:/dev/mirror/gm0s1a

Manual root filesystem specification:
: Mount using filesystem
eg. ufs:da0s1a
? List valid disk boot devices
Abort manual input

mountroot>

Okay... so why wouldn't it load my mirror from ad6 now?? I just did a clean shutdown without problems.. It didn't even recognize any slices on ad6s1 (although the ad6s1 was found)...
I entered ad0s1 as root and booted from there; of course I got to the emergency shell since fstab looked for the gmirror devices, which didn't exist..

Some more digging into gmirror, I did a gmirror dump ad6:

Metadata on /dev/ad6:
magic: GEOM::MIRROR
version: 3
name: gm0
mid: 4029378995
did: 449032193
all: 3
genid: 0
syncid: 5
priority: 0
slice: 4096
balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: SYNCHRONIZING
hcprovider:
provsize: 160041885696
MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f

Some googling indicated that SYNCHRONIZING means that it's not "complete" and won't mount? Is that correct? Why would it be in that state then, I just shut it down fine...
And where the f*ck did my slices go??..

Did a sysctl kern.geom.mirror.debug=2 and tried to gmirror activate the mirror:

GEOM_MIRROR[1]: Creating device gm0 (id=4029378995).
GEOM_MIRROR[0]: Device gm0 created (id=4029378995).
GEOM_MIRROR[1]: root_mount_hold 0xc3539510
GEOM_MIRROR[1]: Adding disk ad6 to gm0.
GEOM_MIRROR[2]: Adding disk ad6.
GEOM_MIRROR[2]: Disk ad6 connected.
GEOM_MIRROR[1]: Disk ad6 state changed from NONE to NEW (device gm0).
GEOM_MIRROR[0]: Device gm0: provider ad6 detected.
GEOM_MIRROR[2]: Tasting ad6s1.
GEOM_MIRROR[0]: Force device gm0 start due to timeout.
GEOM_MIRROR[1]: root_mount_rel[2169] 0xc3539510
GEOM_MIRROR[2]: No I/O requests for gm0, it can be destroyed.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[2]: Access ad6 r-1w-1e-1 = 0
GEOM_MIRROR[0]: Device gm0 destroyed.
GEOM_MIRROR[1]: Thread exiting.
GEOM_MIRROR[1]: Consumer ad6 destroyed.

Soo.. What is going on here? Anyone with some clues? Currently running on the ad0 disk, no raid at all.. Let's hope it doesn't die on me (I haven't had any signs of that since Sunday, when it froze and gave boot errors, so I'm hoping..). The data loss from using ad0 instead of ad6 is probably minimal; it's a router, so it's more or less only logging that seems to have been lost... For now I just want to get clear about wth happened here and how to prevent it, and how to get back up on a gmirror with ad6 and ad4 (to be plugged in) so I can throw ad0 out...
Thanks -- Johan Str=F6m Stromnet johan@stromnet.se http://www.stromnet.se/ From owner-freebsd-geom@FreeBSD.ORG Tue Aug 21 14:32:49 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 03CBD16A468 for ; Tue, 21 Aug 2007 14:32:49 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.freebsd.org (Postfix) with ESMTP id BFF7513C481 for ; Tue, 21 Aug 2007 14:32:47 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id E9D7048804; Tue, 21 Aug 2007 16:32:44 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 7D5F4487F4; Tue, 21 Aug 2007 16:32:34 +0200 (CEST) Date: Tue, 21 Aug 2007 16:31:36 +0200 From: Pawel Jakub Dawidek To: Johan =?iso-8859-1?Q?Str=F6m?= Message-ID: <20070821143136.GD1132@garage.freebsd.pl> References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="GpGaEY17fSl8rd50" Content-Disposition: inline In-Reply-To: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: freebsd-stable@freebsd.org, freebsd-geom@freebsd.org Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot... 
X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Aug 2007 14:32:49 -0000 --GpGaEY17fSl8rd50 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Str=F6m wrote: > Hi >=20 > FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: =20 > Tue Feb 13 18:24:34 CET 2007 johan@elfi.stromnet.se:/usr/obj/usr/=20 > src/sys/ROUTER.POLLING i386 >=20 > (ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ, =20 > IPSEC, also pfsync and carp) >=20 > This weekend I had a disk failing on me in a machine running gmirror =20 > gm0 with 2 providers (ad0 and ad6). The whole box froze with no =20 > screen output, and on hard reboot I got some LBA errors etc from ad0, =20 > after a few reboots it got up and running though (I wasnt at the =20 > screen, had do do it by phone so couldn't really debug very well). > As soon as the box got up, I removed ad0 from the gmirror, so ad6 was =20 > the only provider. Today I got a new disk that would replace ad0.. > Now remeber, ad6 was the only disk in the mirror. I took the box down =20 > fine, replaced the disk. ad0 was now gone and instead I hade ad4 (ad4=20 > +6 is SATA, ad0 was IDE). Changed so I booted of the old SATA.. =20 > Okay, there came the first problem; the boot loader gave me the usual =20 > options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1 =20 > i got the same prompt again.. F5 nothing at all.. Funny!... The =20 > system refused to load the loader (or whatever the 1-9 menu thingy is =20 > called) kernel or anything.. > So I finally plugged the old ad0 disk into the machine to at least =20 > get it booted, thinking it would go up on the gmirror.. 
Nope..:
>
> (got the new ad4 out here)
> ad0: 38166MB at ata0-master UDMA100
> ad6: 152627MB at ata3-master SATA150
> GEOM_MIRROR: Device gm0 created (id=4029378995).
> GEOM_MIRROR: Device gm0: provider ad6 detected.
> Root mount waiting for: GMIRROR
> Root mount waiting for: GMIRROR
> Root mount waiting for: GMIRROR
> Root mount waiting for: GMIRROR
> GEOM_MIRROR: Force device gm0 start due to timeout.
> Trying to mount root from ufs:/dev/mirror/gm0s1a
>
> Manual root filesystem specification:
> : Mount using filesystem
> eg. ufs:da0s1a
> ? List valid disk boot devices
> Abort manual input
>
> mountroot>
>
> Okey... so why wouldnt it load my mirror from ad6 now?? I just did a
> clean shutdown without problems.. It didnt even recognize any slices
> on ad6s1 (altough the ad6s1 was found)...

It loaded your mirror just fine; you are confusing things. Gmirror started in a degraded state, as one could expect, but it seems there is no 'a' partition on your gm0s1 slice (or the entire bsdlabel is gone).
You could try to recreate it based on the bsdlabel from ad0 (if it should be the same), but I've no idea how it disappeared. Anyway, gmirror seems to work properly.

> Some more digging into gmirror, I did a gmirror dump ad6:
>
> Metadata on /dev/ad6:
> magic: GEOM::MIRROR
> version: 3
> name: gm0
> mid: 4029378995
> did: 449032193
> all: 3

You have a 3-way mirror?

> genid: 0
> syncid: 5
> priority: 0
> slice: 4096
> balance: round-robin
> mediasize: 20416757248
> sectorsize: 512
> syncoffset: 0
> mflags: NONE
> dflags: SYNCHRONIZING
> hcprovider:
> provsize: 160041885696
> MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f

BTW. Your provider size is 149GB and the mirror only uses 19GB, which means you mirrored a 149GB disk with a 19GB disk and you are wasting 130GB (it's unused).

> Some googling indicated that SYNCHRONIZING means that its not
> "complete" and wont mount? Is that correct? Why would it be in that
> state then, I just shut it down fine...
And where the f*ck did my =20 > slices go??.. SYNCHRONIZING means that this component was/is being synchronized. It seems that you removed/lost the master disk, while it was synchronizing. It should work anyway. BTW. You confuse things again. Your slice is just fine (ad6s1), you don't have partitions, AFAIU. All in all, your partition table seems to be gone. If you created it on gmirror before (gm0s1) you may still have the same partition table on the other half of the mirror. You can try to move it to ad6 with bsdlabel and verify if you can see file system inside partitions. --=20 Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! --GpGaEY17fSl8rd50 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFGyvdIForvXbEpPzQRAg8tAKCTRE6mtq95mSw7U9+v/rxBORcPFgCg87ij O6NQsf8IfYiTO3oDCzovUBU= =wWSe -----END PGP SIGNATURE----- --GpGaEY17fSl8rd50-- From owner-freebsd-geom@FreeBSD.ORG Tue Aug 21 15:06:59 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C66816A421 for ; Tue, 21 Aug 2007 15:06:59 +0000 (UTC) (envelope-from hg@queue.to) Received: from pickle.queue.to (pickle.queue.to [71.180.69.18]) by mx1.freebsd.org (Postfix) with ESMTP id CC00F13C46C for ; Tue, 21 Aug 2007 15:06:58 +0000 (UTC) (envelope-from hg@queue.to) Received: (qmail 81361 invoked from network); 21 Aug 2007 11:06:57 -0400 Received: from cally.queue.to (172.16.0.6) by pickle.queue.to with ESMTP; 21 Aug 2007 11:06:57 -0400 Message-ID: <46CAFF91.90209@queue.to> Date: Tue, 21 Aug 2007 11:06:57 -0400 From: Howard Goldstein User-Agent: Thunderbird 2.0.0.6 (X11/20070802) MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> <20070821143136.GD1132@garage.freebsd.pl> In-Reply-To: 
<20070821143136.GD1132@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-geom@freebsd.org Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot... X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Aug 2007 15:06:59 -0000 Pawel Jakub Dawidek wrote: > You have 3-way mirror? Are there issues with 3-way mirrors? (I've got one all at different consumer priorities) # gmirror list Geom name: gm0 State: COMPLETE Components: 3 Balance: load Slice: 4096 Flags: NONE GenID: 0 SyncID: 1 ID: 2713604671 Providers: 1. Name: mirror/gm0 Mediasize: 263176192 (251M) Sectorsize: 512 Mode: r1w1e2 From owner-freebsd-geom@FreeBSD.ORG Tue Aug 21 15:34:27 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C22C16A41B for ; Tue, 21 Aug 2007 15:34:27 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.freebsd.org (Postfix) with ESMTP id EF80E13C47E for ; Tue, 21 Aug 2007 15:34:26 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 2D7C645B26; Tue, 21 Aug 2007 17:34:24 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 8D0CB45685; Tue, 21 Aug 2007 17:34:17 +0200 (CEST) Date: Tue, 21 Aug 2007 17:33:20 +0200 From: Pawel Jakub Dawidek To: Howard Goldstein Message-ID: <20070821153320.GE1132@garage.freebsd.pl> References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> <20070821143136.GD1132@garage.freebsd.pl> 
<46CAFF91.90209@queue.to> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="VUDLurXRWRKrGuMn" Content-Disposition: inline In-Reply-To: <46CAFF91.90209@queue.to> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: freebsd-geom@freebsd.org Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot... X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Aug 2007 15:34:27 -0000 --VUDLurXRWRKrGuMn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Aug 21, 2007 at 11:06:57AM -0400, Howard Goldstein wrote: > Pawel Jakub Dawidek wrote: > > You have 3-way mirror? >=20 > Are there issues with 3-way mirrors? (I've got one all at different > consumer priorities) No, AFAIK. --=20 Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! 
--VUDLurXRWRKrGuMn Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFGywXAForvXbEpPzQRAuC2AJwLuHHWEPZGyXPBEhfxwflwz7NuSACcCYPD ol8WJbCfJlObG13XTZ5PBGs= =gQw9 -----END PGP SIGNATURE----- --VUDLurXRWRKrGuMn-- From owner-freebsd-geom@FreeBSD.ORG Tue Aug 21 15:53:37 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3779C16A469; Tue, 21 Aug 2007 15:53:37 +0000 (UTC) (envelope-from johan@stromnet.se) Received: from av12-1-sn2.hy.skanova.net (av12-1-sn2.hy.skanova.net [81.228.8.185]) by mx1.freebsd.org (Postfix) with ESMTP id 99A6B13C481; Tue, 21 Aug 2007 15:53:36 +0000 (UTC) (envelope-from johan@stromnet.se) Received: by av12-1-sn2.hy.skanova.net (Postfix, from userid 502) id C8093381E0; Tue, 21 Aug 2007 17:53:34 +0200 (CEST) Received: from smtp4-2-sn2.hy.skanova.net (smtp4-2-sn2.hy.skanova.net [81.228.8.93]) by av12-1-sn2.hy.skanova.net (Postfix) with ESMTP id 83772380EF; Tue, 21 Aug 2007 17:53:34 +0200 (CEST) Received: from phomca.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by smtp4-2-sn2.hy.skanova.net (Postfix) with ESMTP id 6D70337E45; Tue, 21 Aug 2007 17:53:34 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by phomca.stromnet.se (Postfix) with ESMTP id 109ADB826; Tue, 21 Aug 2007 17:53:34 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from phomca.stromnet.se ([127.0.0.1]) by localhost (phomca.stromnet.se [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5W9nHAK4Txym; Tue, 21 Aug 2007 17:53:27 +0200 (CEST) Received: from [172.28.1.102] (jstrom-mb.stromnet.se [172.28.1.102]) by phomca.stromnet.se (Postfix) with ESMTP id D4AF3B824; Tue, 21 Aug 2007 17:53:27 +0200 (CEST) In-Reply-To: <20070821143136.GD1132@garage.freebsd.pl> References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> 
<20070821143136.GD1132@garage.freebsd.pl> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed Message-Id: <441B87F4-5846-441B-B6B4-34694B483C73@stromnet.se> Content-Transfer-Encoding: quoted-printable From: =?ISO-8859-1?Q?Johan_Str=F6m?= Date: Tue, 21 Aug 2007 17:53:19 +0200 To: Pawel Jakub Dawidek X-Mailer: Apple Mail (2.752.3) Cc: freebsd-stable@freebsd.org, freebsd-geom@freebsd.org Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot... X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Aug 2007 15:53:37 -0000 On Aug 21, 2007, at 16:31 , Pawel Jakub Dawidek wrote: > On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Str=F6m wrote: >> Hi >> >> FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7: >> Tue Feb 13 18:24:34 CET 2007 johan@elfi.stromnet.se:/usr/obj/usr/ >> src/sys/ROUTER.POLLING i386 >> >> (ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ, >> IPSEC, also pfsync and carp) >> >> This weekend I had a disk failing on me in a machine running gmirror >> gm0 with 2 providers (ad0 and ad6). The whole box froze with no >> screen output, and on hard reboot I got some LBA errors etc from ad0, >> after a few reboots it got up and running though (I wasnt at the >> screen, had do do it by phone so couldn't really debug very well). >> As soon as the box got up, I removed ad0 from the gmirror, so ad6 was >> the only provider. Today I got a new disk that would replace ad0.. >> Now remeber, ad6 was the only disk in the mirror. I took the box down >> fine, replaced the disk. ad0 was now gone and instead I hade ad4 (ad4 >> +6 is SATA, ad0 was IDE). Changed so I booted of the old SATA.. 
>> Okay, there came the first problem; the boot loader gave me the usual >> options F1 FreeBSD F5 Disk 2 (or whatever it said).. If I pressed F1 >> i got the same prompt again.. F5 nothing at all.. Funny!... The >> system refused to load the loader (or whatever the 1-9 menu thingy is >> called) kernel or anything.. >> So I finally plugged the old ad0 disk into the machine to at least >> get it booted, thinking it would go up on the gmirror.. Nope..: >> >> (got the new ad4 out here) >> ad0: 38166MB at ata0-master UDMA100 >> ad6: 152627MB at ata3-master SATA150 >> GEOM_MIRROR: Device gm0 created (id=3D4029378995). >> GEOM_MIRROR: Device gm0: provider ad6 detected. >> Root mount waiting for: GMIRROR >> Root mount waiting for: GMIRROR >> Root mount waiting for: GMIRROR >> Root mount waiting for: GMIRROR >> GEOM_MIRROR: Force device gm0 start due to timeout. >> Trying to mount root from ufs:/dev/mirror/gm0s1a >> >> Manual root filesystem specification: >> : Mount using filesystem >> eg. ufs:da0s1a >> ? List valid disk boot devices >> Abort manual input >> >> mountroot> >> >> Okey... so why wouldnt it load my mirror from ad6 now?? I just did a >> clean shutdown without problems.. It didnt even recognize any slices >> on ad6s1 (altough the ad6s1 was found)... > > It loaded your mirror just fine, you confuse things. Gmirror =20 > started in > degraded state, as one could expect, but it seems there is no 'a' > partition on your gm0s1 slice (or entire bsdlabel is gone). > You could try to recreate it based on bsdlabel from ad0 (if it =20 > should be > the same), but I've no idea how it disapeared. Anyway, gmirror =20 > seems to > work properly. Okay.. So it tries to load, find no partition table, and ignores and =20 unloads gm0? > >> Some more digging into gmirror, I did a gmirror dump ad6: >> >> Metadata on /dev/ad6: >> magic: GEOM::MIRROR >> version: 3 >> name: gm0 >> mid: 4029378995 >> did: 449032193 >> all: 3 > > You have 3-way mirror? Uhm.. 
never had more than 2 disks in this machine..

>
>> genid: 0
>> syncid: 5
>> priority: 0
>> slice: 4096
>> balance: round-robin
>> mediasize: 20416757248
>> sectorsize: 512
>> syncoffset: 0
>> mflags: NONE
>> dflags: SYNCHRONIZING
>> hcprovider:
>> provsize: 160041885696
>> MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
>
> BTW. Your provider size is 149GB and mirror only use 19GB, which means
> you mirrored 149GB disk with 19GB disk and you waste 130GB (it's
> unused).

Yes, the ad0 disk was (is) only 40GB, so only the first 40GB of that disk was in the mirror (the rest was in another slice with its own label.. although when I run fdisk on the disk it seems to not be there at all..) But hum, 19??.. It should be 40 (or somewhere around there at least)..

From ad0 mount:
Filesystem   1K-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1a     507630   85142   381878    18%    /
/dev/ad0s1e     507630      20   467000     0%    /tmp
/dev/ad0s1f   10154158 1176410  8165416    13%    /usr
/dev/ad0s1d    1506190   80326  1305370     6%    /var
/dev/ad0s1g   24174212 6939804 15300472    31%    /var/squid
swapinfo:
/dev/ad0s1b    1022536       0  1022536     0%

~35Gb... I compared slice 1 on ad0 vs ad6; both have the exact same size.

>
>> Some googling indicated that SYNCHRONIZING means that its not
>> "complete" and wont mount? Is that correct? Why would it be in that
>> state then, I just shut it down fine... And where the f*ck did my
>> slices go??..
>
> SYNCHRONIZING means that this component was/is being synchronized. It
> seems that you removed/lost the master disk, while it was synchronizing.
> It should work anyway.

Okay, that's odd.. ad6 was the only disk in the mirror when I shut down (shutdown -p now, and it powered off by itself..) so it should have been good..

>
> BTW. You confuse things again. Your slice is just fine (ad6s1), you
> don't have partitions, AFAIU.

Seems I did, yes, thanks. Disks have slices (which in the windows/dos/linux world are called partitions), which have partitions..
check :) > > All in all, your partition table seems to be gone. If you created it on > gmirror before (gm0s1) you may still have the same partition table on > the other half of the mirror. You can try to move it to ad6 with > bsdlabel and verify if you can see file system inside partitions. Okay, tried that now.. Saved the ad0s1 label, reloaded it onto ad6s1.. Now I've got the same partition table on ad6s1 as on ad0s1... Trying to mount any of them, though, gives me "incorrect super block"... fsck cannot find any superblocks either.. So.. What to do now then? Just forget ad6 and start from scratch from ad2? (as I said, the data isn't very old really)... I'm thinking about doing a complete reinstall on ad4+ad6 then.. Can I do that? fdisk both with a full partition on both, create a new gmirror between ad6s1/ad4s1 (or should I go on ad4/ad6?), create slices, use dump | restore (of course with apps shut down so no data is changed.. or at least nothing that I care about) to copy all files from ad2 to the new mirror.. What more do I need to do? bsdlabel -B on both to write boot blocks? Is there anything else to think about? 
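For reference, the plan above could be sketched roughly as follows, for the whole-disk variant (mirroring ad4/ad6 rather than ad4s1/ad6s1). This is only a dry-run sketch: it prints the commands rather than running them. The disk names and gm0 come from this thread; /mnt as a temporary mount point, the `run`/`plan` helper names and the exact flag choices are assumptions, so check gmirror(8), fdisk(8), bsdlabel(8) and dump(8) before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch of the rebuild plan: by default it only prints the commands.
# ad4/ad6 and gm0 come from the thread; /mnt and the flags are assumptions.
DRYRUN=${DRYRUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

plan() {
    run gmirror label -v -b round-robin gm0 ad4 ad6   # mirror the whole disks
    run gmirror load                                  # load geom_mirror
    run fdisk -BI /dev/mirror/gm0                     # one slice + boot code
    run bsdlabel -w -B /dev/mirror/gm0s1              # fresh label + boot blocks
    run bsdlabel -e /dev/mirror/gm0s1                 # edit partitions by hand
    run newfs -U /dev/mirror/gm0s1a                   # repeat per partition
    run mount /dev/mirror/gm0s1a /mnt
    run sh -c 'dump -0 -L -f - / | (cd /mnt && restore -r -f -)'
}

plan
```

The newfs, mount and dump | restore steps would be repeated for each filesystem. Before rebooting, /etc/fstab on the new root would also need to point at the /dev/mirror/gm0s1* devices, and /boot/loader.conf would need geom_mirror_load="YES" so the mirror exists at root mount time.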
Thanks for your help..:) From owner-freebsd-geom@FreeBSD.ORG Fri Aug 24 09:03:58 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 93E9116A418; Fri, 24 Aug 2007 09:03:58 +0000 (UTC) (envelope-from johan@stromnet.se) Received: from av12-2-sn2.hy.skanova.net (av12-2-sn2.hy.skanova.net [81.228.8.186]) by mx1.freebsd.org (Postfix) with ESMTP id 1C29613C481; Fri, 24 Aug 2007 09:03:58 +0000 (UTC) (envelope-from johan@stromnet.se) Received: by av12-2-sn2.hy.skanova.net (Postfix, from userid 502) id 4798E3854C; Fri, 24 Aug 2007 11:03:56 +0200 (CEST) Received: from smtp4-1-sn2.hy.skanova.net (smtp4-1-sn2.hy.skanova.net [81.228.8.92]) by av12-2-sn2.hy.skanova.net (Postfix) with ESMTP id 000C33854B; Fri, 24 Aug 2007 11:03:55 +0200 (CEST) Received: from phomca.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by smtp4-1-sn2.hy.skanova.net (Postfix) with ESMTP id 5334537E44; Fri, 24 Aug 2007 11:03:52 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by phomca.stromnet.se (Postfix) with ESMTP id B121EB826; Fri, 24 Aug 2007 11:03:51 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from phomca.stromnet.se ([127.0.0.1]) by localhost (phomca.stromnet.se [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YJoUwxuVR9F8; Fri, 24 Aug 2007 11:03:48 +0200 (CEST) Received: from [172.28.1.102] (jstrom-mb.stromnet.se [172.28.1.102]) by phomca.stromnet.se (Postfix) with ESMTP id 0F80AB824; Fri, 24 Aug 2007 11:03:48 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v752.3) In-Reply-To: <441B87F4-5846-441B-B6B4-34694B483C73@stromnet.se> References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> <20070821143136.GD1132@garage.freebsd.pl> <441B87F4-5846-441B-B6B4-34694B483C73@stromnet.se> Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed Message-Id: 
<77098FC1-E06C-4A0C-803F-038509F5F8CA@stromnet.se> Content-Transfer-Encoding: quoted-printable From: =?ISO-8859-1?Q?Johan_Str=F6m?= Date: Fri, 24 Aug 2007 11:03:31 +0200 To: freebsd-geom@freebsd.org, freebsd-stable@freebsd.org X-Mailer: Apple Mail (2.752.3) Cc: Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot... X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Aug 2007 09:03:58 -0000 On Aug 21, 2007, at 17:53 , Johan Ström wrote: > On Aug 21, 2007, at 16:31 , Pawel Jakub Dawidek wrote: >> >> All in all, your partition table seems to be gone. If you created it on >> gmirror before (gm0s1) you may still have the same partition table on >> the other half of the mirror. You can try to move it to ad6 with >> bsdlabel and verify if you can see file system inside partitions. > > Okay, tried that now.. Saved ad0s1 label, reloaded it onto ad6s1.. Now I got same partition table on ad6s1 as on ad0s1... > Trying to mount any though gives me incorrect super block... fsck cannot find any superblocks either.. > > So.. What to do now then? Just for get ad6 and start from scratch from ad2? (as i said, the data isnt very old realy)... > > Im thinking about doing complete reinstall on ad4+ad6 then.. Can I do that? fdisk both with full partition on both, create a new gmirror between ad6s1/ad4s1 (or should i go on ad4/ad6?), create slices, use dump | restore (of course with apps shutdown so no data is changed.. or at least nothing that I care about) to copy all files from ad2 to new mirror.. what do I need to do more? bsdlabel -B on both to write boot blocks? Is there anything else to think about? 
> Ok, just for the record: I plugged both SATA disks in, cleared them, created a new mirror on both of them, sliced them up and ran dump -0 -L -f - / | restore -r -f - for all filesystems, plus bsdlabel -B. And what I missed in the text above: fdisk -B to write the boot0 code.. Now it has booted fine on the mirror! Although, one thing that I got curious about: in the fdisk manpage it says -b can be used to change the bootcode.. and that the default is /boot/mbr.. What is this? I checked its md5 against boot0 and it's not the same (although I guess it might just be some boot0 with a different config..). I never found any references to this mbr file in either the man pages or the handbook. Again, thanks for the help :) From owner-freebsd-geom@FreeBSD.ORG Fri Aug 24 10:21:52 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 36CBC16A417 for ; Fri, 24 Aug 2007 10:21:52 +0000 (UTC) (envelope-from cyberleo@cyberleo.net) Received: from pizzabox.cyberleo.net (alpha.cyberleo.net [198.145.45.10]) by mx1.freebsd.org (Postfix) with ESMTP id F1A7413C480 for ; Fri, 24 Aug 2007 10:21:51 +0000 (UTC) (envelope-from cyberleo@cyberleo.net) Received: (qmail 99063 invoked from network); 24 Aug 2007 10:21:50 -0000 Received: from adsl-75-3-87-131.dsl.chcgil.sbcglobal.net (HELO ?172.16.44.14?) 
(cyberleo@cyberleo.net@75.3.87.131) by alpha.cyberleo.net with ESMTPA; 24 Aug 2007 10:21:50 -0000 Message-ID: <46CEB136.1090409@cyberleo.net> Date: Fri, 24 Aug 2007 05:21:42 -0500 From: CyberLeo Kitsana User-Agent: Thunderbird 2.0.0.6 (X11/20070819) MIME-Version: 1.0 To: =?ISO-8859-1?Q?Johan_Str=F6m?= References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> <20070821143136.GD1132@garage.freebsd.pl> <441B87F4-5846-441B-B6B4-34694B483C73@stromnet.se> <77098FC1-E06C-4A0C-803F-038509F5F8CA@stromnet.se> In-Reply-To: <77098FC1-E06C-4A0C-803F-038509F5F8CA@stromnet.se> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-stable@freebsd.org, freebsd-geom@freebsd.org Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot... X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Aug 2007 10:21:52 -0000 Johan Ström wrote: > altough, one thing that I got curious about. In the fdisk manpage it > says -b can be used to change the bootcode.. and that default is > /boot/mbr.. What is this? I checked md5 against boot0 and its not the > same (altough I guess it might just be some boot0 with different > config..). I never found any references to this mbr file in neither man > pages or handbook. boot0 is the pretty 'F1 FreeBSD' type boot menu. mbr is more like your standard MS bootloader, that just boots the active slice of the current disk. The latter is my favorite, as I despise multi-booting. -- Fuzzy love, -CyberLeo Technical Administrator CyberLeo.Net Webhosting http://www.CyberLeo.Net Furry Peace! 
- http://wwww.fur.com/peace/ From owner-freebsd-geom@FreeBSD.ORG Fri Aug 24 14:58:56 2007 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4D51116A419; Fri, 24 Aug 2007 14:58:56 +0000 (UTC) (envelope-from johan@stromnet.se) Received: from av6-1-sn3.vrr.skanova.net (av6-1-sn3.vrr.skanova.net [81.228.9.179]) by mx1.freebsd.org (Postfix) with ESMTP id 026EA13C46C; Fri, 24 Aug 2007 14:58:55 +0000 (UTC) (envelope-from johan@stromnet.se) Received: by av6-1-sn3.vrr.skanova.net (Postfix, from userid 502) id 6895938547; Fri, 24 Aug 2007 16:38:46 +0200 (CEST) Received: from smtp3-2-sn3.vrr.skanova.net (smtp3-2-sn3.vrr.skanova.net [81.228.9.102]) by av6-1-sn3.vrr.skanova.net (Postfix) with ESMTP id 322F238388; Fri, 24 Aug 2007 16:38:46 +0200 (CEST) Received: from phomca.stromnet.se (90-224-172-102-no129.tbcn.telia.com [90.224.172.102]) by smtp3-2-sn3.vrr.skanova.net (Postfix) with ESMTP id D400837E42; Fri, 24 Aug 2007 16:38:45 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by phomca.stromnet.se (Postfix) with ESMTP id 82BC8B826; Fri, 24 Aug 2007 16:38:45 +0200 (CEST) X-Virus-Scanned: amavisd-new at stromnet.se Received: from phomca.stromnet.se ([127.0.0.1]) by localhost (phomca.stromnet.se [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0nAUq8K6BapJ; Fri, 24 Aug 2007 16:38:42 +0200 (CEST) Received: from [172.28.1.102] (jstrom-mb.stromnet.se [172.28.1.102]) by phomca.stromnet.se (Postfix) with ESMTP id 94961B824; Fri, 24 Aug 2007 16:38:42 +0200 (CEST) In-Reply-To: <46CEB136.1090409@cyberleo.net> References: <8039436E-1824-4C2E-915B-9069DEF23B10@stromnet.se> <20070821143136.GD1132@garage.freebsd.pl> <441B87F4-5846-441B-B6B4-34694B483C73@stromnet.se> <77098FC1-E06C-4A0C-803F-038509F5F8CA@stromnet.se> <46CEB136.1090409@cyberleo.net> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; 
format=flowed Message-Id: <41D57A88-7C25-43C2-8426-1760EA976BCB@stromnet.se> Content-Transfer-Encoding: quoted-printable From: =?ISO-8859-1?Q?Johan_Str=F6m?= Date: Fri, 24 Aug 2007 16:38:25 +0200 To: CyberLeo Kitsana X-Mailer: Apple Mail (2.752.3) Cc: freebsd-stable@freebsd.org, freebsd-geom@freebsd.org Subject: Re: Crashed gmirror, single disk marked SYNC and wont boot... X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Aug 2007 14:58:56 -0000 On Aug 24, 2007, at 12:21 , CyberLeo Kitsana wrote: > Johan Ström wrote: >> altough, one thing that I got curious about. In the fdisk manpage it >> says -b can be used to change the bootcode.. and that default is >> /boot/mbr.. What is this? I checked md5 against boot0 and its not the >> same (altough I guess it might just be some boot0 with different >> config..). I never found any references to this mbr file in neither man >> pages or handbook. > > boot0 is the pretty 'F1 FreeBSD' type boot menu. mbr is more like your > standard MS bootloader, that just boots the active slice of the current > disk. > > The latter is my favorite, as I despise multi-booting. I see. Shouldn't this info be in the manpages/handbook somewhere? Like referenced from boot0cfg's manpage or something, and in the boot section of the handbook.
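To make the boot0 vs. mbr distinction above concrete, here is a small sketch that only prints the command one would plausibly use for each choice; it executes nothing. The helper name `bootcode_cmd` and its "menu"/"plain" keywords are made up for illustration, and ad4 is simply a disk name from this thread. As far as I know, `boot0cfg -B` installs the boot0 menu and `fdisk -B -b /boot/mbr` writes the plain MBR, but treat that as an assumption and check boot0cfg(8) and fdisk(8).

```shell
#!/bin/sh
# Sketch: print the command that would install the chosen boot code.
# Nothing is executed; "ad4" is only the disk name used in this thread.
bootcode_cmd() {
    # $1 = "menu" for boot0 (the F1/F5 prompt), anything else for plain mbr
    # $2 = disk name, e.g. ad4
    case "$1" in
        menu) echo "boot0cfg -B $2" ;;
        *)    echo "fdisk -B -b /boot/mbr $2" ;;
    esac
}

bootcode_cmd menu ad4
bootcode_cmd plain ad4
```

On a single-OS machine the plain mbr is enough, since it just boots the active slice; boot0 only earns its keep when there is more than one slice or disk to choose from at boot.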