From: "Pokala, Ravi"
To: "freebsd-geom@freebsd.org"
Subject: Converting LBAs to byte offsets through the GEOM stack
Date: Thu, 18 Dec 2014 23:11:46 +0000

Hi folks,

When you issue a BIO, the requested byte offset (bio_offset) gets transformed by each layer of the GEOM stack as needed. If the bottom of the stack is a physical disk, g_disk_start() transforms the final offset to a device block address (bio_pblkno), which the disk device driver uses as the LBA.

My question is this: is there a way to go in the other direction, from an LBA to a byte offset?

For example, let's say I have a set of four drives which are configured as a RAID10:

    STRIPE: /dev/ada0p2 && /dev/ada1p2 => /dev/stripe/gs0
    STRIPE: /dev/ada2p2 && /dev/ada3p2 => /dev/stripe/gs1
    MIRROR: /dev/stripe/gs0 && /dev/stripe/gs1 => /dev/stripe/gm0

I kick off a media scrub of the drive devices, to look for unreadable sectors.
For the sake of saving bandwidth, I use the ATA_READ_VERIFY / ATA_READ_VERIFY48 commands (which read from the media and set the status and error bits, but don't transfer the data to the host). That requires talking directly to the drive, not the higher-level GEOMs, so I have to work in terms of LBAs.

If I find an unreadable sector on one of the drives, I'd like to re-write the sector to heal it. I can do that by reading from the mirror; that will either pick the good side of the mirror in the first place, or will try and fail from the bad side, then fail over to read from the good side. Either way, I end up with the proper data, and can re-write the unreadable sector.

The problem is, how do I calculate the byte offset in the mirror to read from? In the example above, since it's a relatively straightforward stack, I could do some math taking into account the LBA offsets for the GPT partitions, the stripesize of the stripes, etc. That would work for this example, but it gets ugly fast if there are more complex transforms in the stack.

It's easy enough to look at the partition table and say "LBA 12345 is in the range 1024 - 1048576, which is part of ada0p2". Going from there to "ada0p2 is part of gs0, which has a stripe interleave of 256KB" is more complicated. If there's something like GEOM_RAID3 in the mix, which has parity sectors that are not visible to the higher layers of the stack, then it gets uglier still.

Is there a generic, supported way of doing this mapping? Or can someone point me in the right direction so I can *create* a generic way of doing this, and submit it? :-)

Thanks,

Ravi
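
To make the "some math" for the simple RAID10 example concrete, here is a rough sketch of the hand-rolled calculation. It is only an illustration under assumptions, not how GEOM itself does (or should do) the mapping: it assumes 512-byte sectors, the conventional RAID0 round-robin layout for the stripe, that each class keeps its metadata in the last sector of the component (so data offsets start at 0), and that the mirror maps offsets 1:1 onto its components. All function and parameter names are made up for the example.

```python
SECTOR_SIZE = 512  # assumption: logical sector size in bytes


def lba_to_partition_offset(lba, part_start_lba, part_end_lba,
                            sector_size=SECTOR_SIZE):
    """Byte offset of a disk LBA within the partition that contains it."""
    if not part_start_lba <= lba <= part_end_lba:
        raise ValueError("LBA is outside the partition")
    return (lba - part_start_lba) * sector_size


def stripe_component_to_provider(comp_index, comp_offset, ncomponents,
                                 stripe_size):
    """Inverse RAID0 mapping: component byte offset -> stripe byte offset."""
    row, within = divmod(comp_offset, stripe_size)
    return (row * ncomponents + comp_index) * stripe_size + within


def stripe_provider_to_component(offset, ncomponents, stripe_size):
    """Forward RAID0 mapping, used only to cross-check the inverse."""
    stripe_no, within = divmod(offset, stripe_size)
    comp_index = stripe_no % ncomponents
    comp_offset = (stripe_no // ncomponents) * stripe_size + within
    return comp_index, comp_offset


def lba_to_mirror_offset(lba, part_start_lba, part_end_lba, comp_index,
                         ncomponents=2, stripe_size=256 * 1024):
    """Disk LBA -> byte offset to read from the mirror at the top."""
    part_off = lba_to_partition_offset(lba, part_start_lba, part_end_lba)
    stripe_off = stripe_component_to_provider(comp_index, part_off,
                                              ncomponents, stripe_size)
    # Assumption: the mirror passes offsets through unchanged.
    return stripe_off


if __name__ == "__main__":
    # LBA 12345 on ada0, inside ada0p2 (LBAs 1024 - 1048576),
    # which is component 0 of gs0 with a 256KB interleave.
    print(lba_to_mirror_offset(12345, 1024, 1048576, comp_index=0))
```

Every additional class in the stack needs its own inverse function like these, and a class such as GEOM_RAID3 would also have to report "this LBA holds parity and has no offset above me", which is exactly why a generic mechanism would be preferable to per-stack arithmetic.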