Date: Wed, 8 Jun 2011 00:55:26 -0700
From: Jeremy Chadwick
To: Robert Simmons
Cc: freebsd-fs@freebsd.org
Subject: Re: GPT and disk alignment
Message-ID: <20110608075526.GA85577@icarus.home.lan>

On Wed, Jun 08, 2011 at 01:29:43AM -0400, Robert Simmons wrote:
> On Wed, Jun 8, 2011 at 12:29 AM, Mark Felder wrote:
> > On Tue, 07 Jun 2011 22:27:24 -0500, Robert Simmons wrote:
> >> Do all HDDs that have 4KB per LBA present themselves to the OS as
> >> having 512 bytes per LBA?
> >
> > No
>
> Ok, but can I assume that all HDDs of this type expand each of the 4K
> sectors so that physically they take up the same space as eight 512
> byte LBAs?  AFAIK, the new 4K LBA has a smaller ECC area than the sum
> of 8 ECC areas in 512 byte LBAs, so if the data area was _not_
> expanded slightly, you would never really be aligned except every x
> LBAs as the shifting approaches an LBA boundary, right?
>
> For any HDDs, do I need to worry about cylinder boundaries at all?
> Has the reported "disk geometry" become divorced from the physical
> reality in modern disks?  If I do still need to worry about cylinder
> boundaries, should I basically ignore every reported geometry (BIOS,
> OS) and use what is written on the sticker on the drive?
>
> >> What about SSDs that have 1024 bytes per LBA?
> >
> > Not sure, but I do know that not all flash media have the same bytes per LBA
> > internally. Some are 1K, some 4K, some even 8K. GPT is definitely the way to
> > go if you want to make sure you're aligned.
>
> Ok, is there some way to tell gpart(8) what the LBA size is, or do I
> have to calculate the offset of each partition manually?  In Linux it
> would be "fdisk -b 1024" for the example of SSDs or "fdisk -b 4096"
> for 4K HDDs.

I would think you'd just use "gpart add -b" to specify the base offset.

For example, on an Intel 320-series SSD (which uses a NAND flash cell
size of 8192 bytes), "gpart add -b 8" should end up at byte 65536 within
the flash itself.

I'm not sure if using 8 is correct though -- that is to say, I believe
there is other space near the beginning of the drive which is used for
things like the boot loader (I don't mean boot0, I mean boot2/loader and
friends), or for the GPT loader or GPT ZFS loader.  I could be wrong on
this part -- need someone to correct me.  All these different loaders
and GPT support on FreeBSD seriously make my head spin.

Anyway, back to SSDs:

I have yet to see anyone list off all the *actual* NAND flash cell sizes
of SSDs.
For example, everyone said "4KBytes" for Intel SSDs, but come to find
out it's actually 8KBytes.

Don't confuse NAND flash cell size with NAND erase page size.  They're
two different things.  Multiple cells make up (fit into) a single erase
page.  The alignment issue only applies to the cell size, not the erase
page size.  (I just had a discussion with an end-user on Intel's forum
about this; someone had led him to believe the erase page size was what
he should align to.)

For example, on Intel 320-series drives, the NAND erase page size is 256
cells, thus 256*8192 = 2097152, or 2MBytes.  Just a technical FYI bit
for those curious.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
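
For anyone who wants to check the alignment arithmetic discussed above,
here is a minimal Python sketch.  It is not a gpart invocation and not
anything from the FreeBSD tree; the function names are made up for
illustration.  It assumes the drive reports 512-byte logical sectors and
that a partition start is given as an LBA in those units.  The 8192-byte
cell size and the 256 cells per erase page are the Intel 320-series
figures quoted in the message.  Under those assumptions LBA 8 works out
to byte 4096, and byte 65536 corresponds to LBA 128, so it is worth
confirming which unit a given tool actually uses before trusting either
number.

```python
# Assumption: the drive exposes 512-byte logical sectors to the OS.
SECTOR_SIZE = 512

# Figures quoted in the message for Intel 320-series drives.
CELL_SIZE = 8192                       # NAND flash cell size in bytes
CELLS_PER_ERASE_PAGE = 256
ERASE_PAGE_SIZE = CELL_SIZE * CELLS_PER_ERASE_PAGE


def lba_to_byte_offset(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset on the device at which the given LBA starts."""
    return lba * sector_size


def is_aligned(lba, alignment, sector_size=SECTOR_SIZE):
    """Return True if the LBA starts on a multiple of `alignment` bytes."""
    return lba_to_byte_offset(lba, sector_size) % alignment == 0


def next_aligned_lba(lba, alignment, sector_size=SECTOR_SIZE):
    """Return the smallest LBA >= `lba` that starts on an aligned byte."""
    if alignment % sector_size != 0:
        raise ValueError("alignment must be a multiple of the sector size")
    step = alignment // sector_size    # sectors per alignment unit
    return -(-lba // step) * step      # ceiling division, then scale back


if __name__ == "__main__":
    for lba in (8, 16, 34, 128):
        print(lba, lba_to_byte_offset(lba), is_aligned(lba, CELL_SIZE))
    # First cell-aligned LBA at or after 34, the usual first usable LBA
    # behind a standard GPT on a 512-byte-sector disk.
    print(next_aligned_lba(34, CELL_SIZE))
    print(ERASE_PAGE_SIZE)
```

The starting LBA of 34 in the example reflects a standard GPT layout on
a 512-byte-sector disk (protective MBR, GPT header, and a 128-entry
table); it says nothing about how much room the various FreeBSD boot
loaders need, which is the open question raised in the message.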