Date: Fri, 31 Jan 2003 13:56:09 -0500
From: Steve Byan <stephen_byan@maxtor.com>
To: Julian Elischer <julian@elischer.org>
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <A91AD932-354D-11D7-B26B-00306548867E@maxtor.com>
In-Reply-To: <Pine.BSF.4.21.0301311002110.45015-100000@InterJet.elischer.org>

On Friday, January 31, 2003, at 01:16 PM, Julian Elischer wrote:

> On Fri, 31 Jan 2003, Steve Byan wrote:
>
>> There's a notion afoot in IDEMA to enlarge the underlying physical
>> block size of disks to 4096 bytes while keeping a 512-byte logical
>> block size for the interface. Unaligned accesses would involve either
>> a read-modify-write or some proprietary mechanism that provides
>> persistence without the latency cost of a read-modify-write.
>>
>> Performance issues aside, it occurs to me that hiding the underlying
>> physical block size may break many careful-write and
>> transaction-logging mechanisms, which may depend on no more than one
>> block being corrupted during a failure. In IDEMA's proposal, a power
>> failure during a write of a single 512-byte logical block could result
>> in the corruption of the full 4K block, i.e. reads of any of the
>> 512-byte logical blocks in that 4K physical block would return an
>> uncorrectable ECC error.
>>
>> I'd appreciate hearing examples where hiding the underlying physical
>> block size would break a file system, database, transaction processing
>> monitor, or whatever. Please let me know if I may forward your reply
>> to the committee. Thanks.
>
> I presume that if such a drive were made, there would be some way to
> identify it?

Yes, but my concern is that advocates claim existing software could work
(albeit slowly) with such a drive. It's hard to retroactively modify
binaries installed in the field to adapt to a larger block size :-)

> It would be very easy to configure a filesystem to have a minimum
> writable unit size of 4k, and I assume that doing so would be
> slightly advantageous (no read-modify-write). It would, however,
> be good if we could easily identify when doing so was a good idea.

Yes, I've built and run OSF/1 on a system with 4K sector size; this was
essentially BSD4.3.
Modifying DEV_B_SIZE and recompiling the world was sufficient (well,
actually the boot loader had to know the block size, and I needed a way
to format the disks to 4K, and ...).

> Another idea would be to have some way that you could specify a block
> number and have the drive tell you the first in the same group. That
> would allow a filesystem to work out the alignment. It may not be able
> to access absolute block numbers, if it's going through some layers of
> translation, and some way of saying "am I aligned?" might be useful.
>
> One thing that does come to mind is that, as you say, on power fail we
> would now be liable to lose a group of 8 sectors (4k) instead of 1 x
> 512-byte sector.
>
> Recovery algorithms might have to deal with this (should we actually
> decide to write one.. :-).
>
> Particularly if the block being written was the 1st, but the other 7
> blocks contain data that the OS has no way of knowing is in
> jeopardy. In other words, I might know that block 1 is in danger and
> put it in a write log (in a logging filesystem), but I have no way of
> knowing that the other 7 are in danger, so they may not be in the write
> log (assuming that the write log only holds the last N transactions).
> I'd say that this means that the drive should hold the active 4k block
> in nvram or something..
>
> You seem to have considered this but I'm in agreement that it could
> prove "nasty" in exactly the cases that are most important..
> People use write logging etc. in cases where they care about the data
> and recovery time. These are exactly the people who are going to be the
> most pissed off to lose their data. ..

Thanks, may I forward your response on to the committee?

> If we can easily tell the system to use 4k frags or 4k blocknumbers
> (i.e. we can elect to expose the real blocksize) then we are probably
> in better shape.

I agree.

Regards,
-Steve

--------
Steve Byan
<stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
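
As a rough illustration of the "first block in the same group" and "am I aligned?" questions raised in the message above, the arithmetic can be sketched on the host side in Python. This is only a sketch under stated assumptions: 512-byte logical blocks, 4096-byte physical blocks, physical blocks that begin at logical block 0, and no translation layer hiding the absolute block numbers. The function names are hypothetical and do not correspond to any drive interface or BSD API.

```python
# Sizes taken from the proposal discussed in the thread.
LOGICAL_BLOCK_SIZE = 512      # bytes per logical block seen at the interface
PHYSICAL_BLOCK_SIZE = 4096    # bytes per underlying physical block
BLOCKS_PER_GROUP = PHYSICAL_BLOCK_SIZE // LOGICAL_BLOCK_SIZE  # 8


def group_start(lba):
    """Return the first logical block of the physical block holding lba.

    Assumes physical blocks start at logical block 0 (hypothetical layout).
    """
    return (lba // BLOCKS_PER_GROUP) * BLOCKS_PER_GROUP


def is_aligned(lba, count):
    """True if a write of count blocks at lba covers whole physical blocks."""
    return lba % BLOCKS_PER_GROUP == 0 and count % BLOCKS_PER_GROUP == 0


def blocks_at_risk(lba, count):
    """Logical blocks sharing a physical block with the write (count >= 1)."""
    first = group_start(lba)
    last = group_start(lba + count - 1) + BLOCKS_PER_GROUP
    return range(first, last)


def collateral_blocks(lba, count):
    """Blocks at risk that the write itself does not touch.

    These are the neighbours a write log would not know to protect.
    """
    written = range(lba, lba + count)
    return [b for b in blocks_at_risk(lba, count) if b not in written]
```

For example, `collateral_blocks(8, 1)` lists blocks 9 through 15: the seven neighbours that a log recording only block 8 would not cover. An aligned write such as `collateral_blocks(0, 8)` has no collateral blocks.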
