From owner-freebsd-fs  Fri Jan 31 16:11:43 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id F0ABD37B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:11:41 -0800 (PST)
Received: from mail.netbsd.org (mail.netbsd.org [155.53.1.253])
	by mx1.FreeBSD.org (Postfix) with SMTP id 98A0443F79
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:11:41 -0800 (PST)
	(envelope-from wrstuden@netbsd.org)
Received: (qmail 15315 invoked by uid 1130); 1 Feb 2003 00:11:40 -0000
Date: Fri, 31 Jan 2003 16:11:29 -0800 (PST)
From: Bill Studenmund <wrstuden@netbsd.org>
X-X-Sender:  <wrstuden@vespasia.home-net.icnt.net>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: <freebsd-fs@freebsd.org>, <tech-kern@netbsd.org>
Subject: Re: DEV_B_SIZE 
In-Reply-To: <E6AEE678-3558-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.NEB.4.33.0301311545180.4728-100000@vespasia.home-net.icnt.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, 31 Jan 2003, Steve Byan wrote:

> I keep getting a response that reads like "we'll detect the larger
> block size and run with it".  I'm concerned that I'm not being clear
> that IDEMA is thinking of proposing a backward-compatibility mode with
> the presumption that it will work fine (albeit slowly) with existing
> binaries, i.e. code that hasn't been modified to be aware of the larger
> block size.
>
> If you think there are no functional problems with this
> backwards-compatibility scenario, including during recovery (fsck or
> journal roll-forward), I'd be happy to hear a clear "no problem".

I think Stephan Uphof hit on the main issues. I think there are functional
problems with this, but it may be useful in some situations. It just
needs a BIG warning.

Note I am assuming that if there's an error writing a 512-byte sector the
full 4k sector will have issues. If that is avoided (say only the 512-byte
area actually has an issue) then things are fine.

I think the main place that problems will arise is that methods to reduce
error exposure won't necessarily work. Methods that try to resist
single-sector errors, say by making multiple copies of data, will need to
know that the single-sector error size (how much data goes away) is 4k, not
512 bytes. Exactly how many programs use these methods is not something I
know, so I can't tell you exactly what the exposure is.
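
To make the exposure concrete, here is a minimal sketch (not from any
existing tool) of how the "error size" changes. It assumes 4k physical
sectors that each hold eight aligned 512-byte logical sectors, with logical
sector 0 starting on a physical boundary; the function names are made up
for illustration.

```python
LOGICAL_SECTOR = 512     # bytes, what existing binaries assume
PHYSICAL_SECTOR = 4096   # bytes, assumed size of the larger on-media sector
RATIO = PHYSICAL_SECTOR // LOGICAL_SECTOR  # logical sectors per physical one


def blast_radius(lba):
    """Logical sectors lost if the physical sector holding `lba` is lost.

    Assumes logical sector 0 is aligned to a physical sector boundary.
    """
    first = (lba // RATIO) * RATIO
    return range(first, first + RATIO)


def copies_independent(lba_a, lba_b):
    """True if two copies of the same data sit in different physical sectors."""
    return lba_a // RATIO != lba_b // RATIO


if __name__ == "__main__":
    # A failed write to logical sector 17 takes its seven neighbours with it.
    print(list(blast_radius(17)))
    # Copies in adjacent 512-byte sectors share one 4k sector: not independent.
    print(copies_independent(16, 17))
    # Copies eight logical sectors apart land in different 4k sectors.
    print(copies_independent(16, 24))
```

A program that keeps its two copies in adjacent 512-byte sectors is safe
against a 512-byte failure but not against a 4k one, which is the case the
warning would have to cover.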

The fact that errors from a failed 4k re-write are not unheard of
isn't the issue. phk is right that that just looks like multiple sectors
dying. The problem is that we would have multiple-sector-death happening
with single-sector failure dynamics.

If you want this to not be an issue 100%, then just put a battery-backed
cache on the device. Note I'm not saying back up the write cache, just
have a cache of the last area(s) being written. We're talking maybe 8k of
cache plus checksumming plus the logical block addresses. It shouldn't be
hard (read: should be cheap in mass quantities) to battery-back
something that small. Use a rechargeable battery, and just say that if you
lose power while writing, you should restore power within say a month or
a few months to let said cache drain.

With well-tuned CMOS, you might even be able to get away with just static
charge or a capacitor for power storage.
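
As a rough illustration of the cache idea, here is a sketch of what such an
entry and its replay on power-up might look like. Everything in it is an
assumption made for the example: the entry layout, the use of CRC-32 as the
checksum, the 4k sector size, and the dict standing in for the media. It is
not a description of any real drive firmware.

```python
import zlib
from dataclasses import dataclass

SECTOR = 4096  # bytes, assumed physical sector size


@dataclass
class CacheEntry:
    lba: int      # logical block address of the sector being written
    data: bytes   # the full new contents of that sector
    crc: int      # checksum over the address and the data


def _checksum(lba, data):
    # Cover the address too, so a corrupted LBA is also detected.
    return zlib.crc32(lba.to_bytes(8, "little") + data)


def make_entry(lba, data):
    """Record a sector in the small cache before it is written to media."""
    if len(data) != SECTOR:
        raise ValueError("cache entries hold exactly one physical sector")
    return CacheEntry(lba, data, _checksum(lba, data))


def entry_valid(entry):
    """True if the entry survived the power loss intact."""
    return len(entry.data) == SECTOR and _checksum(entry.lba, entry.data) == entry.crc


def replay(entries, media):
    """On power-up, rewrite every intact cached sector; return how many."""
    replayed = 0
    for entry in entries:
        if entry_valid(entry):
            media[entry.lba] = entry.data
            replayed += 1
    return replayed
```

With two entries of this shape the data alone comes to 8k, which matches
the size estimate above; the addresses and checksums add only a few bytes
per entry.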

Take care,

Bill

