From owner-freebsd-fs  Sun Mar 10  8:20:51 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from tara.freenix.org (keltia.freenix.org [62.4.20.87])
	by hub.freebsd.org (Postfix) with ESMTP id 4350737B404
	for <freebsd-fs@freebsd.org>; Sun, 10 Mar 2002 08:20:48 -0800 (PST)
Received: by tara.freenix.org (Postfix/TLS, from userid 101)
	id E72492AA3; Sun, 10 Mar 2002 17:20:46 +0100 (CET)
Date: Sun, 10 Mar 2002 17:20:46 +0100
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: freebsd-fs@freebsd.org
Subject: Re: [reiserfs-list] Re: Reiserfs on Freebsd
Message-ID: <20020310162046.GA8717@tara.freenix.org>
Mail-Followup-To: freebsd-fs@freebsd.org
References: <20020301224616.A12630@deathsgate.demon.co.uk> <Pine.NEB.3.96L.1020301213403.94041I-100000@fledge.watson.org> <20020302070305.A15982@deathsgate.demon.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20020302070305.A15982@deathsgate.demon.co.uk>
User-Agent: Mutt/1.3.26i
X-Operating-System: FreeBSD 5.0-CURRENT K6-3D/266 & 2x PIII/800 SMP
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

According to Bradley Kite:
> Ack, I probably am approaching the problem with a little naivety,
> but this is one way of learning, and your feedback is much appreciated!!

Another way to approach this is to talk to Kirk about the journalling FFS
that Margo Seltzer wrote (Kirk submitted a paper about softupdates vs
journalling at BSDcon in 2000) and get the source code.

-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
FreeBSD keltia.freenix.fr 5.0-CURRENT #80: Sun Jun  4 22:44:19 CEST 2000

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Mar 12 10:58:38 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from web13305.mail.yahoo.com (web13305.mail.yahoo.com [216.136.175.41])
	by hub.freebsd.org (Postfix) with SMTP id 09EA937B74F
	for <freebsd-fs@FreeBSD.org>; Tue, 12 Mar 2002 10:57:48 -0800 (PST)
Message-ID: <20020312185747.98993.qmail@web13305.mail.yahoo.com>
Received: from [132.248.28.30] by web13305.mail.yahoo.com via HTTP; Tue, 12 Mar 2002 10:57:47 PST
Date: Tue, 12 Mar 2002 10:57:47 -0800 (PST)
From: AQUAMAN <yoatl@yahoo.com>
Subject: filesystems compatibility
To: freebsd-fs@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Hello

My question is the following:

I want to install Debian, Mandrake, Red Hat and FreeBSD at home,
with a single shared /home partition. All four operating systems
should be able to modify it, so that I don't have to create a
/home partition for each one of them.

I know that I have to use a filesystem that is compatible with
all of them. Could you suggest an appropriate one?

Hewi Yoatl

=====
Triathletes do it 3 times!!!



From owner-freebsd-fs  Tue Mar 12 11:25:16 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from web21109.mail.yahoo.com (web21109.mail.yahoo.com [216.136.227.111])
	by hub.freebsd.org (Postfix) with SMTP id 6A74D37B417
	for <freebsd-fs@FreeBSD.org>; Tue, 12 Mar 2002 11:25:03 -0800 (PST)
Message-ID: <20020312192503.2810.qmail@web21109.mail.yahoo.com>
Received: from [62.254.0.5] by web21109.mail.yahoo.com via HTTP; Tue, 12 Mar 2002 11:25:03 PST
Date: Tue, 12 Mar 2002 11:25:03 -0800 (PST)
From: Hiten Pandya <hitmaster2k@yahoo.com>
Reply-To: hiten@uk.FreeBSD.org
Subject: Re: filesystems compatibility
To: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.org
In-Reply-To: <20020312185747.98993.qmail@web13305.mail.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

--- AQUAMAN <yoatl@yahoo.com> wrote:
> Hello
> 
> My question is the following:
> 
> I want to install Debian, Mandrake, Red Hat and FreeBSD at home,
> with a single shared /home partition. All four operating systems
> should be able to modify it, so that I don't have to create a
> /home partition for each one of them.
> 
> I know that I have to use a filesystem that is compatible with
> all of them. Could you suggest an appropriate one?
> 
> Hewi Yoatl

Hello Hewi,

This list is only for technical discussions. I would suggest that you ask
this at freebsd-questions@FreeBSD.org, which will yield you better
responses.

Sorry, I can't answer your question, but a rough guess would be to use EXT2FS
for your home partition.

Regards,

  -- Hiten Pandya
  -- <hiten@uk.FreeBSD.org>



From owner-freebsd-fs  Tue Mar 12 13:28:15 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from harrier.prod.itd.earthlink.net (harrier.mail.pas.earthlink.net [207.217.120.12])
	by hub.freebsd.org (Postfix) with ESMTP id 9155737B419
	for <freebsd-fs@freebsd.org>; Tue, 12 Mar 2002 13:27:17 -0800 (PST)
Received: from pool0291.cvx40-bradley.dialup.earthlink.net ([216.244.43.36] helo=mindspring.com)
	by harrier.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16ktng-0001b0-00; Tue, 12 Mar 2002 13:27:16 -0800
Message-ID: <3C8E72A3.6E9CBC6F@mindspring.com>
Date: Tue, 12 Mar 2002 13:26:59 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: AQUAMAN <yoatl@yahoo.com>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: filesystems compatibility
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

AQUAMAN wrote:
> I want to install Debian, Mandrake, Red Hat and FreeBSD at home,
> with a single shared /home partition. All four operating systems
> should be able to modify it, so that I don't have to create a
> /home partition for each one of them.
> 
> I know that I have to use a filesystem that is compatible with
> all of them. Could you suggest an appropriate one?

You probably wanted to ask this on freebsd-questions.

--

It's really hard to answer these kinds of questions exhaustively,
since Linux has the bad habit of changing things about the on
disk layout of FS data, and not changing the name of the FS;
there are at least six incompatible hacks on EXT2FS since the
first EXT2FS, and knowing which one you have is an exercise in
detective work.

The limiting factor is going to be the set of FSs that are
read/write in all the Linux distributions, and that are also
supported by FreeBSD.

I think the only one in common for all three Linux distributions,
that doesn't have local hacks, will be EXT2FS.  FreeBSD can read
and write EXT2FS, as long as you aren't using local hacks (last
time I checked this, a long time ago, I admit, FreeBSD did not
support the RedHat hack for sparse superblocks, and neither did
Debian).

-- Terry



From owner-freebsd-fs  Wed Mar 13  4: 8:18 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mx7.mail.ru (mx7.mail.ru [194.67.57.17])
	by hub.freebsd.org (Postfix) with ESMTP id 77A1537B416
	for <freebsd-fs@freebsd.org>; Wed, 13 Mar 2002 04:08:15 -0800 (PST)
Received: from f9.int ([10.0.0.77] helo=f9.mail.ru)
	by mx7.mail.ru with esmtp (Exim MX.7)
	id 16l7YE-000K4I-00
	for freebsd-fs@freebsd.org; Wed, 13 Mar 2002 15:08:14 +0300
Received: from mail by f9.mail.ru with local (Exim FE.9)
	id 16l7YD-0001FG-00
	for freebsd-fs@FreeBSD.org; Wed, 13 Mar 2002 15:08:13 +0300
Received: from [144.16.67.8] by eng.mail.ru with HTTP;
	Wed, 13 Mar 2002 15:08:13 +0300
From: "Parity Error" <bootup@mail.ru>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: metadata update durability ordering/soft updates
Mime-Version: 1.0
X-Mailer: mPOP Web-Mail 2.19
X-Originating-IP: 144.16.67.147 via proxy [144.16.67.8]
Date: Wed, 13 Mar 2002 15:08:13 +0300
Reply-To: "Parity Error" <bootup@mail.ru>
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 8bit
Message-Id: <E16l7YD-0001FG-00@f9.mail.ru>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

With soft updates, metadata updates are delayed writes. I am wondering:
say there are two independent structural changes, one after another, and
then a crash happens. Is there a possibility that the latter structural
change was written to disk before the former, due to some memory
replacement policy?

Could this affect the correctness of some applications?




From owner-freebsd-fs  Wed Mar 13  9: 7:37 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by hub.freebsd.org (Postfix) with ESMTP id 75A8D37B416
	for <freebsd-fs@FreeBSD.org>; Wed, 13 Mar 2002 09:07:29 -0800 (PST)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id 03582AE24A; Wed, 13 Mar 2002 09:07:29 -0800 (PST)
Date: Wed, 13 Mar 2002 09:07:28 -0800
From: Alfred Perlstein <bright@mu.org>
To: Parity Error <bootup@mail.ru>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <20020313170728.GM32410@elvis.mu.org>
References: <E16l7YD-0001FG-00@f9.mail.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <E16l7YD-0001FG-00@f9.mail.ru>
User-Agent: Mutt/1.3.27i
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

* Parity Error <bootup@mail.ru> [020313 04:08] wrote:
> With soft updates, metadata updates are delayed writes. I am wondering:
> say there are two independent structural changes, one after another, and
> then a crash happens. Is there a possibility that the latter structural
> change was written to disk before the former, due to some memory
> replacement policy?
> 
> Could this affect the correctness of some applications?

Of course!  This happens with almost any filesystem.

This is why you have fsync(2).

-Alfred



From owner-freebsd-fs  Wed Mar 13  9:59:24 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.wolves.k12.mo.us (mail.wolves.k12.mo.us [207.160.214.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 48B4C37B41A; Wed, 13 Mar 2002 09:59:08 -0800 (PST)
Received: from mail.wolves.k12.mo.us (cdillon@mail.wolves.k12.mo.us [207.160.214.1])
	by mail.wolves.k12.mo.us (8.9.3/8.9.3) with ESMTP id LAA35159;
	Wed, 13 Mar 2002 11:59:07 -0600 (CST)
	(envelope-from cdillon@wolves.k12.mo.us)
Date: Wed, 13 Mar 2002 11:59:06 -0600 (CST)
From: Chris Dillon <cdillon@wolves.k12.mo.us>
To: <freebsd-scsi@freebsd.org>
Cc: <freebsd-fs@freebsd.org>
Subject: CD-MRW a.k.a Mt. Rainier support
Message-ID: <Pine.BSF.4.32.0203131116220.31162-100000@mail.wolves.k12.mo.us>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


CC'd to freebsd-fs since this is somewhat fs-related...

Is anyone working on implementing support for CD-MRW (apparently
included in MMC-3) into either the SCSI cd driver or the ATAPI cd
driver?  Where/how would be the best place to implement this so that
it will work with either ATAPI or SCSI drives?  Would implementing it
in the SCSI cd driver be best, since we now have the option of using
ATAPI drives with CAM?

In case anyone is wondering what CD-MRW (Mt. Rainier Re-Writable) is,
it is a new standard (currently only available in the Yamaha CRW3200
series, that I know of), that allows on-the-fly transparent
formatting, hardware defect management, and 2K-block logical
addressing of CD-RW discs and specifies a specialized UDF filesystem
to be used along with these hardware abilities.  This will make drives
supporting this standard act like a more traditional magnetic-media
removable drive, thus greatly simplifying reading/writing to CD-RW
discs.  Since MRW uses a new format, it is not backwards compatible
with any existing CD-RW formats, though it is possible to _read_ an
MRW-formatted disc in a regular drive with the proper software support.
MRW uses UDF as its standard filesystem, which we do not yet support,
though I envision using the hardware MRW support of the drive to put
just about anything you want onto it, including FAT or UFS, to use it
as a "regular" drive.

I'd love to take a shot at implementing this if someone isn't already,
though I'll need to find the specs for the hardware side of Mt.
Rainier.  Apparently it is implemented in the new MMC-3 command set.
Does anyone have any pointers?

--
 Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
 FreeBSD: The fastest and most stable server OS on the planet
 - Available for IA32 (Intel x86) and Alpha architectures
 - IA64, PowerPC, UltraSPARC, and ARM architectures under development
 - http://www.freebsd.org




From owner-freebsd-fs  Wed Mar 13 10: 7:41 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from melchior.cuivre.fr.eu.org (melchior.enst.fr [137.194.161.6])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2F0F237B400; Wed, 13 Mar 2002 10:07:37 -0800 (PST)
Received: from melusine.cuivre.fr.eu.org (melusine.enst.fr [137.194.160.34])
	by melchior.cuivre.fr.eu.org (Postfix) with ESMTP
	id 86BDC8567; Wed, 13 Mar 2002 19:07:34 +0100 (CET)
Received: by melusine.cuivre.fr.eu.org (Postfix, from userid 1000)
	id 43A4D2C3D2; Wed, 13 Mar 2002 19:07:18 +0100 (CET)
Date: Wed, 13 Mar 2002 19:07:18 +0100
From: Thomas Quinot <thomas@cuivre.fr.eu.org>
To: Chris Dillon <cdillon@wolves.k12.mo.us>
Cc: freebsd-scsi@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: CD-MRW a.k.a Mt. Rainier support
Message-ID: <20020313190718.A3239@melusine.cuivre.fr.eu.org>
Reply-To: thomas@cuivre.fr.eu.org
References: <Pine.BSF.4.32.0203131116220.31162-100000@mail.wolves.k12.mo.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.BSF.4.32.0203131116220.31162-100000@mail.wolves.k12.mo.us>; from cdillon@wolves.k12.mo.us on Wed, Mar 13, 2002 at 11:59:06AM -0600
X-message-flag: WARNING! Using Outlook can damage your computer.
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On 2002-03-13, Chris Dillon wrote:

> it will work with either ATAPI or SCSI drives?  Would implementing it
> in the SCSI cd driver be best, since we now have the option of using
> ATAPI drives with CAM?

I'd say implement it in the SCSI cd driver, because this option allows
you to support both proper SCSI devices and ATAPI units without
duplicated code :).

Thomas.

-- 
    Thomas.Quinot@Cuivre.FR.EU.ORG



From owner-freebsd-fs  Wed Mar 13 11: 1:18 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from snipe.prod.itd.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62])
	by hub.freebsd.org (Postfix) with ESMTP id E54B337B400
	for <freebsd-fs@freebsd.org>; Wed, 13 Mar 2002 11:01:12 -0800 (PST)
Received: from pool0082.cvx21-bradley.dialup.earthlink.net ([209.179.192.82] helo=mindspring.com)
	by snipe.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lDzq-0006V8-00; Wed, 13 Mar 2002 11:01:10 -0800
Message-ID: <3C8FA1E4.A89F52FF@mindspring.com>
Date: Wed, 13 Mar 2002 11:00:52 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Parity Error <bootup@mail.ru>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates
References: <E16l7YD-0001FG-00@f9.mail.ru>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Parity Error wrote:
> with soft-updates metadata updates are delayed write. I am
> wondering if, say there are two independent structural changes,
> one after another, and then a crash happens.
> 
> Is there a possibility that the latter structural change got
> written to disk before the former due to some memory replacement
> policy ?

Independent writes are independent, by definition.  They
are permitted to occur in either order.  Metadata updates
are only ordered by soft updates insofar as necessary to
satisfy dependencies.  Thus independent writes can occur
in any order, but will *usually* occur in order, due to
the way that a scheduled write can not be reordered once it
is given to the disk controller.

This is due to a locking issue on the disk operations queue
in the driver, and is arguably a bug.  It's likely that some
work currently in progress will forceed to the point that the
"likely ordering" of independent operations will go away in
the future, so you can't even safely depend on it being likely.

This is normally an issue only for updates that do things
like update both an index and a record file, and imply a
dependency order in the operation.  In other words, there
is implied metadata between the two files, and therefore an
implied dependency.

It's the application's responsibility to signal the dependency
to the OS, so that the updates are ordered.  The normal way to
do this is to use a two stage commit operation (per standard
database theoury, Circa IBM, 1965).  In UNIX this is done by
requesting that the first operation be committed, before making
the request to begin the second operation (e.g. a software
barrier instruction).  To find out more about this, you should
use "man fsync" and "man open" (in the "open" page, look for
"O_FSYNC").

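A minimal sketch of the ordering described above, assuming a record
file that must reach stable storage before the index entry that points
into it.  The helper names (append_durably, add_record) and the index
line format are made up for illustration; only os.open, os.write and
os.fsync are real calls.  Opening with a synchronous-write flag such as
O_FSYNC would be the other way to get the same barrier.

```python
import os
import tempfile


def append_durably(path, data):
    """Append data to path and return only after fsync(2) has returned."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write less than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # barrier: commit before continuing
    finally:
        os.close(fd)


def add_record(record_path, index_path, key, value):
    """Commit the record first, then the index entry that refers to it."""
    offset = os.path.getsize(record_path) if os.path.exists(record_path) else 0
    append_durably(record_path, value)               # stage 1: the data
    entry = f"{key} {offset} {len(value)}\n".encode()
    append_durably(index_path, entry)                # stage 2: the pointer
    return offset


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rec = os.path.join(d, "records")
        idx = os.path.join(d, "index")
        add_record(rec, idx, "first", b"hello")
        add_record(rec, idx, "second", b"world!")
        with open(idx) as f:
            print(f.read(), end="")
```

Because the second write is not even requested until the first fsync
has returned, a crash can leave a record without an index entry, but
not an index entry without its record (subject to the drive write-cache
caveat discussed below).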

As to misordering of dependent writes, even if you use
synchronous I/O properly...

Yes, this can happen due to the memory replacement policy
on many IDE hard drives, which lie about data having been
committed to stable storage, when in fact it has only been
written to the disk write cache, which is far from stable
storage, being as it's not battery backed, and it is not
guaranteed to be written to the disk after a power failure,
except on some IBM and Quantum drives which are no longer
manufactured.

You can ensure this doesn't happen to you by using only
disks which can correctly support cache flush primitives
and tagged command queues, or disabling write caching on
the device.  SCSI devices don't have this problem.

Another potential problem is that some IDE disks will
acknowledge disabling write caching, but will in fact not
disable it, no matter what commands you spit at them.  For
some of these disks, there are firmware updates available,
but if you are unlucky enough to own one of these disks,
then there is usually no option but to buy a good disk
instead.  May I recommend SCSI?


> could this affect the correctness of some applications ?

The disk caching issue could.  The implied metadata could
not.

If you have an application that uses implied metadata, but
does not take the necessary steps for UNIX to ensure that
the OS is signalled about the implied ordering dependency,
then by definition, your application can't have its
correctness affected... since it has no correctness to lose.

8-).

-- Terry



From owner-freebsd-fs  Wed Mar 13 11: 5:17 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from snipe.prod.itd.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62])
	by hub.freebsd.org (Postfix) with ESMTP id 79DEE37B41B
	for <freebsd-fs@freebsd.org>; Wed, 13 Mar 2002 11:05:10 -0800 (PST)
Received: from pool0082.cvx21-bradley.dialup.earthlink.net ([209.179.192.82] helo=mindspring.com)
	by snipe.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lE3e-0004nt-00; Wed, 13 Mar 2002 11:05:06 -0800
Message-ID: <3C8FA2D0.4542C198@mindspring.com>
Date: Wed, 13 Mar 2002 11:04:48 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates
References: <E16l7YD-0001FG-00@f9.mail.ru> <3C8FA1E4.A89F52FF@mindspring.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Ugh.  Being dyslexic sucks.

Terry Lambert wrote:
[ ... ]
> work currently in progress will forceed to the point that the
                                 *proceed*
[ ... ]
> database theoury, Circa IBM, 1965).  In UNIX this is done by
           *theory*

-- Terry



From owner-freebsd-fs  Thu Mar 14  1:36:13 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mx8.mail.ru (mx8.mail.ru [194.67.57.18])
	by hub.freebsd.org (Postfix) with ESMTP id CA68537B423
	for <freebsd-fs@freebsd.org>; Thu, 14 Mar 2002 01:35:56 -0800 (PST)
Received: from f10.int ([10.0.0.78] helo=f10.mail.ru)
	by mx8.mail.ru with esmtp (Exim MX.8)
	id 16lRax-0005sk-00; Thu, 14 Mar 2002 12:32:23 +0300
Received: from mail by f10.mail.ru with local (Exim FE.10)
	id 16lReK-000C3T-00; Thu, 14 Mar 2002 12:35:52 +0300
Received: from [144.16.67.8] by eng.mail.ru with HTTP;
	Thu, 14 Mar 2002 12:35:52 +0300
From: "Parity Error" <bootup@mail.ru>
To: "Terry Lambert" <tlambert2@mindspring.com>
Cc: freebsd-fs@FreeBSD.org
Subject: Re[2]: metadata update durability ordering/soft updates
Mime-Version: 1.0
X-Mailer: mPOP Web-Mail 2.19
X-Originating-IP: 144.16.67.147 via proxy [144.16.67.8]
Date: Thu, 14 Mar 2002 12:35:52 +0300
In-Reply-To: <3C8FA1E4.A89F52FF@mindspring.com>
Reply-To: "Parity Error" <bootup@mail.ru>
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 8bit
Message-Id: <E16lReK-000C3T-00@f10.mail.ru>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

I am referring not to file data but to filesystem metadata, which is now
written with _delayed_ writes.

When we did synchronous writes to sequence the multiple metadata updates
belonging to one operation, to ensure recoverability of that one operation,
we also got inter-operation ordering for free (and apps/users could have
started depending on it). Unix provides no guarantees regarding the order in
which file data becomes stable, and apps should use fsync/O_SYNC, logging, or
whatever else to ensure the consistency of their data stores.

But if the order in which different metadata operations become stable is not
enforced, the following scenario could result:

mkdir a
touch a/file{0,1}{0,1}{0,1}{0,1}
mkdir a/b
touch a/b/file{0,1}{0,1}{0,1}{0,1}

< a crash happens sometime later >

After recovery, it could turn out that all of a/b/file* are there, but only a
few of a/file* are there (possibly those in the first directory block). This
kind of thing would not occur when we did synchronous writes of metadata (disk
scheduling would not affect this). unlink could possibly produce even more
dramatic effects. Now the question is whether this kind of behaviour from the
filesystem is acceptable, and whether some applications can actually fail
badly because of it.
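
For an application that really does depend on the a/file* entries being
stable before anything under a/b exists, one possible way to say so
explicitly is sketched below.  It assumes that fsync(2) on a descriptor
opened on a directory commits that directory's entries, which holds on
the systems I know of but should be checked for the platform in
question.  The helper names (fsync_dir, touch_durably, populate) are
hypothetical.

```python
import itertools
import os
import tempfile


def fsync_dir(path):
    """fsync a directory so its entries are committed (platform assumption)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def touch_durably(directory, names):
    """Create empty files, then commit the directory that names them."""
    for name in names:
        fd = os.open(os.path.join(directory, name),
                     os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    fsync_dir(directory)


def populate(root):
    """Same scenario as the mkdir/touch commands, with explicit barriers."""
    # file0000 ... file1111, the names the brace expansion produces
    names = ["file%d%d%d%d" % bits
             for bits in itertools.product((0, 1), repeat=4)]
    a = os.path.join(root, "a")
    os.mkdir(a)
    fsync_dir(root)          # a exists on disk before it is filled
    touch_durably(a, names)  # all of a/file* committed ...
    b = os.path.join(a, "b")
    os.mkdir(b)              # ... before a/b is even requested
    fsync_dir(a)
    touch_durably(b, names)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        populate(root)
        print(len(os.listdir(os.path.join(root, "a"))))
```

This trades away most of the benefit of delayed metadata writes, so it
only makes sense for the few operations whose relative order matters.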




From owner-freebsd-fs  Thu Mar 14  9:32:32 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from ns.caldera.de (ns.caldera.de [212.34.180.1])
	by hub.freebsd.org (Postfix) with ESMTP id 7D2A437B404
	for <freebsd-fs@FreeBSD.ORG>; Thu, 14 Mar 2002 09:32:29 -0800 (PST)
Received: (from hch@localhost)
	by ns.caldera.de (8.11.6/8.11.6) id g2EHWJg29073;
	Thu, 14 Mar 2002 18:32:19 +0100
Date: Thu, 14 Mar 2002 18:32:19 +0100
From: Christoph Hellwig <hch@caldera.de>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
Message-ID: <20020314183219.A28415@caldera.de>
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3C8E72A3.6E9CBC6F@mindspring.com>; from tlambert2@mindspring.com on Tue, Mar 12, 2002 at 01:26:59PM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Tue, Mar 12, 2002 at 01:26:59PM -0800, Terry Lambert wrote:
> It's really hard to answer these kinds of questions exhaustively,
> since Linux has the bad habit of changing things about the on
> disk layout of FS data, and not changing the name of the FS;
> there are at least six incompatible hacks on EXT2FS since the
> first EXT2FS, and knowing which one you have is an exercise in
> detective work.
>
> [snip]

Terry,

could you _please_ check the facts before you go telling the
world fs myths over and over?

Unlike FFS/UFS, which has at least a dozen incompatible derivatives, ext2
was designed with extensibility in mind.  If you would care to actually
look at the ext2 superblock definition you would notice two things:

 1) a revision level (s_rev_level)
 2) 3 feature flags (s_feature_compat, s_feature_incompat,
 	s_feature_ro_compat)

The first is used for global filesystem revisioning and so far has only
two allowed values: EXT2_GOOD_OLD_REV, for very old filesystems from the
Linux 0.x days, and EXT2_DYNAMIC_REV, which is used for any current
filesystem.  The feature flags (which did not exist in EXT2_GOOD_OLD_REV)
allow fine-grained, backwards-compatible extension of the filesystem
layout without breaking other implementations, as has happened with
UFS.  The first set of flags, called 'compat', covers extensions that can
be ignored by implementations that do not know about them; they have a
meaning only for fsck.  Examples are directory preallocation and the
presence of a journal inode for the Linux 'ext3' driver.  The second
set, called 'ro_compat', is for layout changes that old drivers can
still mount read-only; an example is the sparse superblocks introduced
in Linux 2.2's ext2 driver.  The third set, 'incompat', is for layout
changes that need support from the driver for both reading and writing;
examples are the 4.4BSD-style dirent layout ext2 can optionally use and
a filesystem with an unrecovered log written by the Linux ext3 driver.
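
A minimal sketch of how a driver might apply these three sets at mount
time, assuming it keeps one mask of understood bits per set.  The names
decide_mount_mode, struct feature_sets and the MOUNT_* values are made
up for illustration; they are not taken from any real ext2 driver.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum mount_mode { MOUNT_REFUSE, MOUNT_READ_ONLY, MOUNT_READ_WRITE };

struct feature_sets {
    uint32_t compat;     /* unknown bits may simply be ignored */
    uint32_t ro_compat;  /* unknown bits: a read-only mount is still safe */
    uint32_t incompat;   /* unknown bits: the filesystem must not be mounted */
};

/*
 * Compare the flags found in the superblock with the flags this driver
 * understands and pick the most permissive mode that is still safe.
 */
enum mount_mode
decide_mount_mode(const struct feature_sets *on_disk,
                  const struct feature_sets *supported)
{
    if (on_disk->incompat & ~supported->incompat)
        return MOUNT_REFUSE;
    if (on_disk->ro_compat & ~supported->ro_compat)
        return MOUNT_READ_ONLY;
    /* Unknown compat bits never restrict the mount. */
    return MOUNT_READ_WRITE;
}
```

The point of the design is that an old driver never has to guess: an
unknown bit in a given set tells it exactly how careful it must be.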


> I think the only one in common for all three Linux distributions,
> that doesn't have local hacks, with be EXT2FS.  FreeBSD can read
> and write EXT2FS, as long as you aren't using local hacks (last
> time I checked this, a long time ago, I admit, FreeBSD did not
> support the RedHat hack for sparse superblocks, and neither did
> Debian).

Sparse superblocks are a feature introduced in Linux 2.2 and thus
supported by all Linux distributions with 2.2 or newer kernels
(including Debian potato/woody!); the feature was also backported to
2.0 and included in 2.0.39.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.



From owner-freebsd-fs  Thu Mar 14 12:48: 0 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84])
	by hub.freebsd.org (Postfix) with ESMTP id 0C83F37B402
	for <freebsd-fs@freebsd.org>; Thu, 14 Mar 2002 12:47:50 -0800 (PST)
Received: from pool0226.cvx22-bradley.dialup.earthlink.net ([209.179.198.226] helo=mindspring.com)
	by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lc8O-0001xa-00; Thu, 14 Mar 2002 12:47:36 -0800
Message-ID: <3C910C57.71C2D823@mindspring.com>
Date: Thu, 14 Mar 2002 12:47:19 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Parity Error <bootup@mail.ru>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates
References: <E16lReK-000C3T-00@f10.mail.ru>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Parity Error wrote:
> i am referring not to file data, but filesystem metadata, which
> is now _delayed_ write.

I understand this.  Do you understand that delaying the metadata
writes in soft updates does not affect the dependency ordering, but
may affect the time ordering?

If I have two dependent lists of operations, A-B-C and D-B-E,
then I am only guaranteed that A and D will occur before B,
and C and E will occur after B, but there is no guarantee on
the order of [A,D] vs. [D,A] or [C,E] vs. [E,C].

If I have two OTHER dependent lists of operations, Q-R and S-T,
then I am only guaranteed that Q will occur before R, and S
will occur before T, but there is no guarantee on the order of
[ [Q,S], [Q,T], [R,S], [R,T] ] vs. [ [S,Q], [T,Q], [S,R], [T,R] ];
Q-R-S-T is a valid order, as are S-T-Q-R, Q-S-T-R, Q-S-R-T,
etc.
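
To make the distinction concrete, here is a small sketch in C that
checks a completion order against a set of dependency edges.  The names
struct dep and order_is_valid are invented for this illustration, and
operations are reduced to single letters; this says nothing about how
soft updates is actually implemented.

```c
#include <stddef.h>
#include <string.h>

/* One dependency edge: `before` must complete ahead of `after`. */
struct dep {
    char before;
    char after;
};

/*
 * Return 1 if `order` (a string of single-letter operations) satisfies
 * every edge in `deps`, 0 otherwise.  Operations that are not connected
 * by an edge may appear in any relative order.
 */
int
order_is_valid(const char *order, const struct dep *deps, size_t ndeps)
{
    for (size_t i = 0; i < ndeps; i++) {
        const char *b = strchr(order, deps[i].before);
        const char *a = strchr(order, deps[i].after);

        if (b == NULL || a == NULL || b >= a)
            return 0;
    }
    return 1;
}
```

With the edges Q-R and S-T, the orders "QRST", "STQR", "QSTR" and
"QSRT" all pass, while "RQST" does not; only the edges constrain the
result, not the order in which the requests were issued.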

> When we did synch write to sequence multiple metadata updates
> belonging to one operation for ensuring recoverability of that
> one operation, we also got inter-operation ordering for free

Yes.

> (and apps/users could have started depending on it) .

No.  Only misinformed users.  The system *never* made *any*
guarantees with regard to implied metadata.  Your statement
"multiple metadata updates belonging to one operation" is
bogus.  There is no such thing as "one operation" in this
context.  Multiple metadata updates are multiple operations,
and the filesystem guarantees are only that the operations
will not return to the user until they have completed in
the guaranteed order, not that they have completed in any
time relative order compared to each other.


> Unix provides no guarantess reg the order in which file data
> will become stable, and apps should use fsync/O_SYNC or logging
> or whatever to ensure the consistency of their data stores.

That's nice, but it's irrelevant to this discussion, since
file data was never guaranteed for write anyway.

The reason that fsync/O_SYNC work to serialize the metadata
operations is that the operations are guaranteed to occur
using synchronous I/O, before they return.

In other words, they are stall barriers instituted by the
application programmer in order to get the behaviour the
users ..."could have started depending on"... on purpose,
rather than getting it as a result of an accident of the
implementation of the underlying primitives.
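
A rough sketch of such an application-level stall barrier, using only
open(), write(), fsync() and close().  The helper names write_and_sync
and ordered_update are hypothetical.  Whether fsync() on the file also
commits its directory entry differs between systems, so treat this as
an illustration of the barrier idea rather than a complete recipe.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write buf to path and do not return success until fsync() has returned. */
int
write_and_sync(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n <= 0) {           /* treat a zero-length write as failure */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) < 0) {        /* the stall: wait for the I/O to finish */
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Explicit ordering between two otherwise independent updates: the
 * second file is not touched until the first has been synced.
 */
int
ordered_update(const char *first, const char *second,
               const void *buf, size_t len)
{
    if (write_and_sync(first, buf, len) < 0)
        return -1;
    return write_and_sync(second, buf, len);
}
```

The ordering here comes from the program waiting, not from anything
the filesystem promises about independent operations.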

> But, the ordering in which different metadata operations becomes
> stables, if not enforced could result in the following scenario.

[ ... demonstration of failure of bogus assumptions ... ]

Yes.  Bogus assumptions are bogus.  That's a circular argument.
One must not make bogus assumptions, if one wants one's code
to operate reliably.

Your example is poor, as well, unless you intended the "touch"
operations to occur concurrently.


>  These kind of things would not occur when we did synch write of
> metadata (disk scheduling would not affect this). unlink could
> possibly produce even more dramatic effects.  Now the question is
> whether this kind of behaviour from the filesystem is acceptable
> and whether some applications can actually fail badly due to this.

A1: The behaviour is acceptable, since the behaviour guarantees
for metadata stability are mandated by operational guarantees.

To boil this down to layman's language: the OS provides a set of
services upon which reliable services can be built, if they are
correctly engineered.  It is up to the people building the layers
of services on top of the OS services to provide those facilities
that do not exist within the OS proper, such that they are reliable.

In other words, the purpose of the OS is to provide an unconstrained
foundation.  So long as you don't mount the FS in such a way that
the metadata updates are not carried out in the correct order, (e.g.
async), then you can create a system in which the ordering guarantees
are maintained from end-to-end, and you can reliably know the state
that you would have been in had you not crashed, following a crash,
and can recover by rolling the operation forward, if all necessary
data is available, or backward, if it is not.


A2: Applications which expect behaviour other than that guaranteed
by the API definitions can be expected to fail badly when their
assumptions are proven to be unfounded in reality.


STANDARDS COMPLIANCE AND METADATA UPDATES, WITH A SURVEY OF OS/FS's

Certain metadata updates, such as those to ctime, mtime, and
atime, are guaranteed by the POSIX standard.  These, in turn, imply
that the containers for these objects are similarly guaranteed, to
the root operation, such that the guaranteed operations are always
reliable.  Any OS which fails to make these guarantees is, by its
definition, non-compliant with POSIX.

You can intentionally choose to operate certain filesystems in a
POSIX-non-compliant mode; for example, you can use an MFS, or you
can mount a filesystem async, such that metadata update guarantees
required for conformance to the standard are not observed.  But you
knowingly give up standards compliance when you do this.

For example, Linux running EXT2FS mounted asynchronously fails
to comply with the POSIX standard with regard to ctime, atime,
and mtime updates, both because of the direct failure of such
updates to be committed to stable storage, and because of the
indirect failure of the updates to be committed: the containers
are not committed, so the containers in which the commits are
taking place fail to meet the definition of "stable storage".

Another example would be FreeBSD running FFS, if you went out of your
way to mount it async, rather than sync (or with more recent
installations, with soft updates).  Similarly, mounting it noatime
also fails this test.

If you were to mount a System V UFS in SVR4.2 by default, without
specifying "sync" or "async", then you get a behaviour called DOW
(Delayed Ordered Writes), in which an intentional stall point is
inserted between dependency convergences.  This is similar to soft
updates, in that the stall point requires synchronization of the
stable storage at the point where the intersection would occur, but
it provides only non-commutability on non-commutable operations in
a given edge, and does not permit reordering of associativity, even
though operations are associative, and efficiency might be gained
thereby.  Thus the original A-B-C, D-B-E operation actually *must*
occur in A-B B-E ordering, with a stall between the "B" and the "B".
This only coincidentally makes a *partial* ordering guarantee on the
order of independent metadata updates -- so even here, you cannot
rely on the system ordering independent updates, only on it being
standards compliant in the API guarantees.

If you want this behaviour on Linux, ReiserFS uses the USL patented
DOW technology without a license.  If you are outside the US, and
don't plan on selling into the US until at least 2018, you could
use ReiserFS to get metadata update ordering within standards
guaranteed operations, and it will only stall out as often as the
SVR4.2 UFS with DOW.  But you will have the same problem with your
software that assumes -- incorrectly -- that serially requested
independent metadata updates will take place serially... when, in
fact, there is no such guarantee.

PS: FWIW, it's *possible* to generalize the soft updates mechanism
to export a transactioning interface -- actually, a dependency edge
that can be used to implement transactioning -- to user space.  The
effect of doing this would be to also export an edge of the dependency
graph upward.  For two independent graphs, implying an edge between
the top nodes establishes a precedence order on completion, and
therefore guarantees ordering of operations within a transaction.

-- Terry



From owner-freebsd-fs  Thu Mar 14 13:24: 8 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50])
	by hub.freebsd.org (Postfix) with ESMTP id 33E2437B416
	for <freebsd-fs@freebsd.org>; Thu, 14 Mar 2002 13:24:04 -0800 (PST)
Received: from pool0226.cvx22-bradley.dialup.earthlink.net ([209.179.198.226] helo=mindspring.com)
	by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lchY-0006th-00; Thu, 14 Mar 2002 13:23:56 -0800
Message-ID: <3C9114DA.5A2D0591@mindspring.com>
Date: Thu, 14 Mar 2002 13:23:38 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Christoph Hellwig <hch@caldera.de>
Cc: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Christoph Hellwig wrote:
> On Tue, Mar 12, 2002 at 01:26:59PM -0800, Terry Lambert wrote:
> > It's really hard to answer these kinds of questions exhaustively,
> > since Linux has the bad habit of changing things about the on
> > disk layout of FS data, and not changing the name of the FS;
> > there are at least six incompatible hacks on EXT2FS since the
> > first EXT2FS, and knowing which one you have is an exercise in
> > detective work.
> > [snip]
> 
> Terry,
> 
> could you _please_ check the facts before you go telling the
> world fs myths over and over?
> 
> Unlike FFS/UFS, which has at least a dozen incompatible derivatives, ext2
> was designed with extensibility in mind.  If you would care to actually
> look at the ext2 superblock definition you would notice two things:
> 
>  1) a revision level (s_rev_level)
>  2) 3 feature flags (s_feature_compat, s_feature_incompat,
>         s_feature_ro_compat)

I am aware of this.

Perhaps, since you are knowledgeable in the Linux EXT2FS area,
you can answer the rest of the original question, now that I've
narrowed the answer to "some version of EXT2FS"?

--

What is the highest revision level, and what are the maximum
feature flags that one can use interoperably between versions
of RedHat, FreeBSD, Debian, and Mandrake?

> Sparse superblocks are a feature introduced in Linux 2.2 and thus
> supported by all Linux distributions with 2.2 or newer kernels
> (including Debian potato/woody!); the feature was also backported to
> 2.0 and included in 2.0.39.

Sorry; he did not specify the version of Debian he was using,
or the version of RedHat or Mandrake, or even FreeBSD for that
matter.

I'm guessing it will have to be "lowest revision" and "no feature
flags".


PS: Different versions of FFS have different magic numbers;
the original number was Kirk's birthday.

PPS: I'm more of an FFS maven than an EXT2FS maven; so I would
be more likely to be able to tell you about FFS interoperability
between systems.

-- Terry



From owner-freebsd-fs  Fri Mar 15  0:34:43 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.fidnet.com (two.fidnet.com [216.229.64.72])
	by hub.freebsd.org (Postfix) with SMTP id 68F7C37B419
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 00:34:35 -0800 (PST)
Received: (qmail 10788 invoked from network); 15 Mar 2002 08:34:34 -0000
Received: from beast.hexaneinc.com (HELO beast) (216.229.82.132)
  by two.fidnet.com with SMTP; 15 Mar 2002 08:34:34 -0000
From: "Matthew Rezny" <mrezny@umr.edu>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Date: Fri, 15 Mar 2002 02:35:31 -0600
Reply-To: "Matthew Rezny" <mrezny@umr.edu>
X-Mailer: PMMail 2000 Professional (2.10.2010) For Windows 2000 (5.0.2195;2)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: disks > 1TB
Message-Id: <20020315083435.68F7C37B419@hub.freebsd.org>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

I just bought a 3ware 7810 controller and 8 160GB drives, which in
RAID5 yields 1.04TB (real TB). Having previously seen statements that
the FFS limit is 64TB, I expected this to work. Unfortunately I found
that the number of sectors becomes an issue. Looking through the mailing
list history I see this has come up before and it will take a lot to
solve, more than the spare time I have this weekend. The quick solution
is to make a 1TB filesystem and let the extra .04TB go to waste rather
than try to patch the whole system. However, there is a slight problem
with this, namely limits in the disklabel tool. The disklabel
structure which is stored on disk uses u_int32_t for the number of
sectors in the device. The disklabel tool uses int when interpreting
all numbers in the getasciilabel() function. This limits disklabel to
1TB devices. If the declaration on line 964 of disklabel.c is changed
from "int v" to "u_int32_t v" then this limit is lifted. This change is
safe since the actual value on disk is unsigned. Using unsigned in the
input allows disklabel to work with devices up to 2TB. This allows
creation of 1TB slices on devices >1TB so that at least part can be
used in the meantime while we wait for the limit to be lifted elsewhere
in the system.
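
The 1TB and 2TB figures follow from simple arithmetic, sketched below.
This assumes 512-byte sectors and a 32-bit int; the helper names are
made up for the illustration and are not part of disklabel.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL      /* assumed bytes per sector */

/*
 * Bytes addressable when the sector count has `value_bits` usable bits:
 * 31 for a non-negative 32-bit int, 32 for u_int32_t.
 * Only meaningful for value_bits small enough not to overflow 64 bits.
 */
uint64_t
max_device_bytes(unsigned value_bits)
{
    return (1ULL << value_bits) * SECTOR_SIZE;
}

/* Number of whole sectors in a device of the given size. */
uint64_t
sectors_for_bytes(uint64_t bytes)
{
    return bytes / SECTOR_SIZE;
}

/* Does the sector count survive being stored in a signed 32-bit int? */
int
fits_signed_32(uint64_t sectors)
{
    return sectors <= (uint64_t)INT32_MAX;
}

/* Does the sector count fit the unsigned 32-bit on-disk field? */
int
fits_unsigned_32(uint64_t sectors)
{
    return sectors <= (uint64_t)UINT32_MAX;
}
```

Under those assumptions, 2^31 sectors * 512 bytes = 2^40 bytes (1TB) and
2^32 sectors * 512 bytes = 2^41 bytes (2TB).  A 1.04TB array has roughly
2.23 billion sectors, which is too large for a signed int but fits in
the unsigned on-disk field.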

Also, I've seen one mention of 4TB systems in the mailing list
archives. How was this done? Kernel patches, other trickery?





From owner-freebsd-fs  Fri Mar 15  0:58:37 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from swan.prod.itd.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123])
	by hub.freebsd.org (Postfix) with ESMTP id 0811B37B400
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 00:58:34 -0800 (PST)
Received: from pool0072.cvx40-bradley.dialup.earthlink.net ([216.244.42.72] helo=mindspring.com)
	by swan.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lnXk-0000T2-00; Fri, 15 Mar 2002 00:58:33 -0800
Message-ID: <3C91B78A.686279D0@mindspring.com>
Date: Fri, 15 Mar 2002 00:57:46 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Matthew Rezny <mrezny@umr.edu>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: disks > 1TB
References: <20020315083435.68F7C37B419@hub.freebsd.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Matthew Rezny wrote:
> Also, I've seen one mention of 4TB systems in the mailing list
> archives. How was this done? Kernel patches, other trickery?

You can just put the FS on a raw device, without using a
disklabel.  Thus the disklabel limits don't come into play,
though it does limit you to one FS per device.

-- Terry



From owner-freebsd-fs  Fri Mar 15  1:51: 3 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from pop3.psconsult.nl (ps226.psconsult.nl [193.67.147.226])
	by hub.freebsd.org (Postfix) with ESMTP id 67B9737B400
	for <freebsd-fs@FreeBSD.ORG>; Fri, 15 Mar 2002 01:50:58 -0800 (PST)
Received: (from paul@localhost)
	by pop3.psconsult.nl (8.9.2/8.9.2) id KAA79898;
	Fri, 15 Mar 2002 10:48:16 +0100 (CET)
	(envelope-from paul)
Date: Fri, 15 Mar 2002 10:48:16 +0100
From: Paul Schenkeveld <fb-fs@psconsult.nl>
To: Matthew Rezny <mrezny@umr.edu>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.ORG>
Subject: Re: disks > 1TB
Message-ID: <20020315104815.A79816@psconsult.nl>
References: <20020315083435.68F7C37B419@hub.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <20020315083435.68F7C37B419@hub.freebsd.org>; from mrezny@umr.edu on Fri, Mar 15, 2002 at 02:35:31AM -0600
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, Mar 15, 2002 at 02:35:31AM -0600, Matthew Rezny wrote:
> I just bought a 3ware 7810 controller and 8 160GB drives, which in
> RAID5 yields 1.04TB (real TB). Having previously seen statements that
> the FFS limit is 64TB, I expected this to work. Unfortunately I found
> that the number of sectors becomes an issue. Looking through the mailing
> list history I see this has come up before and it will take a lot to
> solve, more than the spare time I have this weekend. The quick solution
> is to make a 1TB filesystem and let the extra .04TB go to waste rather
> than try to patch the whole system. However, there is a slight problem
> with this, namely limits in the disklabel tool. The disklabel
> structure which is stored on disk uses u_int32_t for the number of
> sectors in the device. The disklabel tool uses int when interpreting
> all numbers in the getasciilabel() function. This limits disklabel to
> 1TB devices. If the declaration on line 964 of disklabel.c is changed
> from "int v" to "u_int32_t v" then this limit is lifted. This change is
> safe since the actual value on disk is unsigned. Using unsigned in the
> input allows disklabel to work with devices up to 2TB. This allows
> creation of 1TB slices on devices >1TB so that at least part can be
> used in the meantime while we wait for the limit to be lifted elsewhere
> in the system.

Did you try to divide the disk into two FreeBSD slices using fdisk?
The numbers in disklabel are relative to the fdisk slice so your
xx0s1c partition is the same size as the fdisk slice.

> Also, I've seen one mention of 4TB systems in the mailing list
> archives. How was this done? Kernel patches, other trickery?

-- 
Paul Schenkeveld, Consultant
PSconsult ICT Services BV



From owner-freebsd-fs  Fri Mar 15  2: 1:14 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from ns.caldera.de (ns.caldera.de [212.34.180.1])
	by hub.freebsd.org (Postfix) with ESMTP id 2E18D37B404
	for <freebsd-fs@FreeBSD.ORG>; Fri, 15 Mar 2002 02:01:09 -0800 (PST)
Received: (from hch@localhost)
	by ns.caldera.de (8.11.6/8.11.6) id g2FA0xS32700;
	Fri, 15 Mar 2002 11:00:59 +0100
Date: Fri, 15 Mar 2002 11:00:59 +0100
From: Christoph Hellwig <hch@caldera.de>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
Message-ID: <20020315110059.A32509@caldera.de>
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3C9114DA.5A2D0591@mindspring.com>; from tlambert2@mindspring.com on Thu, Mar 14, 2002 at 01:23:38PM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Thu, Mar 14, 2002 at 01:23:38PM -0800, Terry Lambert wrote:
> Perhaps, since you are knowledgeable in the Linux EXT2FS area,
> you can answer the rest of the original question, now that I've
> narrowed the answer to "some version of EXT2FS"?
> 
> --
> 
> What is the highest revision level, and what are the maximum
> feature flags that one can use interoperably between versions
> of RedHat, FreeBSD, Debian, and Mandrake?

I don't have all those Linux distributions handy, but as the feature
flags didn't change between Linux 2.2/2.4 releases, I'll just use
generic 2.2/2.4 kernels.

Linux 2.4.18 [ext3 driver] (include/linux/ext3_fs.h):
#define EXT3_FEATURE_COMPAT_SUPP	0
#define EXT3_FEATURE_INCOMPAT_SUPP	(EXT3_FEATURE_INCOMPAT_FILETYPE| \
					 EXT3_FEATURE_INCOMPAT_RECOVER)
#define EXT3_FEATURE_RO_COMPAT_SUPP	(EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT3_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT3_FEATURE_RO_COMPAT_BTREE_DIR)

Linux 2.4.18 (include/linux/ext2_fs.h):
#define EXT2_FEATURE_COMPAT_SUPP	0
#define EXT2_FEATURE_INCOMPAT_SUPP	EXT2_FEATURE_INCOMPAT_FILETYPE
#define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT2_FEATURE_RO_COMPAT_BTREE_DIR)

Linux 2.2.18 (include/linux/ext2_fs.h):
#define EXT2_FEATURE_COMPAT_SUPP	0
#define EXT2_FEATURE_INCOMPAT_SUPP	EXT2_FEATURE_INCOMPAT_FILETYPE
#define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT2_FEATURE_RO_COMPAT_BTREE_DIR)

FreeBSD-stable (sys/gnu/ext2fs/ext2_fs.h):
#define EXT2_FEATURE_COMPAT_SUPP	0
#define EXT2_FEATURE_INCOMPAT_SUPP	EXT2_FEATURE_INCOMPAT_FILETYPE
#ifdef notyet
#define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT2_FEATURE_RO_COMPAT_BTREE_DIR)
#else
#define EXT2_FEATURE_RO_COMPAT_SUPP	EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER
#endif

So all support revision 1 filesystems, no compat flags, the incompatible
4.4BSD-style dirent and sparse superblocks.  Linux 2.2/2.4 also support
large files and compatibility for the never released (!) btree directory
support.  The Linux ext3 driver also supports filesystems that need a
log replay; for other drivers this will already have been cleared by a
fsck run.
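
The common subset can also be derived mechanically by AND-ing the
*_SUPP masks listed above.  In the sketch below the shortened macro
names, struct driver_support and common_support() are made up for the
illustration, and the bit values are given as I remember them from the
Linux headers; check them against the real ext2_fs.h before relying on
them.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit values; verify against include/linux/ext2_fs.h. */
#define RO_COMPAT_SPARSE_SUPER  0x0001u
#define RO_COMPAT_LARGE_FILE    0x0002u
#define RO_COMPAT_BTREE_DIR     0x0004u
#define INCOMPAT_FILETYPE       0x0002u
#define INCOMPAT_RECOVER        0x0004u

#define RO_COMPAT_ALL \
    (RO_COMPAT_SPARSE_SUPER | RO_COMPAT_LARGE_FILE | RO_COMPAT_BTREE_DIR)

struct driver_support {
    const char *name;
    uint32_t incompat;
    uint32_t ro_compat;
};

/* The four drivers from the listing above (compat is 0 for all). */
static const struct driver_support drivers[] = {
    { "Linux 2.4.18 ext3", INCOMPAT_FILETYPE | INCOMPAT_RECOVER, RO_COMPAT_ALL },
    { "Linux 2.4.18 ext2", INCOMPAT_FILETYPE, RO_COMPAT_ALL },
    { "Linux 2.2.18 ext2", INCOMPAT_FILETYPE, RO_COMPAT_ALL },
    { "FreeBSD-stable",    INCOMPAT_FILETYPE, RO_COMPAT_SPARSE_SUPER },
};

/* Intersect the masks: a feature is portable only if every driver has it. */
struct driver_support
common_support(const struct driver_support *d, size_t n)
{
    struct driver_support c = { "common", UINT32_MAX, UINT32_MAX };

    for (size_t i = 0; i < n; i++) {
        c.incompat &= d[i].incompat;
        c.ro_compat &= d[i].ro_compat;
    }
    return c;
}
```

For the four entries above the intersection is FILETYPE in the incompat
set and SPARSE_SUPER in the ro_compat set, which matches the summary.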

> PS: Different versions of FFS have different magic numbers;
> the original number was Kirk's birthday.

Only very few FFS derivatives have different magic numbers; in fact I
only know of SVR4.2MP SFS and various HP versions.  On the other hand,
Solaris/Solaris-i386/4.4BSD/OpenStep seem to have the same one and are
_very_ incompatible.

> PPS: I'm more of an FFS maven than an EXT2FS maven; so I would
> be more likely to be able to tell you about FFS interoperability
> between systems.

Thanks, I have enough of it after implementing SVR4.2MP UFS and SFS
support for Linux..

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.



From owner-freebsd-fs  Fri Mar 15  6:59:13 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from helen.CS.Berkeley.EDU (helen.CS.Berkeley.EDU [128.32.131.251])
	by hub.freebsd.org (Postfix) with ESMTP id 0202B37B41B
	for <freebsd-fs@FreeBSD.ORG>; Fri, 15 Mar 2002 06:56:59 -0800 (PST)
Received: (from jmacd@localhost)
	by helen.CS.Berkeley.EDU (8.9.1a/8.9.1) id GAA10649;
	Fri, 15 Mar 2002 06:56:51 -0800 (PST)
Message-ID: <20020315065651.02637@helen.CS.Berkeley.EDU>
Date: Fri, 15 Mar 2002 06:56:51 -0800
From: Josh MacDonald <jmacd@CS.Berkeley.EDU>
To: Terry Lambert <tlambert2@mindspring.com>,
	Parity Error <bootup@mail.ru>
Cc: freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com
Subject: Re: metadata update durability ordering/soft updates
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.89.1
In-Reply-To: <3C910C57.71C2D823@mindspring.com>; from Terry Lambert on Thu, Mar 14, 2002 at 12:47:19PM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Quoting Terry Lambert (tlambert2@mindspring.com):
> Parity Error wrote:
> > i am referring not to file data, but filesystem metadata, which
> > is now _delayed_ write.
> 
> I understand this.  Do you understand that delaying the metadata
> writes in soft updates does not affect the dependency ordering, but
> may affect the time ordering?
> 
> If I have two dependent lists of operations, A-B-C and D-B-E,
> then I am only guaranteed that A and D will occur before B,
> and C and E will occur after B, but there is no guarantee on
> the order of [A,D] vs. [D,A] or [C,E] vs. [E,C].
> 
> If I have two OTHER dependent lists of operations, Q-R and S-T,
> then I am only guaranteed that Q will occur before R, and S
> will occur before T, but there is no guarantee on the order of
> [ [Q,S], [Q,T], [R,S], [R,T] ] vs. [ [S,Q], [T,Q], [S,R], [T,R] ];
> Q-R-S-T is a valid order, as are S-T-Q-R, Q-S-T-R, Q-S-R-T,
> etc.
> 
> > When we did synch write to sequence multiple metadata updates
> > belonging to one operation for ensuring recoverability of that
> > one operation, we also got inter-operation ordering for free
> 
> Yes.
> 
> > (and apps/users could have started depending on it) .
> 
> No.  Only misinformed users.  The system *never* made *any*
> guarantees with regard to implied metadata.  Your statement
> "multiple metadata updates belonging to one operation" is
> bogus.  There is no such thing as "one operation" in this
> context.  Multiple metadata updates are multiple operations,
> and the filesystem guarantees are only that the operations
> will not return to the user until they have completed in
> the guaranteed order, not that they have completed in any
> time relative order compared to each other.
> 
> 
> > Unix provides no guarantess reg the order in which file data
> > will become stable, and apps should use fsync/O_SYNC or logging
> > or whatever to ensure the consistency of their data stores.
> 
> That's nice, but it's irrelevant to this discussion, since
> file data was never guaranteed for write anyway.
> 
> The reason that fsync/O_SYNC work to serialize the metadata
> operations is that the operations are guaranteed to occur
> using synchronous I/O, before they return.
> 
> In other words, they are stall barriers instituted by the
> application programmer in order to get the behaviour the
> users ..."could have started depending on"... on purpose,
> rather than getting it as a result of an accident of the
> implementation of the underlying primitives.
> 
> > But, the ordering in which different metadata operations become
> > stable, if not enforced, could result in the following scenario.
> 
> [ ... demonstration of failure of bogus assumptions ... ]
> 
> Yes.  Bogus assumptions are bogus.  That's a circular argument.
> One must not make bogus assumptions, if one wants one's code
> to operate reliably.
> 
> Your example is poor, as well, unless you intended the "touch"
> operations to occur concurrently.
> 
> 
> > These kinds of things would not occur when we did synch write of
> > metadata (disk scheduling would not affect this). unlink could
> > possibly produce even more dramatic effects.  Now the question is
> > whether this kind of behaviour from the filesystem is acceptable
> > and whether some applications can actually fail badly due to this.
> 
> A1: The behaviour is acceptable, since the behaviour guarantees
> for metadata stability are mandated by operational guarantees.
> 
> To boil this down to layman's language: the OS provides a set of
> services upon which reliable services can be built, if they are
> correctly engineered.  It is up to the people building the layers
> of services on top of the OS services to provide those facilities
> that do not exist within the OS proper, such that they are reliable.
> 
> In other words, the purpose of the OS is to provide an unconstrained
> foundation.  So long as you don't mount the FS in such a way that
> the metadata updates are not carried out in the correct order, (e.g.
> async), then you can create a system in which the ordering guarantees
> are maintained from end-to-end, and you can reliably know the state
> that you would have been in had you not crashed, following a crash,
> and can recover by rolling the operation forward, if all necessary
> data is available, or backward, if it is not.
> 
> 
> A2: Applications which expect behaviour other than that guaranteed
> by the API definitions can be expected to fail badly when their
> assumptions are proven to be unfounded in reality.
> 
> 
> STANDARDS COMPLIANCE AND METADATA UPDATES, WITH A SURVEY OF OS/FS's
> 
> Certain metadata updates, such as those to ctime, mtime, and
> atime, are guaranteed by the POSIX standard.  These, in turn, imply
> that the containers for these objects are similarly guaranteed, to
> the root operation, such that the guaranteed operations are always
> reliable.  Any OS which fails to make these guarantees is, by its
> definition, non-compliant with POSIX.
> 
> You can intentionally choose to operate certain filesystems in a
> POSIX-non-compliant mode; for example, you can use an MFS, or you
> can mount a filesystem async, such that metadata update guarantees
> required for conformance to the standard are not observed.  But you
> knowingly give up standards compliance when you do this.
> 
> For example, Linux running EXT2FS mounted asynchronously fails
> to comply with the POSIX standard with regard to update of ctime,
> atime, and mtime updates, both because of the direct failure for
> such updates to be committed to stable storage, and because of the
> indirect failure of the updates to be committed, since the containers
> are not committed, thus making the containers in which the commits
> are taking place fail to comply with the definition of "stable
> storage".
> 
> Another example would be FreeBSD running FFS, if you went out of the
> way to mount it async, rather than sync (or with more recent
> installations, with soft updates).  Similarly, mounting it noatime
> also fails this test.
> 
> If you were to mount a System V UFS in SVR4.2 by default, without
> specifying "sync" or "async", then you get a behaviour called DOW
> (Delayed Ordered Writes), in which an intentional stall point is
> inserted between dependency convergences.  This is similar to soft
> updates, in that the stall point requires synchronization of the
> stable storage at the point where the intersection would occur, but
> it provides only non-commutability on non-commutable operations in
> a given edge, and does not permit reordering of associativity, even
> though operations are associative, and efficiency might be gained,
> thereby.  Thus the original A-B-C, D-B-E operation actually *must*
> occur in A-B B-E ordering, with a stall between the "B" and the "B".
> This only coincidentally makes a *partial* ordering guarantee on the
> order of independent metadata updates -- so even here, you can not
> rely on the system ordering independent updates, only on it being
> standards compliant in the API guarantees.
> 
> If you want this behaviour on Linux, ReiserFS uses the USL patented
> DOW technology without a license.  If you are outside the US, and
> don't plan on selling into the US until at least 2018, you could
> use ReiserFS to get metadata update ordering within standards
> guaranteed operations, and it will only stall out as often as the
> SVR4.2 UFS with DOW.  But you will have the same problem with your
> software that assumes -- incorrectly -- that serially requested
> independent metadata updates will take place serially... when, in
> fact, there is no such guarantee.
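
The partial-ordering point in the quoted text can be made concrete with a
small enumeration.  This is only an illustrative sketch: the function names
are made up for the example, and it models nothing but "earlier entries in a
dependency list must complete before later ones".  It lists every completion
order that such constraints permit.

```python
from itertools import permutations


def respects_chains(order, chains):
    """Return True if each chain's operations appear in chain order."""
    position = {op: i for i, op in enumerate(order)}
    return all(
        position[a] < position[b]
        for chain in chains
        for a, b in zip(chain, chain[1:])
    )


def valid_orders(chains):
    """List every total order allowed by the dependency chains."""
    # An operation shared by two chains (such as B) is counted once.
    ops = list(dict.fromkeys(op for chain in chains for op in chain))
    return [
        "-".join(candidate)
        for candidate in permutations(ops)
        if respects_chains(candidate, chains)
    ]


if __name__ == "__main__":
    # Two independent chains: any interleaving that keeps Q before R
    # and S before T is allowed.
    for order in valid_orders([("Q", "R"), ("S", "T")]):
        print(order)
    # Two chains that converge on B: A and D precede B, C and E follow it.
    for order in valid_orders([("A", "B", "C"), ("D", "B", "E")]):
        print(order)
```

Q-R-S-T, S-T-Q-R, Q-S-T-R and Q-S-R-T all show up in the first list, which
is the sense in which serially requested independent updates carry no
ordering guarantee relative to each other.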

Terry,

I'm not sure what you're talking about with regards to DOW and
ReiserFS.  It doesn't sound right, and I'm pretty sure we're not using
anything like the patented DOW technique as you've described it.

We are developing a transaction facility for many of the reasons
suggested by the original post in this thread.

To summarize:

- The file system has never made any guarantees.

- You can use fsync() to stabilize a single file and its metadata
dependencies.

- You can use two-phase commit above and beyond that.

- If you're not doing the right thing, "then by definition, your
application can't have its correctness affected... since it has no
correctness to lose."

- And, "the OS provides a set of services upon which reliable services
can be built, if they are correctly engineered."

All of these statements are true.  Your attitude seems to be that this
is a fine state of affairs, that anyone who writes an application
should be fully informed of all these "transactional" issues, and that
anyone who is not fully informed of all these issues is a complete
moron if they expect to write reliable applications.

The problem is that you're asking way too much of the average
programmer, who doesn't understand transactions and isn't aware of how
little the operating system actually guarantees in this regard.

The other problem is that fsync() and two-phase-commit can seriously
limit application performance, unless you use highly sophisticated
techniques, which again rules out the average programmer.

The fact is, it is very difficult to write "reliable services" on top
of the standard primitives, and it is not good enough to call people
morons if they don't understand this.

There is a document describing our transactions design for ReiserFS
version 4, which is currently under development:

   http://namesys.com/txn-doc.html

And somewhat off topic, I have demonstrated that using fsync() and
rename() as a means for reliable, atomic file updates can seriously
limit application performance and that having file system transactions
solves the problem.  My point is that applications will perform
better, not worse, if the operating system helps construct reliable
services instead of this do-it-yourself approach.
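
For readers who have not met it, the fsync()-and-rename() idiom mentioned
above looks roughly like the following.  This is a minimal sketch, not the
code measured in the thesis: atomic_write is a hypothetical helper name, and
whether the final fsync of the directory is needed depends on the
filesystem and mount options in use.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents out first
        os.replace(tmp_path, path)  # rename(2) over the old name
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory so the rename itself is not lost.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Each update costs at least one synchronous flush of the file plus one of
the directory, which is where the performance limit comes from when an
application performs many small updates.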

Master's thesis:

   http://prdownloads.sourceforge.net/xdelta/xdfs.pdf

and the graph that shows it all:

   http://www.cs.berkeley.edu/~jmacd/xdfs-vs-rcs.eps

Regards,

-josh

-- 
PRCS version control system    http://sourceforge.net/projects/prcs
Xdelta storage & transport     http://sourceforge.net/projects/xdelta
Need a concurrent skip list?   http://sourceforge.net/projects/skiplist



From owner-freebsd-fs  Fri Mar 15  7:53:45 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from ns.caldera.de (ns.caldera.de [212.34.180.1])
	by hub.freebsd.org (Postfix) with ESMTP id 2645D37B41F
	for <freebsd-fs@FreeBSD.ORG>; Fri, 15 Mar 2002 07:53:35 -0800 (PST)
Received: (from hch@localhost)
	by ns.caldera.de (8.11.6/8.11.6) id g2FFrOL17729;
	Fri, 15 Mar 2002 16:53:24 +0100
Date: Fri, 15 Mar 2002 16:53:24 +0100
From: Christoph Hellwig <hch@caldera.de>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <20020315165324.A17467@caldera.de>
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3C910C57.71C2D823@mindspring.com>; from tlambert2@mindspring.com on Thu, Mar 14, 2002 at 12:47:19PM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Thu, Mar 14, 2002 at 12:47:19PM -0800, Terry Lambert wrote:
> If you want this behaviour on Linux, ReiserFS uses the USL patented
> DOW technology without a license.

Reiserfs is a typical journaling filesystem in that it writes logical
log records to either an inline log or (in recent versions) an external
log device.  Ext3 uses physical block based journaling and allows
additional tracking of data blocks in a way only remotely similar to
DOW (the data=ordered mode).

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.



From owner-freebsd-fs  Fri Mar 15  7:58:32 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from snipe.prod.itd.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62])
	by hub.freebsd.org (Postfix) with ESMTP id 55E4637B402
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 07:58:28 -0800 (PST)
Received: from pool0389.cvx22-bradley.dialup.earthlink.net ([209.179.199.134] helo=mindspring.com)
	by snipe.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lu5u-0002zE-00; Fri, 15 Mar 2002 07:58:14 -0800
Message-ID: <3C921A04.CFCADA9D@mindspring.com>
Date: Fri, 15 Mar 2002 07:57:56 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Christoph Hellwig <hch@caldera.de>
Cc: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Christoph Hellwig wrote:
> > PS: Different versions of FFS have different magic numbers;
> > the original number was Kirk's birthday.
> 
> Only very few FFS derivatives have different major numbers, in fact I only
> know of SVR4.2MP SFS and various HP versions.  On the other hand
> Solaris/Solaris-i386/4.4BSD/OpenStep seem to have the same one and are
> _very_ incompatible.

8-).  Common mistake.  They have opposite word order, so the
version number is different.  They also have different VTOC
and disklabel order, so they're easy to differentiate anyway.


> > PPS: I'm more of an FFS maven than an EXT2FS maven; so I would
> > be more likely to be able to tell you about FFS interoperability
> > between systems.
> 
> Thanks, I have enough of it after implementing SVR4.2MP UFS and SFS
> support for Linux..

Heh.  I did some work on UFS for SVR4.2MP on SVR4.2MP, and I
did everything for a derivative called NXFS (the magic number
on that one is _my_ birthday).  ;^).

-- Terry



From owner-freebsd-fs  Fri Mar 15  8: 4: 8 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from snipe.prod.itd.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62])
	by hub.freebsd.org (Postfix) with ESMTP id 0CECB37B402
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 08:04:05 -0800 (PST)
Received: from pool0389.cvx22-bradley.dialup.earthlink.net ([209.179.199.134] helo=mindspring.com)
	by snipe.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16luBU-0003GI-00; Fri, 15 Mar 2002 08:04:01 -0800
Message-ID: <3C921B5F.E19B89CD@mindspring.com>
Date: Fri, 15 Mar 2002 08:03:43 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Christoph Hellwig <hch@caldera.de>
Cc: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Christoph Hellwig wrote:
> So all support revision 1 filesystems, no compat flag, the incompatible
> 4.4BSD-style dirent and sparse superblocks. Linux 2.2/2.4 support
> large files and compatibility for the never released (!) btree directory
> support.  The Linux ext3 driver also supports filesystems that need a
> log replay - for other drivers this will already be cleared by a fsck run.

By the way, in case it wasn't implicitly obvious: thanks
for the research.  I was pretty sure that the gating factor
would be either the Mandrake or the FreeBSD EXT2FS features.

I guess the answer (which we already knew) is that he's going
to have to use the most downrev of the three to implement the
EXT2FS support, though I'm still not clear if that's FreeBSD
or one of the Linux versions he's running.

Is it possible to create an EXT2FS with the lowest common
denominator on a modern Linux by specifying the right command
line arguments to the FS creation tool under Linux?  It's
been quite a while since I've done other than look over
Linux kernel code (I used to contribute fixes for things like
memory leaks in the path component lookup failure case, via
a friend of mine with more influence over there, and that
was about 3 years ago; I still read the kernel, but have
stopped reading the userland).

-- Terry



From owner-freebsd-fs  Fri Mar 15  8:12:19 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from ns.caldera.de (ns.caldera.de [212.34.180.1])
	by hub.freebsd.org (Postfix) with ESMTP id C807137B42A
	for <freebsd-fs@FreeBSD.ORG>; Fri, 15 Mar 2002 08:11:55 -0800 (PST)
Received: (from hch@localhost)
	by ns.caldera.de (8.11.6/8.11.6) id g2FGBpI18746;
	Fri, 15 Mar 2002 17:11:51 +0100
Date: Fri, 15 Mar 2002 17:11:51 +0100
From: Christoph Hellwig <hch@caldera.de>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
Message-ID: <20020315171151.A18291@caldera.de>
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de> <3C921B5F.E19B89CD@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3C921B5F.E19B89CD@mindspring.com>; from tlambert2@mindspring.com on Fri, Mar 15, 2002 at 08:03:43AM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, Mar 15, 2002 at 08:03:43AM -0800, Terry Lambert wrote:
> Is it possible to create an EXT2FS with the lowest common
> denominator on a modern Linux by specifying the right command
> line arguments to the FS creation tool under Linux?

mke2fs -O none.  mke2fs(8) is your friend :)




From owner-freebsd-fs  Fri Mar 15  8:14:20 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from ns.caldera.de (ns.caldera.de [212.34.180.1])
	by hub.freebsd.org (Postfix) with ESMTP id 5637B37B400
	for <freebsd-fs@FreeBSD.ORG>; Fri, 15 Mar 2002 08:14:16 -0800 (PST)
Received: (from hch@localhost)
	by ns.caldera.de (8.11.6/8.11.6) id g2FGECQ19100;
	Fri, 15 Mar 2002 17:14:12 +0100
Date: Fri, 15 Mar 2002 17:14:12 +0100
From: Christoph Hellwig <hch@caldera.de>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: AQUAMAN <yoatl@yahoo.com>, freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
Message-ID: <20020315171412.A18753@caldera.de>
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de> <3C921A04.CFCADA9D@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3C921A04.CFCADA9D@mindspring.com>; from tlambert2@mindspring.com on Fri, Mar 15, 2002 at 07:57:56AM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, Mar 15, 2002 at 07:57:56AM -0800, Terry Lambert wrote:
> 8-).  Common mistake.  They have opposite word order, so the
> version number is different.  They also have different VTOC
> and disklabel order, so they're easy to differentiate anyway.

At least SVR4.2MP and 4.4BSD run on LE and BE hardware, and the
Linux UFS driver supports both endiannesses and about 10 different
derivatives..
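
As a rough illustration of handling both byte orders, a reader of the
on-disk format can try the magic number both ways round.  This is a sketch
only, not the Linux driver's logic: it assumes the commonly documented FFS
magic value 0x011954, magic_byte_order is a made-up name, and where the
magic field sits in the superblock is deliberately left out.

```python
import struct

# Assumed FFS/UFS magic number (commonly documented as 0x011954).
FS_MAGIC = 0x011954


def magic_byte_order(raw):
    """Return 'little', 'big', or None for a raw 4-byte magic field."""
    if len(raw) != 4:
        raise ValueError("magic field must be exactly 4 bytes")
    if struct.unpack("<I", raw)[0] == FS_MAGIC:
        return "little"
    if struct.unpack(">I", raw)[0] == FS_MAGIC:
        return "big"
    return None
```

A match in either order tells the reader how to decode the remaining
superblock fields; it says nothing about which derivative wrote them.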

VTOC handling is done by Linux between the block drivers and the
filesystem, which has advantages, e.g. sharing SysV VTOC support
for sysvfs, ufs and vxfs, but it cannot easily be accessed by the
filesystem due to layering constraints.

> Heh.  I did some work on UFS for SVR4.2MP on SVR4.2MP, and I
> did everything for a derivative called NXFS (the magic number
> on that one is _my_ birthday).  ;^).

That NetWare-Attributes thingy? *shrug*

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.



From owner-freebsd-fs  Fri Mar 15 10:26: 2 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84])
	by hub.freebsd.org (Postfix) with ESMTP id 7C54A37B404
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 10:25:46 -0800 (PST)
Received: from pool0371.cvx22-bradley.dialup.earthlink.net ([209.179.199.116] helo=mindspring.com)
	by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lwOa-0004ld-00; Fri, 15 Mar 2002 10:25:40 -0800
Message-ID: <3C923C91.454D7710@mindspring.com>
Date: Fri, 15 Mar 2002 10:25:21 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Josh MacDonald <jmacd@CS.Berkeley.EDU>
Cc: Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: metadata update durability ordering/soft updates
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Josh MacDonald wrote:
> Terry,
> 
> I'm not sure what you're talking about with regards to DOW and
> ReiserFS.  It doesn't sound right, and I'm pretty sure we're not using
> anything like the patented DOW technique as you've described it.

As usual, the patent claims are general enough to cover things;
see:

	US: 5666532
	US: 5642501

Here is the USPTO patent number search engine:

	http://164.195.100.11/netahtml/srchnum.htm


> We are developing a transaction facility for many of the reasons
> suggested at by the original post in this thread.

Yes, I understand.


> To summarize:
> 
> - The file system has never made any guarantees.

Yes it has.  If you look at the atime/mtime/ctime update
requirements for the OS, they are pretty blatant.  They
just aren't enough to be able to blindly use them.


> - You can use fsync() to stabilize a single file and its metadata
> dependencies.

Metadata stabilization should be automatic.  What an fsync
there does is really enforce ordering on metadata writes,
by acting as a barrier.


> - You can use two-phase commit above and beyond that.

Yes, by implementing on top of fsync.


> - If you're not doing the right thing, "then by definition, your
> application can't have its correctness affected... since it has no
> correctness to lose."

Yes.


> - And, "the OS provides a set of services upon which reliable services
> can be built, if they are correctly engineered."

Yes.


> All of these statements are true.  Your attitude seems to be that this
> is a fine state of affairs, that anyone who writes an application
> should be fully informed of all these "transactional" issues, and that
> anyone who is not fully informed of all these issues is a complete
> moron if they expect to write reliable applications.

No, I merely expect that a person who claims to be a craftsman
should know his tools.


> The problem is that you're asking way too much of the average
> programmer, who doesn't understand transactions and isn't aware of how
> little the operating system actually guarantees in this regard.

It's not a problem with what I'm asking, or a problem
with what the OS guarantees, it's a problem with "average
programmers".

BTW, I would disagree; I don't think that average programmers
are that badly informed.  If they are, then a CS degree is
meaningless.


> The other problem is that fsync() and two-phase-commit can seriously
> limit application performance, unless you use highly sophisticated
> techniques, which again rules out the average programmer.

"Correct, fast, cheap.  Pick two."


> The fact is, it is very difficult to write "reliable services" on top
> of the standard primitives, and it is not good enough to call people
> morons if they don't understand this.

8-).  My gut reaction was to write:

	You're right.  We must also be compassionate, and
	train them how to properly ask ``Would you like
	fries with that?''.

Frankly, I don't think it's possible to child-proof any career
choice to the point that anyone can come in with zero assumptions
or talent, and be productive.

I personally have very little tolerance for people who get into
any career field because of the money, rather than genuine
interest in the field.  It's my considered opinion that these
people will not last out the next downturn, whenever that
happens, and the world will not be a poorer place for it when
they go off chasing the (then) more lucrative rewards in another
field.

Frankly, rewards are something that comes because of the work
you do, not because of where you do it.  It's like searching
for the contact lens that you lost in the alley under the
street-lamp "because the light is better".

The whole dot-bomb thing happened because people wanted to be
rewarded commensurate with their job titles, rather than to
their actual contributions to society (at large, or in the
small of the company in which they were operating).  If you
think I regret the people with cardboard "will program for
food" signs, think again.  I might as well regret "Winter"
for the effect it has on species survivability of tropical
plants foolish people attempt to grow outdoors, in Ontario,
Canada, or regret the effect that "Afternoon" has on Morning
Glories.


> There is a document describing our transactions design for ReiserFS
> version 4, which is currently under development:
> 
>    http://namesys.com/txn-doc.html

I've read it.  I don't disagree, for that application domain,
which is certainly a subset of all possible application domains
(e.g. I'd never use transactions on a Usenet server).

And just having it there doesn't mean that unclued people
will automatically use it, if it requires explicit invocation.


> And somewhat off topic, I have demonstrated that using fsync() and
> rename() as a means for reliable, atomic file updates can seriously
> limit application performance and that having file system transactions
> solves the problem.  My point is that applications will perform
> better, not worse, if the operating system helps construct reliable
> services instead of this do-it-yourself approach.

That's true as well, at least for applications that require
that.  I think your "average programmer too uninformed to
know about building reliability from primitives" will be
using those primitives, though, so long as they are optional.

I also think making them non-optional is an error, in that it
would perpetuate assumptions in that environment which were
not valid to make in all environments.

A scientist, even a computer scientist, needs to learn to
think, and that involves coming at problems from first
principles, among other things.

FWIW: I pointed out that Soft Updates can be generalized to
export a transaction interface, with a trivial amount of work,
precisely because of the performance issues with barriers.
That's the same reason I pointed out that DOW is inferior to
Soft Updates: it introduces a draining barrier that interferes
with concurrency.  Given the patent status of DOW, it's really
amazing to me that anyone would not opt for Soft Updates in
any contest between the two for the ecological niche they fill,
particularly in new work.


> Master's thesis:
> 
>    http://prdownloads.sourceforge.net/xdelta/xdfs.pdf
> 
> and the graph that shows it all:
> 
>    http://www.cs.berkeley.edu/~jmacd/xdfs-vs-rcs.eps

Thanks for these references; I'll download them now, and
read them later today.

-- Terry



From owner-freebsd-fs  Fri Mar 15 10:39:45 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from ns.caldera.de (ns.caldera.de [212.34.180.1])
	by hub.freebsd.org (Postfix) with ESMTP id 71B0337B404
	for <freebsd-fs@FreeBSD.ORG>; Fri, 15 Mar 2002 10:39:38 -0800 (PST)
Received: (from hch@localhost)
	by ns.caldera.de (8.11.6/8.11.6) id g2FIciS26567;
	Fri, 15 Mar 2002 19:38:44 +0100
Date: Fri, 15 Mar 2002 19:38:44 +0100
From: Christoph Hellwig <hch@caldera.de>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Josh MacDonald <jmacd@CS.Berkeley.EDU>,
	Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <20020315193844.A26441@caldera.de>
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3C923C91.454D7710@mindspring.com>; from tlambert2@mindspring.com on Fri, Mar 15, 2002 at 10:25:21AM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, Mar 15, 2002 at 10:25:21AM -0800, Terry Lambert wrote:
> > - The file system has never made any guarantees.
> 
> Yes it has.  If you look at the atime/mtime/ctime update
> requirements for the OS, they are pretty blatant.  They
> just aren't enough to be able to blindly use them.

These requirements are only there for O_SYNC.

> > - You can use fsync() to stabilize a single file and its metadata
> > dependencies.
> 
> Metadata stabilization should be automatic.  What an fsync
> there does is really enforce ordering on metadata writes,
> by acting as a barrier.

Why do you think there is fdatasync() (and O_DSYNC)?
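
The distinction being drawn: fsync() flushes a file's data together with
its metadata, while fdatasync() may skip metadata (timestamps, for
instance) that is not needed to read the data back.  A minimal sketch of
choosing between them follows; flush_file is a hypothetical helper, and the
fallback is there because os.fdatasync is not available on every platform.

```python
import os


def flush_file(fd, data_only=False):
    """Flush an open descriptor and return the name of the call used."""
    if data_only and hasattr(os, "fdatasync"):
        # Data plus only the metadata needed to retrieve it.
        os.fdatasync(fd)
        return "fdatasync"
    # Data and all of the file's metadata.
    os.fsync(fd)
    return "fsync"
```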




From owner-freebsd-fs  Fri Mar 15 12: 3:58 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50])
	by hub.freebsd.org (Postfix) with ESMTP id 3C0DF37B402
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 12:03:52 -0800 (PST)
Received: from pool0434.cvx21-bradley.dialup.earthlink.net ([209.179.193.179] helo=mindspring.com)
	by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16lxvR-000170-00; Fri, 15 Mar 2002 12:03:42 -0800
Message-ID: <3C925387.2DC4F2C0@mindspring.com>
Date: Fri, 15 Mar 2002 12:03:19 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Christoph Hellwig <hch@caldera.de>
Cc: Josh MacDonald <jmacd@CS.Berkeley.EDU>,
	Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: metadata update durability ordering/soft updates
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <20020315193844.A26441@caldera.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Christoph Hellwig wrote:
> On Fri, Mar 15, 2002 at 10:25:21AM -0800, Terry Lambert wrote:
> > > - The file system has never made any guarantees.
> >
> > Yes it has.  If you look at the atime/mtime/ctime update
> > requirements for the OS, they are pretty blatant.  They
> > just aren't enough to be able to blindly use them.
> 
> These requirements are only there for O_SYNC.

POSIX 1003.1, clauses 2.3.5 and 5.6.6.2 distinguish between
"SHALL be marked for update" and "SHALL be updated" with
regard to the ctime, mtime, and atime values for a file,
which are FS metadata.  See also 5.5.3.2.  The relevant
phrases are:

	2.3.5 [ ... ] All fields that are marked for update
	SHALL be updated when the file is no longer open by
	any process, or when a stat() or fstat() is performed
	on the file.  Other times at which updates are done
	are unspecified.

	5.6.6.2	[ ... ] The utime() function sets the access
	and modification times of the named file.

	5.5.3.2 [ ... ]	Upon successful completion, the
	rename() function SHALL mark for update the st_ctime
	and st_mtime fields of the parent directory of each
	file.

The getdirentries update semantics (SHALL update) and the metadata
modifications (SHALL update) are pretty unambiguous, as well.

The Single UNIX Specification has similar controls on the marking
for update in write, mmap, and other cases.  The POSIX requirements
are stiffer because of VMS, where directories were not implemented
as files.  I used to dislike it, but way back then, I was just
starting out as a student, and didn't realize the transactional
implications.  The Single UNIX Specification also fails to specify
things like the underlying system call(s) used to implement
directory traversal.  POSIX, however, specifies that the atime
"SHALL be updated" (as opposed to merely marked for update).  We got
around this requirement on one project I worked on by not using the
system call interface whose behaviour is specified to read the
directory contents, and declaring that directories were not regular
files for the FS in question.
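
To make the "marked for update" versus "updated" distinction concrete:
per the 2.3.5 text quoted above, an fstat() forces any pending
timestamp updates to become visible.  A minimal sketch in C, assuming a
POSIX system; touch_by_write is a hypothetical helper name used only
for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Append one byte to `path` (creating it if needed), then fstat() the
 * descriptor.  The write marks st_mtime for update; the fstat() is one
 * of the points at which the update must have been applied.
 * Returns 0 on success and stores st_mtime in *mtime_out, else -1.
 */
static int touch_by_write(const char *path, time_t *mtime_out)
{
    struct stat sb;
    int fd;

    assert(path != NULL && mtime_out != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "x", 1) != 1 || fstat(fd, &sb) != 0) {
        close(fd);
        return -1;
    }
    *mtime_out = sb.st_mtime;
    return close(fd);
}
```

Nothing here says when the timestamp reaches stable storage; it only
shows the point at which the new value has to be observable.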


> > > - You can use fsync() to stabilize a single file and its metadata
> > > dependencies.
> >
> > Metadata stabilization should be automatic.  What an fsync
> > there does is really enforce ordering on metadata writes,
> > by acting as a barrier.
> 
> Why do you think there is fdatasync() (and O_DSYNC)?

Linux?  It used to be called "O_WRITESYNC" back in the mid
1980's.  The idea that an FS would not order your metadata
for you, yet you would still have integrity requirements in
such an environment, was simply unthinkable.

The O_DSYNC came about because people invented the concept
of unsynchronized metadata, which led to the idea that it
should be possible to separately cause data and metadata
synchronization.

IMO, there's really no excuse for unsynchronized metadata,
and synchronous data writes exist only to avoid the system
call overhead of separately calling fsync(), and the OS
overhead of having to synchronize all dirty pages instead
of a region, based on the descriptor being used for the
operation.
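
The split being argued over can be seen at the call site.  A minimal
sketch in C, assuming a POSIX system that provides fdatasync() (its
availability varies between systems); write_durably is a hypothetical
helper, not an existing API:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write `len` bytes to `path`, then flush before returning.
 * data_only != 0: fdatasync(), i.e. the data plus whatever metadata is
 *                 needed to read it back.
 * data_only == 0: fsync(), i.e. data and all of the file's metadata.
 * Returns 0 on success, -1 on any error.
 */
static int write_durably(const char *path, const void *buf, size_t len,
                         int data_only)
{
    const char *p = buf;
    int fd, rc;

    assert(path != NULL && (buf != NULL || len == 0));
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, retry */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    rc = data_only ? fdatasync(fd) : fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Neither call says anything about the directory entry that names the
file; that is a separate piece of metadata.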

You can make the same argument in FreeBSD actually: msync()
doesn't limit itself to the range specified for the backing
object, because it can't tell (there are no reverse maps);
last time I looked at msync() in Linux and Solaris, it was
true in those places, too.
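
For reference, this is what a ranged msync() request looks like from
user space.  A minimal sketch in C, assuming a POSIX system;
poke_and_sync is a hypothetical helper.  Whether the kernel flushes
only the requested page or more of the backing object is up to the
implementation, which is the point made above:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Size `path` to `file_len` bytes, map it shared, store `value` at
 * `offset`, and ask msync() to flush only the page containing it.
 * Returns 0 on success, -1 on error.
 */
static int poke_and_sync(const char *path, size_t file_len, size_t offset,
                         char value)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t start, len;
    char *base;
    int fd, rc;

    assert(path != NULL);
    if (page <= 0 || offset >= file_len)
        return -1;
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)file_len) != 0) {
        close(fd);
        return -1;
    }
    base = mmap(NULL, file_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid */
    if (base == MAP_FAILED)
        return -1;

    base[offset] = value;

    /* msync() wants a page-aligned start address. */
    start = offset - offset % (size_t)page;
    len = file_len - start;
    if (len > (size_t)page)
        len = (size_t)page;
    rc = msync(base + start, len, MS_SYNC);

    if (munmap(base, file_len) != 0)
        rc = -1;
    return rc;
}
```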

-- Terry



From owner-freebsd-fs  Fri Mar 15 12: 8:15 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.fidnet.com (one.fidnet.com [216.229.64.71])
	by hub.freebsd.org (Postfix) with SMTP id A945C37B400
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 12:08:12 -0800 (PST)
Received: (qmail 29310 invoked from network); 15 Mar 2002 20:08:05 -0000
Received: from beast.hexaneinc.com (HELO beast) (216.229.82.132)
  by one.fidnet.com with SMTP; 15 Mar 2002 20:08:04 -0000
From: "Matthew Rezny" <mrezny@umr.edu>
To: "Paul Schenkeveld" <fb-fs@psconsult.nl>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Date: Fri, 15 Mar 2002 14:09:06 -0600
Reply-To: "Matthew Rezny" <mrezny@umr.edu>
X-Mailer: PMMail 2000 Professional (2.10.2010) For Windows 2000 (5.0.2195;2)
In-Reply-To: <20020315104815.A79816@psconsult.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: Re: disks > 1TB
Message-Id: <20020315200812.A945C37B400@hub.freebsd.org>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

I haven't tried that. I think I need to clarify my points.

1) disklabel has a limit on how large a volume it can handle because it
uses a signed variable to temporarily store a value that ultimately
goes into an unsigned storage location. I see this as something that
should be fixed in the source tree since it's a one-line change that
removes an annoyance more people will soon run into given the
increases in cheap storage.

2) The rest of the OS has similar problems, using signed types where
they are not needed, which limits addressable disk space to 1TB. This
will begin to be a problem and needs to be worked on. The quick fix is
to switch everything to unsigned and raise the limit to 2TB. The long
term solution is to change it all to 64bit, but that changes the size
of everything stored, so it would take a lot more work to ensure that
it doesn't cause problems with alignment and storage.
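
The 1TB and 2TB figures above follow from 32-bit sector counts,
assuming 512-byte sectors (the sector size is an assumption here; it is
not stated in this thread). A minimal sketch in C; max_device_bytes and
parse_sector_count are hypothetical helpers for illustration, not code
from disklabel.c:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_SIZE 512ULL   /* assumption: 512-byte sectors */

/* Device size in bytes for a given number of addressable sectors. */
static uint64_t max_device_bytes(uint64_t sector_count)
{
    return sector_count * SECTOR_SIZE;
}

/*
 * Parse a decimal sector count into an unsigned 32-bit value, the
 * width of the on-disk field.  Parsing into a signed int instead would
 * reject or mangle anything above INT32_MAX, i.e. anything past 1TB.
 * Returns 0 on success, -1 if the text is not a number or does not fit.
 */
static int parse_sector_count(const char *text, uint32_t *out)
{
    unsigned long long v;
    char *end;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || v > UINT32_MAX)
        return -1;
    *out = (uint32_t)v;
    return 0;
}
```

With these definitions, 2^31 sectors (the signed limit) come to 2^40
bytes and 2^32 sectors (the unsigned limit) to 2^41 bytes, which is
where the 1TB and 2TB ceilings come from. Going past that needs a wider
on-disk field, not just a different local variable type.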

On Fri, 15 Mar 2002 10:48:16 +0100, Paul Schenkeveld wrote:

>On Fri, Mar 15, 2002 at 02:35:31AM -0600, Matthew Rezny wrote:
>> I just bought a 3ware 7810 controller and 8 160GB drives, which in
>> RAID5 yields 1.04TB (real TB). Having previously seen statements that
>> FFS limit is 64TB, I expected this to work. Unfortunately I found that
>> the number of sectors becomes an issue. Looking through the mailing
>> list history I see this has come up before and it will take a lot to
>> solve, more than the spare time I have this weekend. The quick solution
>> is make a 1TB filesystem and let the extra .04TB go to waste rather
>> than try to patch the whole system. However, there is a slight problem
>> with this, which is limits in the disklabel tool. The disklabel
>> structure which is stored on disk uses u_int32_t for the number of
>> sectors in the device. The disklabel tool uses int when interpreting
>> all numbers in the getasciilabel() function. This limits disklabel to
>> 1TB devices. If the declaration on line 964 of disklabel.c is changed
>> from "int v" to "u_int32_t v" then this limit is lifted. This change is
>> safe since the actual value on disk is unsigned. Using unsigned in the
>> input allows disklabel to work with devices up to 2TB. This allows
>> creation of 1TB slices on devices >1TB so that at least part can be
>> used in the meantime while we wait for the limit to be lifted elsewhere
>> in the system.
>
>Did you try to divide the disk in two FreeBSD slices using fdisk?
>The numbers in disklabel are relative to the fdisk slice so your
>xx0s1c partition is the same size as the fdisk slice.
>
>> Also, I've seen one mention of 4TB systems in the mailing list
>> archives. How was this done? Kernel patches, other trickery?
>
>-- 
>Paul Schenkeveld, Consultant
>PSconsult ICT Services BV





From owner-freebsd-fs  Fri Mar 15 12:56:38 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from roc-24-169-102-121.rochester.rr.com (216-42-72-146.ppp.netsville.net [216.42.72.146])
	by hub.freebsd.org (Postfix) with ESMTP id CC20437B41B
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 12:56:32 -0800 (PST)
Received: from localhost ([127.0.0.1] helo=tiny)
	by roc-24-169-102-121.rochester.rr.com with esmtp (Exim 3.16 #4)
	id 16lyUC-0001tB-00; Fri, 15 Mar 2002 15:39:36 -0500
Date: Fri, 15 Mar 2002 15:39:36 -0500
From: Chris Mason <mason@suse.com>
To: Terry Lambert <tlambert2@mindspring.com>,
	Josh MacDonald <jmacd@CS.Berkeley.EDU>
Cc: Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates
Message-ID: <1562810000.1016224776@tiny>
In-Reply-To: <3C923C91.454D7710@mindspring.com>
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com>
X-Mailer: Mulberry/2.1.0 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



On Friday, March 15, 2002 10:25:21 AM -0800 Terry Lambert <tlambert2@mindspring.com> wrote:

> Josh MacDonald wrote:
>> Terry,
>> 
>> I'm not sure what you're talking about with regards to DOW and
>> ReiserFS.  It doesn't sound right, and I'm pretty sure we're not using
>> anything like the patented DOW technique as you've described it.
> 
> As usual, the patent claims are general enough to cover things;
> see:
> 
> 	US: 5666532
> 	US: 5642501
> 
> Here is the USPTO patent number search engine:
> 
> 	http://164.195.100.11/netahtml/srchnum.htm
> 

I haven't read the entire patent, but maybe you can point me to the
paragraphs where it covers write-ahead logging in the description.

During any operation, no attempt at all is made to order the writing
of the bitmap, the inode, the directory entries, or any other part of
the metadata.  The log simply makes sure that after a crash the
operations are either completed or not.  If you mkdir foo and then
mkdir foo2, it is entirely possible the blocks for foo2 go to disk first.

The reiserfs log is also not a generic system module loosely coupled
with the rest of the filesystem.

-chris




From owner-freebsd-fs  Fri Mar 15 16: 9:51 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from albatross.prod.itd.earthlink.net (albatross.mail.pas.earthlink.net [207.217.120.120])
	by hub.freebsd.org (Postfix) with ESMTP id 2587837B400
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 16:09:48 -0800 (PST)
Received: from pool0278.cvx21-bradley.dialup.earthlink.net ([209.179.193.23] helo=mindspring.com)
	by albatross.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16m1lO-0005n4-00; Fri, 15 Mar 2002 16:09:34 -0800
Message-ID: <3C928D21.404EA11D@mindspring.com>
Date: Fri, 15 Mar 2002 16:09:05 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Chris Mason <mason@suse.com>
Cc: Josh MacDonald <jmacd@CS.Berkeley.EDU>,
	Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <1562810000.1016224776@tiny>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Chris Mason wrote:
> I haven't read the entire patent, but maybe you can point me to the
> paragraphs where it covers write-ahead logging in the description.

	A subset of writes to secondary storage are performed
	using a Delayed Ordered Write (DOW) subsystem, which
	makes it possible for any file system to control the
	order in which modifications are propagated to disk.
	The DOW subsystem consists of two parts. The first
	part is a specification interface, which a file system
	implementation or any other kernel subsystem can use
	to indicate sequential ordering between a modification
	and some other modification of file system structural
	data.

This is the write-ahead log.  The only difference is where it's
stored: in memory or on disk.

	The second part of DOW subsystem is a mechanism that
	ensures that the disk write operations are indeed
	performed in accordance with the order store. DOW
	improves computer system performance by reducing disk
	traffic as well as the number of context switches
	that would be generated if synchronous writes were
	used for ordering. 

See also claims 1, 6, 23, and 44.


> During any operation, no attempt at all is made to order the writing
> of the bitmap, the inode, the directory entries, or any other part of
> the metadata.  It simply makes sure that after a crash the operations
> are either completed or not.  If you mkdir foo and then mkdir foo2,
> it is entirely possible the blocks for foo2 go to disk first.

I didn't say it infringed Soft Updates (which does this), I
said it infringed DOW (which doesn't).

Soft Updates aren't infringible, in any case, since they are
not patented.


> The reiserfs log is also not a generic system module loosely coupled
> with the rest of the filesystem.

The patent claims are generic enough that they could cover
either case.  Software patents are process patents, not
performance patents.

-- Terry



From owner-freebsd-fs  Fri Mar 15 16:17: 0 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from albatross.prod.itd.earthlink.net (albatross.mail.pas.earthlink.net [207.217.120.120])
	by hub.freebsd.org (Postfix) with ESMTP id E60F937B41F
	for <freebsd-fs@freebsd.org>; Fri, 15 Mar 2002 16:16:41 -0800 (PST)
Received: from pool0278.cvx21-bradley.dialup.earthlink.net ([209.179.193.23] helo=mindspring.com)
	by albatross.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16m1sB-0006rf-00; Fri, 15 Mar 2002 16:16:36 -0800
Message-ID: <3C928EC6.14363297@mindspring.com>
Date: Fri, 15 Mar 2002 16:16:06 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Chris Mason <mason@suse.com>,
	Josh MacDonald <jmacd@CS.Berkeley.EDU>,
	Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <1562810000.1016224776@tiny> <3C928D21.404EA11D@mindspring.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Terry Lambert wrote:
> This is the write-ahead log.  The only difference is where it's
> stored: in memory or on disk.
[ ... ]
> See also claims 1, 6, 23, and 44.


If this is still confusing, consider whether or not you
would have to cite this patent if you were filing a
patent for what ReiserFS does (I think the answer is "yes").

-- Terry



From owner-freebsd-fs  Fri Mar 15 18:26:12 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from magic.adaptec.com (magic.adaptec.com [208.236.45.80])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0000137B430; Fri, 15 Mar 2002 18:25:50 -0800 (PST)
Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11])
	by magic.adaptec.com (8.10.2+Sun/8.10.2) with ESMTP id g2G2Poj27257;
	Fri, 15 Mar 2002 18:25:50 -0800 (PST)
Received: from btc.btc.adaptec.com (btc.btc.adaptec.com [162.62.64.10])
	by redfish.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id SAA22321;
	Fri, 15 Mar 2002 18:25:49 -0800 (PST)
Received: from hollin.btc.adaptec.com (hollin [162.62.149.56])
	by btc.btc.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id TAA17335;
	Fri, 15 Mar 2002 19:25:47 -0700 (MST)
Received: (from scottl@localhost)
	by hollin.btc.adaptec.com (8.11.6/8.11.6) id g2G2NZM00263;
	Fri, 15 Mar 2002 19:23:35 -0700 (MST)
	(envelope-from scottl)
Date: Fri, 15 Mar 2002 19:02:26 -0700
From: Scott Long <scott_long@btc.adaptec.com>
To: Chris Dillon <cdillon@wolves.k12.mo.us>
Cc: freebsd-scsi@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: CD-MRW a.k.a Mt. Rainier support
Message-ID: <20020316020226.GA12097@bunsenhoneydew.btc.adaptec.com>
References: <Pine.BSF.4.32.0203131116220.31162-100000@mail.wolves.k12.mo.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.BSF.4.32.0203131116220.31162-100000@mail.wolves.k12.mo.us>
User-Agent: Mutt/1.3.28i
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Wed, Mar 13, 2002 at 11:59:06AM -0600, Chris Dillon wrote:
> 
> CC'd to freebsd-fs since this is somewhat fs-related...
> 
> Is anyone working on implementing support for CD-MRW (apparently
> included in MMC-3) into either the SCSI cd driver or the ATAPI cd
> driver?  Where/how would be the best place to implement this so that
> it will work with either ATAPI or SCSI drives?  Would implementing it
> in the SCSI cd driver be best, since we now have the option of using
> ATAPI drives with CAM?
> 
> In case anyone is wondering what CD-MRW (Mt. Rainier Re-Writable) is,
> it is a new standard (currently only available in the Yamaha CRW3200
> series, that I know of), that allows on-the-fly transparent
> formatting, hardware defect management, and 2K-block logical
> addressing of CD-RW discs and specifies a specialized UDF filesystem
> to be used along with these hardware abilities.  This will make drives
> supporting this standard act like a more traditional magnetic-media
> removable drive, thus greatly simplifying reading/writing to CD-RW
> discs.  Since MRW uses a new format it is not backwards compatible
> with any existing CD-RW formats, though it is possible to _read_ a MRW
> formatted disc in a regular drive with the proper software support.
> MRW uses UDF as its standard filesystem, which we do not yet support,
> though I envision using the hardware MRW support of the drive to put
> just about anything you want onto it, including FAT or UFS, to use it
> as a "regular" drive.
> 
> I'd love to take a shot at implementing this if someone isn't already,
> though I'll need to find the specs for the hardware side of Mt.
> Rainier.  Apparently it is implemented in the new MMC-3 command set.
> Anyone have any pointers?

This drive sounds very interesting.  Unfortunately, until the standard
becomes ubiquitous, any UDF implementation will still need to understand
read-modify-write and sparing tables.  UDF is the natural format for it
since with removable media you want interchangeability with other
systems, but there should be nothing stopping you from putting UFS on
it too.

I've already started a UDF implementation for FreeBSD.  Patches for
5.0-CURRENT can be found at http://people.freebsd.org/~scottl, along
with a link for slightly older -STABLE patches.  The current status
is that CD-RWs and DVD-ROMs can be read (though Sparing Tables are
still missing for CD-RW), and once I've cleaned up and filled in the
code some more, I intend for it to go into 5.0-RELEASE.  I'd welcome
any help on the project, especially if someone wants to tackle the
writing support.

Scott



From owner-freebsd-fs  Sat Mar 16  9:17:56 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from roc-24-169-102-121.rochester.rr.com (216-42-72-146.ppp.netsville.net [216.42.72.146])
	by hub.freebsd.org (Postfix) with ESMTP id 9C76637B402
	for <freebsd-fs@freebsd.org>; Sat, 16 Mar 2002 09:17:47 -0800 (PST)
Received: from localhost ([127.0.0.1] helo=tiny)
	by roc-24-169-102-121.rochester.rr.com with esmtp (Exim 3.16 #4)
	id 16mHn8-0003VC-00; Sat, 16 Mar 2002 12:16:26 -0500
Date: Sat, 16 Mar 2002 12:16:26 -0500
From: Chris Mason <mason@suse.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Josh MacDonald <jmacd@CS.Berkeley.EDU>,
	Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates
Message-ID: <1714680000.1016298986@tiny>
In-Reply-To: <3C928D21.404EA11D@mindspring.com>
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <1562810000.1016224776@tiny> <3C928D21.404EA11D@mindspring.com>
X-Mailer: Mulberry/2.1.0 (Linux/x86)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



On Friday, March 15, 2002 04:09:05 PM -0800 Terry Lambert <tlambert2@mindspring.com> wrote:

> Chris Mason wrote:
>> I haven't read the entire patent, but maybe you can point me to the
>> paragraphs where it covers write-ahead logging in the description.
> 
> 	A subset of writes to secondary storage are performed
> 	using a Delayed Ordered Write (DOW) subsystem, which
> 	makes it possible for any file system to control the
> 	order in which modifications are propagated to disk.
> 	The DOW subsystem consists of two parts. The first
> 	part is a specification interface, which a file system
> 	implementation or any other kernel subsystem can use
> 	to indicate sequential ordering between a modification
> 	and some other modification of file system structural
> 	data.
> 
> This is the write-ahead log.  The only difference is where it's
> stored: in memory or on disk.

Well, I'm certainly not a patent lawyer, but the way they 
define this interface seems very different from the way
reiserfs works.  The interface a) is not available to the
kernel or any other subsystem and b) does not define ordering.
It defines atomic units consisting of multiple operations.

> 	The second part of DOW subsystem is a mechanism that
> 	ensures that the disk write operations are indeed
> 	performed in accordance with the order store. DOW
> 	improves computer system performance by reducing disk
> 	traffic as well as the number of context switches
> 	that would be generated if synchronous writes were
> 	used for ordering. 
> 
> See also claims 1, 6, 23, and 44.

Claim 44 is probably the most difficult, although I think this:

"where said common writes and said function calls have common order 
dependencies CD1, CD2, . . . , CDcd that preserve the update order 
dependencies D1, D2, . . . , Dd between the operations in the requests, 
where cd is an integer, "

restricts it to systems that preserve the ordering of the requests
inside the combined common write.  In other words, if I batch
mkdir foo ; mkdir foo2 into a common write, I think it says that
mkdir foo will be done first.

> 
> 
>> During any operation, no attempt at all is made to order the writing
>> of the bitmap, the inode, the directory entries, or any other part of
>> the metadata.  It simply makes sure that after a crash the operations
>> are either completed or not.  If you mkdir foo and then mkdir foo2,
>> it is entirely possible the blocks for foo2 go to disk first.
> 
> I didn't say it infringed Soft Updates (which does this), I
> said it infringed DOW (which doesn't).

I got that idea from this paragraph:

"DOW includes two parts. The first part is an interface by which file 
system implementations, or any kernel subsystem, specify the sequences 
in which modifications of file system data blocks can be recorded on 
disks. These sequences translate into ordering dependencies among disk 
blocks themselves, which are collectively represented by an ordering 
graph (entries in an ordering store), prepared by DOW in response to 
the specification."

If this has been discussed in detail already, please drop me a link
to the mailing list archive.

-chris
 



From owner-freebsd-fs  Sat Mar 16 13:41:40 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50])
	by hub.freebsd.org (Postfix) with ESMTP id 6E4E537B43B
	for <freebsd-fs@freebsd.org>; Sat, 16 Mar 2002 13:41:36 -0800 (PST)
Received: from dialup-209.245.143.72.dial1.sanjose1.level3.net ([209.245.143.72] helo=mindspring.com)
	by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16mLvY-0007DT-00; Sat, 16 Mar 2002 13:41:25 -0800
Message-ID: <3C93BBF1.7E8801DF@mindspring.com>
Date: Sat, 16 Mar 2002 13:41:05 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Chris Mason <mason@suse.com>
Cc: Josh MacDonald <jmacd@CS.Berkeley.EDU>,
	Parity Error <bootup@mail.ru>, freebsd-fs@FreeBSD.ORG,
	reiserfs-dev@namesys.com
Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates
References: <E16lReK-000C3T-00@f10.mail.ru> <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <1562810000.1016224776@tiny> <3C928D21.404EA11D@mindspring.com> <1714680000.1016298986@tiny>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Chris Mason wrote:
> Claim 44 is probably the most difficult, although I think this:
> 
> "where said common writes and said function calls have common order
> dependencies CD1, CD2, . . . , CDcd that preserve the update order
> dependencies D1, D2, . . . , Dd between the operations in the requests,
> where cd is an integer, "
> 
> restricts it to systems that preserve the ordering of the requests
> inside the combined common write.  In other words, if I batch
> mkdir foo ; mkdir foo2 into a common write, I think it says that
> mkdir foo will be done first.

I can tell you from my experience with the source code that
this is not true, unless both updates occur in the same
directory entry block of the same directory.


> If this has been discussed in detail already, please drop me a link
> to the mailing list archive.

It has come up on a number of mailing lists in the past;
the FreeBSD mailing lists generally get a snapshot of it
whenever anyone suggests porting ReiserFS to FreeBSD.

Do a search for "ReiserFS" in the FreeBSD mailing list
archives, and you should be able to find it.

Personally, I'd prefer not to discuss it in the level of
detail required for a legal defense against patent claims,
since I believe that ReiserFS would lose, and I'd rather
not be the person manufacturing the bullets for the gun
that shoots it.

Realize that Novell holds the patents that were executed
(such that they could then be assigned) during the time
that USL was owned by Novell.  So SCO buying USL and
Caldera buying SCO doesn't give those patents a "get out
of jail free" card.  8-(.

-- Terry
