From owner-freebsd-fs  Mon Feb  5  7:26:56 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id 382F537B65D; Mon,  5 Feb 2001 07:26:31 -0800 (PST)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 1CC6D57610; Mon,  5 Feb 2001 09:26:59 -0600 (CST)
Date: Mon, 5 Feb 2001 09:26:59 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: hackers@freebsd.org
Cc: fs@freebsd.org
Subject: Extremely large (70TB) File system/server planning
Message-ID: <20010205092658.A97400@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hello Everyone,

While talking to a friend about what his company is planning to do,
I found out that he is planning a 70TB filesystem/servers/cluster/db.
(Yes, seventy t-e-r-a-b-y-t-e...)

Apparently, he has files that go up to 2GB each, and actually require
such a horribly sized cluster.

If he wanted a PC cluster, and having 5TB on each PC, he would have
350 machines to maintain.  From past experience maintaining clusters,
I guarantee that he will have at least 1 box failing every other day.
And I really do not think his idea of using NFS is that good. ;-)
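The sizing arithmetic above can be checked quickly. This is a minimal sketch that assumes decimal terabytes and independent, constant-rate box failures; neither assumption comes from the original message. At 5 TB per box, 70 TB needs only 14 boxes, so the 350-box figure corresponds to roughly 200 GB per box. One failure every other day across 350 boxes implies a per-box MTBF of about 700 days.

```python
# Back-of-the-envelope cluster sizing (decimal units assumed).
TB = 10**12
GB = 10**9

def boxes_needed(total_bytes: int, per_box_bytes: int) -> int:
    """Smallest number of boxes whose combined capacity covers total_bytes."""
    return -(-total_bytes // per_box_bytes)  # integer ceiling division

def implied_mtbf_days(boxes: int, fleet_failures_per_day: float) -> float:
    """Per-box MTBF implied by an observed fleet-wide failure rate.

    Assumes independent failures at a constant rate, so the fleet
    failure rate is simply boxes / MTBF.
    """
    return boxes / fleet_failures_per_day

if __name__ == "__main__":
    print(boxes_needed(70 * TB, 5 * TB))    # 14
    print(boxes_needed(70 * TB, 200 * GB))  # 350
    print(implied_mtbf_days(350, 0.5))      # 700.0
```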

Now if we were to go to the high-end route (and probably more cost
effective), we can pick SANs, large Sun fileservers, or some such.
I still cannot picture him being able to maintain file integrity.

I say that he should attempt to split his filesystems into much
smaller chunks, say 1TB each.  And attempt some way of having a RAID5
array.  Mirroring or other RAID configurations would prove too costly.
What would you guys do in this case? :)
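The claim that mirroring would cost too much can be put in numbers. This minimal sketch assumes each 1 TB filesystem sits on its own RAID5 group of n equal disks; the n = 8 below is an arbitrary, hypothetical choice. RAID5 gives up one disk per group to parity, so usable space is (n-1)/n of raw capacity. Mirroring gives up half.

```python
from fractions import Fraction

def raw_for_raid5(usable_tb: int, disks_per_group: int) -> Fraction:
    """Raw capacity needed when each RAID5 group of n disks loses one disk to parity."""
    if disks_per_group < 3:
        raise ValueError("RAID5 needs at least 3 disks per group")
    return Fraction(usable_tb * disks_per_group, disks_per_group - 1)

def raw_for_mirror(usable_tb: int) -> int:
    """Raw capacity needed when every block is stored twice (RAID1)."""
    return 2 * usable_tb

if __name__ == "__main__":
    usable = 70  # TB, e.g. seventy 1 TB filesystems
    print(float(raw_for_raid5(usable, 8)))  # 80.0
    print(raw_for_mirror(usable))           # 140
```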
-- 
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Mon Feb  5  7:39:30 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mercury.ccmr.cornell.edu (mercury.ccmr.cornell.edu [128.84.231.97])
	by hub.freebsd.org (Postfix) with ESMTP
	id A33F337B491; Mon,  5 Feb 2001 07:39:08 -0800 (PST)
Received: from ruby.ccmr.cornell.edu (IDENT:0@ruby.ccmr.cornell.edu [128.84.231.115])
	by mercury.ccmr.cornell.edu (8.9.3/8.9.3) with ESMTP id KAA13217;
	Mon, 5 Feb 2001 10:39:04 -0500
Received: from localhost (mitch@localhost)
	by ruby.ccmr.cornell.edu (8.9.3/8.9.3) with ESMTP id KAA06449;
	Mon, 5 Feb 2001 10:39:02 -0500
X-Authentication-Warning: ruby.ccmr.cornell.edu: mitch owned process doing -bs
Date: Mon, 5 Feb 2001 10:39:02 -0500 (EST)
From: Mitch Collinsworth <mitch@ccmr.cornell.edu>
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
In-Reply-To: <20010205092658.A97400@peorth.iteration.net>
Message-ID: <Pine.LNX.4.10.10102051036410.22516-100000@ruby.ccmr.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

You didn't say what applications this thing is going to support.
That does matter.  A lot.  One thing worth looking at is AFS,
or maybe MR-AFS.  And now OpenAFS.

-Mitch


On Mon, 5 Feb 2001, Michael C . Wu wrote:

> Hello Everyone,
> 
> While talking to a friend about what his company is planning to do,
> I found out that he is planning a 70TB filesystem/servers/cluster/db.
> (Yes, seventy t-e-r-a-b-y-t-e...)
> 
> Apparently, he has files that go up to 2GB each, and actually require
> such a horribly sized cluster.
> 
> If he wanted a PC cluster, and having 5TB on each PC, he would have
> 350 machines to maintain.  From past experience maintaining clusters,
> I guarantee that he will have at least 1 box failing every other day.
> And I really do not think his idea of using NFS is that good. ;-)
> 
> Now if we were to go to the high-end route (and probably more cost
> effective), we can pick SANs, large Sun fileservers, or some such.
> I still cannot picture him being able to maintain file integrity.
> 
> I say that he should attempt to split his filesystems into much
> smaller chunks, say 1TB each.  And attempt some way of having a RAID5
> array.  Mirroring or other RAID configurations would prove too costly.
> What would you guys do in this case? :)


From owner-freebsd-fs  Mon Feb  5  7:52:59 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from msp-65-25-230-128.mn.rr.com (msp-65-25-230-128.mn.rr.com [65.25.230.128])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1762137B4EC; Mon,  5 Feb 2001 07:52:33 -0800 (PST)
Received: (from z3rk@localhost)
	by msp-65-25-230-128.mn.rr.com (8.11.0/8.11.0) id f15FqUp23714;
	Mon, 5 Feb 2001 09:52:30 -0600
Date: Mon, 5 Feb 2001 09:52:30 -0600
From: Goblin <ahkbarr@yahoo.com>
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: hackers@freebsd.org, fs@freebsd.org
Subject: Re: Extremely large (70TB) File system/server planning
Message-ID: <20010205095229.A30253@msp-65-25-230-128.mn.rr.com>
References: <20010205092658.A97400@peorth.iteration.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <20010205092658.A97400@peorth.iteration.net>; from keichii@iteration.net on Mon, Feb 05, 2001 at 09:26:59AM -0600
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

NetApp filers? And what exactly is too costly? He's got enormous costs
just in doing backups of this thing, and the savings in using NetApp
filers for doing "snapshots" instead of standard backups will buy you
some disk in the end...

What is this data used for? Archival? How often is it accessed? How much
of the data is "live"? Has he looked at something other than plain disk?

Broaden his horizons and get specifics of his needs.

On 02/05, Michael C . Wu rearranged the electrons to read:
> Hello Everyone,
> 
> While talking to a friend about what his company is planning to do,
> I found out that he is planning a 70TB filesystem/servers/cluster/db.
> (Yes, seventy t-e-r-a-b-y-t-e...)
> 
> Apparently, he has files that go up to 2GB each, and actually require
> such a horribly sized cluster.
> 
> If he wanted a PC cluster, and having 5TB on each PC, he would have
> 350 machines to maintain.  From past experience maintaining clusters,
> I guarantee that he will have at least 1 box failing every other day.
> And I really do not think his idea of using NFS is that good. ;-)
> 
> Now if we were to go to the high-end route (and probably more cost
> effective), we can pick SANs, large Sun fileservers, or some such.
> I still cannot picture him being able to maintain file integrity.
> 
> I say that he should attempt to split his filesystems into much
> smaller chunks, say 1TB each.  And attempt some way of having a RAID5
> array.  Mirroring or other RAID configurations would prove too costly.
> What would you guys do in this case? :)
 Your eyes are weary from staring at the CRT.  You feel sleepy.  Notice how
 restful it is to watch the cursor blink.  Close your eyes.  The opinions
 stated above are yours.  You cannot imagine why you ever felt otherwise.





From owner-freebsd-fs  Mon Feb  5  8:00:10 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id D624237B503; Mon,  5 Feb 2001 07:59:47 -0800 (PST)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 7D37957610; Mon,  5 Feb 2001 10:00:16 -0600 (CST)
Date: Mon, 5 Feb 2001 10:00:16 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Mitch Collinsworth <mitch@ccmr.cornell.edu>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
Message-ID: <20010205100016.C97400@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
References: <20010205092658.A97400@peorth.iteration.net> <Pine.LNX.4.10.10102051036410.22516-100000@ruby.ccmr.cornell.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.LNX.4.10.10102051036410.22516-100000@ruby.ccmr.cornell.edu>; from mitch@ccmr.cornell.edu on Mon, Feb 05, 2001 at 10:39:02AM -0500
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Feb 05, 2001 at 10:39:02AM -0500, Mitch Collinsworth scribbled:
| You didn't say what applications this thing is going to support.
| That does matter.  A lot.  One thing worth looking at is AFS,
| or maybe MR-AFS.  And now OpenAFS.

He has database(s) of graphics simulation results, i.e. large files that
are largely unrelated to each other.  Compression is not an option.

The files are accessed approximately 3 or 4 times a day on average.
Older files are archived for reference purposes and may never
be accessed after a week.




From owner-freebsd-fs  Mon Feb  5  8:48:21 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mercury.ccmr.cornell.edu (mercury.ccmr.cornell.edu [128.84.231.97])
	by hub.freebsd.org (Postfix) with ESMTP
	id EF5FA37B69F; Mon,  5 Feb 2001 08:48:00 -0800 (PST)
Received: from ruby.ccmr.cornell.edu (IDENT:0@ruby.ccmr.cornell.edu [128.84.231.115])
	by mercury.ccmr.cornell.edu (8.9.3/8.9.3) with ESMTP id LAA15223;
	Mon, 5 Feb 2001 11:48:00 -0500
Received: from localhost (mitch@localhost)
	by ruby.ccmr.cornell.edu (8.9.3/8.9.3) with ESMTP id LAA06750;
	Mon, 5 Feb 2001 11:47:58 -0500
X-Authentication-Warning: ruby.ccmr.cornell.edu: mitch owned process doing -bs
Date: Mon, 5 Feb 2001 11:47:58 -0500 (EST)
From: Mitch Collinsworth <mitch@ccmr.cornell.edu>
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
In-Reply-To: <20010205100016.C97400@peorth.iteration.net>
Message-ID: <Pine.LNX.4.10.10102051146300.22516-100000@ruby.ccmr.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On Mon, 5 Feb 2001, Michael C . Wu wrote:

> On Mon, Feb 05, 2001 at 10:39:02AM -0500, Mitch Collinsworth scribbled:
> | You didn't say what applications this thing is going to support.
> | That does matter.  A lot.  One thing worth looking at is AFS,
> | or maybe MR-AFS.  And now OpenAFS.
> 
> He has database(s) of graphics simulation results, i.e. large files that
> are largely unrelated to each other.  Compression is not an option.
> 
> The files are accessed approximately 3 or 4 times a day on average.
> Older files are archived for reference purposes and may never
> be accessed after a week.

Ok, this is a start.  Now is the 70 TB the size of the active files?
Or does that also include the older archived files that may never be
accessed again?

-Mitch





From owner-freebsd-fs  Mon Feb  5  9:24:10 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4E85B37B65D; Mon,  5 Feb 2001 09:23:51 -0800 (PST)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 3949A57611; Mon,  5 Feb 2001 11:24:20 -0600 (CST)
Date: Mon, 5 Feb 2001 11:24:20 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Mitch Collinsworth <mitch@ccmr.cornell.edu>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
Message-ID: <20010205112420.A98288@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
References: <20010205100016.C97400@peorth.iteration.net> <Pine.LNX.4.10.10102051146300.22516-100000@ruby.ccmr.cornell.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.LNX.4.10.10102051146300.22516-100000@ruby.ccmr.cornell.edu>; from mitch@ccmr.cornell.edu on Mon, Feb 05, 2001 at 11:47:58AM -0500
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Feb 05, 2001 at 11:47:58AM -0500, Mitch Collinsworth scribbled:
| On Mon, 5 Feb 2001, Michael C . Wu wrote:
| > On Mon, Feb 05, 2001 at 10:39:02AM -0500, Mitch Collinsworth scribbled:
| > | You didn't say what applications this thing is going to support.
| > | That does matter.  A lot.  One thing worth looking at is AFS,
| > | or maybe MR-AFS.  And now OpenAFS.
| > 
| > He has database(s) of graphics simulation results, i.e. large files that
| > are largely unrelated to each other.  Compression is not an option.
| > 
| > The files are accessed approximately 3 or 4 times a day on average.
| > Older files are archived for reference purposes and may never
| > be accessed after a week.
| 
| Ok, this is a start.  Now is the 70 TB the size of the active files?
| Or does that also include the older archived files that may never be
| accessed again?
70TB is the size of the sum of all files, access or no access.
(They still want to maintain accessibility even though the chances are slim.)




From owner-freebsd-fs  Mon Feb  5  9:51:01 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8410537B67D; Mon,  5 Feb 2001 09:50:42 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id f15HoZ021657;
	Mon, 5 Feb 2001 09:50:35 -0800 (PST)
	(envelope-from dillon)
Date: Mon, 5 Feb 2001 09:50:35 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200102051750.f15HoZ021657@earth.backplane.com>
To: "Michael C . Wu" <keichii@iteration.net>
Cc: Mitch Collinsworth <mitch@ccmr.cornell.edu>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
References: <20010205100016.C97400@peorth.iteration.net> <Pine.LNX.4.10.10102051146300.22516-100000@ruby.ccmr.cornell.edu> <20010205112420.A98288@peorth.iteration.net>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


:| > The files are accessed approximately 3 or 4 times a day on average.
:| > Older files are archived for reference purposes and may never
:| > be accessed after a week.
:| 
:| Ok, this is a start.  Now is the 70 TB the size of the active files?
:| Or does that also include the older archived files that may never be
:| accessed again?
:70TB is the size of the sum of all files, access or no access.
:(They still want to maintain accessibility even though the chances are slim.)

    This doesn't sound like something you can just throw together with
    off-the-shelf PCs and still have something reliable to show for it.
    You need a big honking RAID system - maybe a NetApp, maybe something
    else.  You have to look at the filesystem and file size limitations
    of the unit and the client(s).

    FreeBSD can only support 1 TB filesystems.  Our device layer
    converts everything to DEV_BSIZE (512-byte) blocks, so to be safe:
    2^31 x 512 bytes = 1 TB on Intel boxes.  Our NFS implementation has the
    same per-filesystem limitation.  Theoretically UFS/FFS are limited
    to 2^31 x blocksize, where blocksize can be larger (e.g. 16384 bytes,
    65536 bytes), but I have grave doubts that that actually works.  I'm
    fairly certain that we still convert things to 512-byte block numbers
    at the device level, and we only use a 32-bit int to store the
    block number.
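Matt's 1 TB figure comes from storing a 512-byte block number in a signed 32-bit int. The short sketch below works through that arithmetic in binary units. The larger block sizes are the ones he mentions, and he doubts those configurations actually work, so treat their results as theoretical. The last line shows how many 1 TiB filesystems 70 decimal TB would need.

```python
DEV_BSIZE = 512          # bytes per device block, as described above
INT32_MAX = 2**31 - 1    # largest signed 32-bit block number
TIB = 2**40

def max_addressable_bytes(block_size: int) -> int:
    """Bytes addressable with block numbers 0 .. INT32_MAX (i.e. 2^31 blocks)."""
    return (INT32_MAX + 1) * block_size

def fits_in_int32_blocks(offset_bytes: int, block_size: int = DEV_BSIZE) -> bool:
    """True if the block number for offset_bytes still fits in a signed 32-bit int."""
    return offset_bytes // block_size <= INT32_MAX

if __name__ == "__main__":
    for bs in (512, 16384, 65536):
        print(bs, max_addressable_bytes(bs) // TIB, "TiB")
    # 512 -> 1 TiB, 16384 -> 32 TiB, 65536 -> 128 TiB
    print(fits_in_int32_blocks(TIB - 1))  # True: last byte below 1 TiB
    print(fits_in_int32_blocks(TIB))      # False: block 2^31 overflows
    # With 1 TiB filesystems, 70 decimal TB needs at least 64 of them.
    print(-(-70 * 10**12 // TIB))         # 64
```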

    So FreeBSD could be used as an NFS client, but probably not a server
    for your application.  Considering the number of disks you need to
    manage, something like a NetApp or other completely self contained
    RAID-5-capable system for handling the disks is mandatory.

						-Matt





From owner-freebsd-fs  Mon Feb  5  9:51:49 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mercury.ccmr.cornell.edu (mercury.ccmr.cornell.edu [128.84.231.97])
	by hub.freebsd.org (Postfix) with ESMTP
	id C141B37B684; Mon,  5 Feb 2001 09:51:28 -0800 (PST)
Received: from ruby.ccmr.cornell.edu (IDENT:0@ruby.ccmr.cornell.edu [128.84.231.115])
	by mercury.ccmr.cornell.edu (8.9.3/8.9.3) with ESMTP id MAA17009;
	Mon, 5 Feb 2001 12:51:28 -0500
Received: from localhost (mitch@localhost)
	by ruby.ccmr.cornell.edu (8.9.3/8.9.3) with ESMTP id MAA06978;
	Mon, 5 Feb 2001 12:51:26 -0500
X-Authentication-Warning: ruby.ccmr.cornell.edu: mitch owned process doing -bs
Date: Mon, 5 Feb 2001 12:51:26 -0500 (EST)
From: Mitch Collinsworth <mitch@ccmr.cornell.edu>
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
In-Reply-To: <20010205112420.A98288@peorth.iteration.net>
Message-ID: <Pine.LNX.4.10.10102051238190.22516-100000@ruby.ccmr.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org



On Mon, 5 Feb 2001, Michael C . Wu wrote:

> On Mon, Feb 05, 2001 at 11:47:58AM -0500, Mitch Collinsworth scribbled:
> | On Mon, 5 Feb 2001, Michael C . Wu wrote:
> | > On Mon, Feb 05, 2001 at 10:39:02AM -0500, Mitch Collinsworth scribbled:
> | > | You didn't say what applications this thing is going to support.
> | > | That does matter.  A lot.  One thing worth looking at is AFS,
> | > | or maybe MR-AFS.  And now OpenAFS.
> | > 
> | > He has database(s) of graphics simulation results, i.e. large files that
> | > are largely unrelated to each other.  Compression is not an option.
> | > 
> | > The files are accessed approximately 3 or 4 times a day on average.
> | > Older files are archived for reference purposes and may never
> | > be accessed after a week.
> | 
> | Ok, this is a start.  Now is the 70 TB the size of the active files?
> | Or does that also include the older archived files that may never be
> | accessed again?
> 70TB is the size of the sum of all files, access or no access.
> (They still want to maintain accessibility even though the chances are slim.)

Ok, well the next question to look at is how do they define "maintain
accessibility".  In other words what do they consider acceptable?
Accessible in 5 seconds, accessible in 1 minute, accessible in 10
minutes, accessible in 1 hour, accessible overnight?

70 TB, as you have already noticed, is no simple feat to accomplish.
No matter how you slice it, it's going to cost $$.  Different levels
of accessibility requirement for the archived data can be accomplished
with differing technologies and at differing costs.

You could rough out a plan for keeping the whole thing online and
spinning for instant access and then compare the costs of that with
various options that keep the hot data online and archive the rest
in varying ways that allow for differing speed of access.  Maybe you
can archive old data on CDs or tapes.  Perhaps keep more recent
archives "online" in a jukebox where they are fairly quickly
accessible, while older archives are on a rack where someone has to
retrieve them as needed.
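One way to start roughing out the tiered plan is to sort each file into a tier by age and then total how much data each tier has to hold. This is a minimal sketch. The one-week cutoff follows Michael's description of the access pattern. The 90-day jukebox cutoff, the tier names and the SimFile record are hypothetical placeholders, to be replaced by whatever access times the customer actually accepts.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SimFile:
    name: str
    size_bytes: int
    age_days: int

# Hypothetical tiers, ordered by the maximum file age (in days) each accepts.
TIERS = [
    (7, "online disk"),   # hot: accessed several times a day for about a week
    (90, "jukebox"),      # hypothetical cutoff: near-line, fairly quick access
    (None, "shelf"),      # everything older: someone retrieves the media
]

def tier_for(age_days: int) -> str:
    """Return the first tier whose age limit covers this file."""
    for max_age, tier in TIERS:
        if max_age is None or age_days <= max_age:
            return tier
    raise AssertionError("unreachable: the last tier has no age limit")

def bytes_per_tier(files):
    """Sum file sizes per tier, to compare against the cost of each medium."""
    totals = defaultdict(int)
    for f in files:
        totals[tier_for(f.age_days)] += f.size_bytes
    return dict(totals)

if __name__ == "__main__":
    GB = 10**9
    files = [SimFile("run-a", 2 * GB, 1),
             SimFile("run-b", 2 * GB, 30),
             SimFile("run-c", 2 * GB, 400)]
    print(bytes_per_tier(files))
```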

The real question here is: are they really willing to spend what it
would take to keep an archive of this size spinning, including
systems programmers and administrators?  Or are they willing to
spend less and have it take a bit longer to get access to the older
data?

-Mitch





From owner-freebsd-fs  Mon Feb  5 10:21:12 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from meketrex.pix.net (meketrex.pix.net [192.111.45.13])
	by hub.freebsd.org (Postfix) with ESMTP
	id 08AA637B491; Mon,  5 Feb 2001 10:20:49 -0800 (PST)
Received: by meketrex.pix.net 
	id NAA00519; Mon, 5 Feb 2001 13:20:43 -0500 (EST)
Message-ID: <20010205132042.A324@pix.net>
Date: Mon, 5 Feb 2001 13:20:42 -0500
From: "Kurt J. Lidl" <lidl@pix.net>
To: Matt Dillon <dillon@earth.backplane.com>,
	"Michael C . Wu" <keichii@iteration.net>
Cc: Mitch Collinsworth <mitch@ccmr.cornell.edu>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
References: <20010205100016.C97400@peorth.iteration.net> <Pine.LNX.4.10.10102051146300.22516-100000@ruby.ccmr.cornell.edu> <20010205112420.A98288@peorth.iteration.net> <200102051750.f15HoZ021657@earth.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2
In-Reply-To: <200102051750.f15HoZ021657@earth.backplane.com>; from Matt Dillon on Mon, Feb 05, 2001 at 09:50:35AM -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, Feb 05, 2001 at 09:50:35AM -0800, Matt Dillon wrote:
> :70TB is the size of the sum of all files, access or no access.
> :(They still want to maintain accessibility even though the chances are slim.)
> 
>     This doesn't sound like something you can just throw together with
>     off-the-shelf PCs and still have something reliable to show for it.
>     You need a big honking RAID system - maybe a NetApp, maybe something
>     else.  You have to look at the filesystem and file size limitations
>     of the unit and the client(s).

NetApp's biggest box can "only" handle 6TB of data, currently, using the
latest and greatest software.  They claim (and I believe them) that
12TB will be the limit later this year.
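Taking these per-box limits at face value, and assuming they describe usable capacity, the minimum number of separate filers (and so separate namespaces) for 70 TB is a ceiling division. This ignores RAID and snapshot overhead, which would only push the count higher.

```python
def filers_needed(total_tb: int, per_filer_tb: int) -> int:
    """Minimum number of filers if each holds at most per_filer_tb (ceiling division)."""
    return -(-total_tb // per_filer_tb)

if __name__ == "__main__":
    print(filers_needed(70, 6))   # 12 with the current 6 TB limit
    print(filers_needed(70, 12))  # 6 with the announced 12 TB limit
```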

>     So FreeBSD could be used as an NFS client, but probably not a server
>     for your application.  Considering the number of disks you need to
>     manage, something like a NetApp or other completely self contained
>     RAID-5-capable system for handling the disks is mandatory.

NetApps are actually RAID-4 (dedicated parity disk), not RAID-5 (parity data
is recorded across all drives).
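The difference Kurt describes comes down to which disk holds the parity block for each stripe. In RAID-4 every small write also updates the same parity disk. In RAID-5 that load rotates across all drives. The sketch below uses XOR parity over toy byte strings. The particular rotation shown for RAID-5 is one common layout, assumed here for illustration; it is not necessarily what any given controller uses.

```python
from functools import reduce

def parity_disk(stripe: int, ndisks: int, level: int) -> int:
    """Index of the disk holding parity for a given stripe."""
    if level == 4:
        return ndisks - 1                      # RAID-4: one dedicated parity disk
    if level == 5:
        return (ndisks - 1 - stripe) % ndisks  # RAID-5: parity rotates across disks
    raise ValueError("only RAID-4 and RAID-5 are modelled")

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (parity computation and recovery)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

if __name__ == "__main__":
    print([parity_disk(s, 4, 4) for s in range(4)])  # [3, 3, 3, 3]
    print([parity_disk(s, 4, 5) for s in range(4)])  # [3, 2, 1, 0]

    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    # Lose data[1]; rebuild it from the surviving blocks plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])  # True
```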

-Kurt




From owner-freebsd-fs  Mon Feb  5 10:24:55 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from ncmail.netcentralen.dk (ncmail.netcentralen.dk [195.24.7.103])
	by hub.freebsd.org (Postfix) with ESMTP id 1D78B37B4EC
	for <fs@freebsd.org>; Mon,  5 Feb 2001 10:24:37 -0800 (PST)
Received: from mother.netcentralen.dk (mother.netcentralen.dk [195.24.7.107])
	by ncmail.netcentralen.dk (8.9.3/8.9.3) with ESMTP id TAA47957
	for <fs@freebsd.org>; Mon, 5 Feb 2001 19:31:53 +0100 (CET)
	(envelope-from mar@netcentralen.dk)
Received: by mother.netcentralen.dk with Internet Mail Service (5.5.2650.21)
	id <D3M39H5N>; Mon, 5 Feb 2001 19:30:50 +0100
Message-ID: <9164771DDCABD3118333005004E9446E2B7784@mother.netcentralen.dk>
From: Michael Aronsen <mar@netcentralen.dk>
To: "'fs@freebsd.org'" <fs@freebsd.org>
Subject: Re: Extremely large (70TB) File system/server planning
Date: Mon, 5 Feb 2001 19:30:44 +0100 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

How about an SGI system? XFS is claimed to have no practical size limit.

//Michael Aronsen

-----Original Message-----
From: Kurt J. Lidl [mailto:lidl@pix.net]
Sent: 5 February 2001 19:21
To: Matt Dillon; Michael C . Wu
Cc: Mitch Collinsworth; hackers@FreeBSD.ORG; fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning


On Mon, Feb 05, 2001 at 09:50:35AM -0800, Matt Dillon wrote:
> :70TB is the size of the sum of all files, access or no access.
> :(They still want to maintain accessibility even though the chances are slim.)
> 
>     This doesn't sound like something you can just throw together with
>     off-the-shelf PCs and still have something reliable to show for it.
>     You need a big honking RAID system - maybe a NetApp, maybe something
>     else.  You have to look at the filesystem and file size limitations
>     of the unit and the client(s).

NetApp's biggest box can "only" handle 6TB of data, currently, using the
latest and greatest software.  They claim (and I believe them) that
12TB will be the limit later this year.

>     So FreeBSD could be used as an NFS client, but probably not a server
>     for your application.  Considering the number of disks you need to
>     manage, something like a NetApp or other completely self contained
>     RAID-5-capable system for handling the disks is mandatory.

NetApps are actually RAID-4 (dedicated parity disk), not RAID-5 (parity data
is recorded across all drives).

-Kurt




From owner-freebsd-fs  Mon Feb  5 10:29:52 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9053837B401; Mon,  5 Feb 2001 10:29:34 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id f15ITYY22891;
	Mon, 5 Feb 2001 10:29:34 -0800 (PST)
	(envelope-from dillon)
Date: Mon, 5 Feb 2001 10:29:34 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200102051829.f15ITYY22891@earth.backplane.com>
To: "Michael C . Wu" <keichii@iteration.net>,
	Mitch Collinsworth <mitch@ccmr.cornell.edu>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
References: <20010205100016.C97400@peorth.iteration.net> <Pine.LNX.4.10.10102051146300.22516-100000@ruby.ccmr.cornell.edu> <20010205112420.A98288@peorth.iteration.net> <200102051750.f15HoZ021657@earth.backplane.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:    2^31 x 512 bytes = 1 TB on Intel boxes.  Our NFS implementation has the
:    same per-filesystem limitation.  Theoretically UFS/FFS are limited 

    Oops.  I meant a per-file limitation for NFS clients, not per-filesystem.
    1TB per file.

						-Matt




From owner-freebsd-fs  Mon Feb  5 12:51:38 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mass.dis.org (mass.dis.org [216.240.45.41])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6398537B491; Mon,  5 Feb 2001 12:51:18 -0800 (PST)
Received: from mass.dis.org (localhost [127.0.0.1])
	by mass.dis.org (8.11.1/8.11.1) with ESMTP id f15KqOe00985;
	Mon, 5 Feb 2001 12:52:24 -0800 (PST)
	(envelope-from msmith@mass.dis.org)
Message-Id: <200102052052.f15KqOe00985@mass.dis.org>
X-Mailer: exmh version 2.1.1 10/15/1999
To: Matt Dillon <dillon@earth.backplane.com>
Cc: "Michael C . Wu" <keichii@iteration.net>,
	Mitch Collinsworth <mitch@ccmr.cornell.edu>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning 
In-reply-to: Your message of "Mon, 05 Feb 2001 09:50:35 PST."
             <200102051750.f15HoZ021657@earth.backplane.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 05 Feb 2001 12:52:24 -0800
From: Mike Smith <msmith@freebsd.org>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> 
> :| > The files are accessed approximately 3 or 4 times a day on average.
> :| > Older files are archived for reference purposes and may never
> :| > be accessed after a week.
> :| 
> :| Ok, this is a start.  Now is the 70 TB the size of the active files?
> :| Or does that also include the older archived files that may never be
> :| accessed again?
> :70TB is the size of the sum of all files, access or no access.
> :(They still want to maintain accessibility even though the chances are slim.)
...
>     This doesn't sound like something you can just throw together with
>     off-the-shelf PCs and still have something reliable to show for it.
>     You need a big honking RAID system - maybe a NetApp, maybe something
>     else.  You have to look at the filesystem and file size limitations
>     of the unit and the client(s).

You can't do this with a NetApp either; they max out at about 6TB now 
(going up to around 12 or so soon).  You might want to talk to EMC and/or 
IBM, both of whom have *extremely* large filers.

Your friend may also want to look at Traakan, who have a novel product in 
this space.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E






From owner-freebsd-fs  Mon Feb  5 12:59:07 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from bdr-xcon.matchlogic.com (mail.matchlogic.com [205.216.147.127])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6EFA937B401; Mon,  5 Feb 2001 12:58:44 -0800 (PST)
Received: by mail.matchlogic.com with Internet Mail Service (5.5.2653.19)
	id <DVS3DG1B>; Mon, 5 Feb 2001 13:58:24 -0700
Message-ID: <5FE9B713CCCDD311A03400508B8B3013054E3F50@bdr-xcln.is.matchlogic.com>
From: Charles Randall <crandall@matchlogic.com>
To: "Michael C . Wu" <keichii@iteration.net>,
	'Mike Smith' <msmith@freebsd.org>,
	Matt Dillon <dillon@earth.backplane.com>
Cc: Mitch Collinsworth <mitch@ccmr.cornell.edu>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: RE: Extremely large (70TB) File system/server planning 
Date: Mon, 5 Feb 2001 13:58:22 -0700 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Does this have to be a single filesystem?

If not, just provide a database front-end that maps some kind of resource
identifier to the filesystem name.

With that, you can span filers and/or filesystems. Seems like the only thing
that would be reasonable.
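Charles's suggestion amounts to a catalogue lookup in front of the filers. Below is a minimal sketch of that idea. A hard-coded table stands in for the database, and the resource IDs, paths and function names are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of the hypothetical catalogue: resource ID -> filer/filesystem. */
struct placement {
    const char *resource_id;   /* opaque name that clients ask for */
    const char *filesystem;    /* filer/filesystem currently holding it */
};

/* In-memory stand-in for the real database front-end. */
static const struct placement catalogue[] = {
    { "run-2001-02-05-0001", "/filer03/fs12" },
    { "run-2001-02-05-0002", "/filer07/fs01" },
};

/* Return the filesystem that holds resource_id, or NULL if it is unknown. */
static const char *
lookup_filesystem(const char *resource_id)
{
    size_t i;

    assert(resource_id != NULL);
    for (i = 0; i < sizeof(catalogue) / sizeof(catalogue[0]); i++)
        if (strcmp(catalogue[i].resource_id, resource_id) == 0)
            return catalogue[i].filesystem;
    return NULL;
}

/* Build "<filesystem>/<resource_id>" in buf; return 0, or -1 if unknown or buf is too small. */
static int
resolve_path(const char *resource_id, char *buf, size_t len)
{
    const char *fs = lookup_filesystem(resource_id);
    int n;

    if (fs == NULL)
        return -1;
    n = snprintf(buf, len, "%s/%s", fs, resource_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Clients only ever see resource IDs. Moving a file to another filer, or adding a filer, then means updating one catalogue row instead of growing a single 70TB filesystem.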

Charles

-----Original Message-----
From: Mike Smith [mailto:msmith@freebsd.org]
Sent: Monday, February 05, 2001 1:52 PM
To: Matt Dillon
Cc: Michael C . Wu; Mitch Collinsworth; hackers@FreeBSD.ORG;
fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning 


> 
> :| > The files are accessed approximately 3 or 4 times a day on average.
> :| > Older files are archived for reference purpose and may never
> :| > be accessed after a week.
> :| 
> :| Ok, this is a start.  Now is the 70 TB the size of the active files?
> :| Or does that also include the older archived files that may never be
> :| accessed again?
> :70TB is the size of the sum of all files, access or no access.
> :(They still want to maintain accessibility even though the chances are slim.)
...
>     This doesn't sound like something you can just throw together with
>     off-the-shelf PCs and still have something reliable to show for it.
>     You need a big honking RAID system - maybe a NetApp, maybe something
>     else.  You have to look at the filesystem and file size limitations
>     of the unit and the client(s).

You can't do this with a NetApp either; they max out at about 6TB now 
(going up to around 12 or so soon).  You might want to talk to EMC and/or 
IBM, both of whom have *extremely* large filers.

Your friend may also want to look at Traakan, who have a novel product in 
this space.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6  3: 0:30 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5817737B503; Tue,  6 Feb 2001 03:00:08 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id BAE6628E66; Tue,  6 Feb 2001 17:00:03 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id ABBCA28E46; Tue,  6 Feb 2001 17:00:03 +0600 (ALMT)
Date: Tue, 6 Feb 2001 17:00:03 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: freebsd-arch@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: vnode interlock API
Message-ID: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

	Hello,

	A few months ago the simple locks used for the vnode interlock were
replaced by mutexes. This causes additional pain for externally
maintained filesystems and reduces the portability of code between
-stable and -current.

	So, I suggest introducing two macro definitions that will hide the
implementation details of the interlocks:

#define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
#define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)

	for RELENG_4 they will look like this:

#define VI_LOCK(vp)		simple_lock(&(vp)->v_interlock)
#define VI_UNLOCK(vp)		simple_unlock(&(vp)->v_interlock)
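Here is a sketch of how one shared header could select between the two definitions, so that filesystem code is written once against VI_LOCK/VI_UNLOCK. In a real kernel header the switch would presumably be `#if __FreeBSD_version >= 500000` (from <sys/param.h>). In this user-space model, MODEL_CURRENT plays that role. The lock primitives are hypothetical, non-atomic stand-ins that only record ownership:

```c
#include <assert.h>

/* Hypothetical switch standing in for a __FreeBSD_version test. */
#define MODEL_CURRENT 1

#if MODEL_CURRENT
/* -current: the interlock is a mutex (stand-in for struct mtx). */
struct mtx { int held; };
#define MTX_DEF 0
#define mtx_enter(mp, opts) do { assert(!(mp)->held); (mp)->held = 1; } while (0)
#define mtx_exit(mp, opts)  do { assert((mp)->held);  (mp)->held = 0; } while (0)
typedef struct mtx vinterlock_t;
#define VI_LOCK(vp)   mtx_enter(&(vp)->v_interlock, MTX_DEF)
#define VI_UNLOCK(vp) mtx_exit(&(vp)->v_interlock, MTX_DEF)
#else
/* RELENG_4: the interlock is a simple lock (stand-in for struct simplelock). */
struct simplelock { int held; };
#define simple_lock(lp)   do { assert(!(lp)->held); (lp)->held = 1; } while (0)
#define simple_unlock(lp) do { assert((lp)->held);  (lp)->held = 0; } while (0)
typedef struct simplelock vinterlock_t;
#define VI_LOCK(vp)   simple_lock(&(vp)->v_interlock)
#define VI_UNLOCK(vp) simple_unlock(&(vp)->v_interlock)
#endif

/* Only the vnode fields this example touches. */
struct vnode {
    vinterlock_t v_interlock;
    int          v_usecount;
};

/* Filesystem code like this compiles unchanged on either branch. */
static void
example_vref(struct vnode *vp)
{
    VI_LOCK(vp);
    vp->v_usecount++;
    VI_UNLOCK(vp);
}
```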

	Any comments, suggestions ?

--
Boris Popov
http://www.butya.kz/~bp/



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6  3: 3:47 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147])
	by hub.freebsd.org (Postfix) with ESMTP
	id CEB2037B401; Tue,  6 Feb 2001 03:03:24 -0800 (PST)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.1/8.11.1) with ESMTP id f16B33B33409;
	Tue, 6 Feb 2001 12:03:03 +0100 (CET)
	(envelope-from phk@critter.freebsd.dk)
To: Boris Popov <bp@butya.kz>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: vnode interlock API 
In-Reply-To: Your message of "Tue, 06 Feb 2001 17:00:03 +0600."
             <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz> 
Date: Tue, 06 Feb 2001 12:03:03 +0100
Message-ID: <33407.981457383@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


Sounds like something which should have been done long time ago...

In message <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>, Boris Popov writes:
>	Hello,
>
>	Few months ago simple locks used for vnode interlock were replaced
>by mutexes. It causes additional pain for externally maintained
>filesystems and lowers portability of the code between -stable and
>-current.
>
>	So, I suggest to introduce two macro definitions which will hide
>implementation details for interlocks:
>
>#define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
>#define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)
>
>	for RELENG_4 they will look like this:
>
>#define VI_LOCK(vp)		simple_lock(&(vp)->v_interlock)
>#define VI_UNLOCK(vp)		simple_unlock(&(vp)->v_interlock)
>
>	Any comments, suggestions ?
>
>--
>Boris Popov
>http://www.butya.kz/~bp/
>
>
>
>To Unsubscribe: send mail to majordomo@FreeBSD.org
>with "unsubscribe freebsd-arch" in the body of the message
>

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6  7:29: 4 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id A4C6537B401; Tue,  6 Feb 2001 07:28:44 -0800 (PST)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id HAA27735;
	Tue, 6 Feb 2001 07:18:31 -0800 (PST)
Message-Id: <200102061518.HAA27735@implode.root.com>
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning 
In-reply-to: Your message of "Mon, 05 Feb 2001 09:26:59 CST."
             <20010205092658.A97400@peorth.iteration.net> 
From: David Greenman <dg@root.com>
Reply-To: dg@root.com
Date: Tue, 06 Feb 2001 07:18:31 -0800
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>While talking to a friend about what his company is planning to do,
>I found out that he is planning a 70TB filesystem/servers/cluster/db.
>(Yes, seventy t-e-r-a-b-y-t-e...)

   We could do this using about 44 of the not-yet-announced TSR-3100 fibre
channel RAID storage systems. These are 1.8TB (1.62TB usable) capacity units
in a 3U cabinet. It would take around 200A @ 120VAC (about 18KW) to power all
of them and should fit in about 5 rack cabinets. Total cost would be about
$3 million.
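The unit count can be checked with some quick arithmetic. Capacities are in decimal GB, and the 42U rack height is an assumption, not a figure from the message:

```c
#include <assert.h>

/* Figures from the message above, in decimal GB. */
#define TARGET_GB          70000u  /* 70 TB requested */
#define USABLE_GB_PER_UNIT  1620u  /* 1.62 TB usable per unit */
#define RU_PER_UNIT            3u  /* each unit is a 3U cabinet */
#define RU_PER_RACK           42u  /* assumption: a common full-height rack */

/* Ceiling division: the smallest n with n * per >= total. */
static unsigned
ceil_div(unsigned total, unsigned per)
{
    assert(per > 0);
    return (total + per - 1) / per;
}
```

ceil_div(TARGET_GB, USABLE_GB_PER_UNIT) gives 44 units (43 would give only 69.66 TB), for 71.28 TB usable. The storage alone occupies 44 x 3U = 132U, i.e. four 42U racks, so "about 5 rack cabinets" presumably leaves room for switches and cabling.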

-DG

David Greenman
Co-founder, The FreeBSD Project - http://www.freebsd.org
President, TeraSolutions, Inc. - http://www.terasolutions.com
Pave the road of life with opportunities.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6  8:32:23 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mailout02.sul.t-online.com (mailout02.sul.t-online.com [194.25.134.17])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8948D37B4EC; Tue,  6 Feb 2001 08:32:03 -0800 (PST)
Received: from fwd07.sul.t-online.com 
	by mailout02.sul.t-online.com with smtp 
	id 14QB27-00052q-00; Tue, 06 Feb 2001 17:31:59 +0100
Received: from frolic.no-support.loc (520094253176-0001@[217.80.111.106]) by fmrl07.sul.t-online.com
	with esmtp id 14QB1l-2Kk35mC; Tue, 6 Feb 2001 17:31:37 +0100
Received: (from bjoern@localhost)
	by frolic.no-support.loc (8.11.1/8.9.3) id f16GLp600648;
	Tue, 6 Feb 2001 17:21:51 +0100 (CET)
	(envelope-from bjoern)
From: Bjoern Fischer <bfischer@Techfak.Uni-Bielefeld.DE>
Date: Tue, 6 Feb 2001 17:21:50 +0100
To: Boris Popov <bp@butya.kz>
Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: vnode interlock API
Message-ID: <20010206172150.A528@frolic.no-support.loc>
References: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>; from bp@butya.kz on Tue, Feb 06, 2001 at 05:00:03PM +0600
X-Sender: 520094253176-0001@t-dialin.net
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hello,

> 	Few months ago simple locks used for vnode interlock were replaced
> by mutexes. It causes additional pain for externally maintained
> filesystems and lowers portability of the code between -stable and
> -current.
> 
> 	So, I suggest to introduce two macro definitions which will hide
> implementation details for interlocks:
> 
> #define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
> #define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)

BTW, does this mean that -current vnode locking works well
enough to support stacked file systems a la Erez Zadok's FiST software?

  Bjoern

-- 
-----BEGIN GEEK CODE BLOCK-----
GCS d--(+) s++: a- C+++(-) UB++++OSI++++$ P+++(-) L---(++) !E W- N+ o>+
K- !w !O !M !V  PS++  PE-  PGP++  t+++  !5 X++ tv- b+++ D++ G e+ h-- y+ 
------END GEEK CODE BLOCK------


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6 11:51:55 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88])
	by hub.freebsd.org (Postfix) with ESMTP
	id 192A337B401; Tue,  6 Feb 2001 11:51:34 -0800 (PST)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
	by meow.osd.bsdi.com (8.11.1/8.9.3) with ESMTP id f16Jo9345186;
	Tue, 6 Feb 2001 11:50:09 -0800 (PST)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.010206115111.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz>
Date: Tue, 06 Feb 2001 11:51:11 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Boris Popov <bp@butya.kz>
Subject: RE: vnode interlock API
Cc: freebsd-fs@FreeBSD.org, freebsd-arch@FreeBSD.org
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On 06-Feb-01 Boris Popov wrote:
>       Hello,
> 
>       Few months ago simple locks used for vnode interlock were replaced
> by mutexes. It causes additional pain for externally maintained
> filesystems and lowers portability of the code between -stable and
> -current.

Sounds good.

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6 13:16: 9 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from bingnet2.cc.binghamton.edu (bingnet2.cc.binghamton.edu [128.226.1.18])
	by hub.freebsd.org (Postfix) with ESMTP id A6AAD37B401
	for <freebsd-fs@freebsd.org>; Tue,  6 Feb 2001 13:15:52 -0800 (PST)
Received: from opal (cs.binghamton.edu [128.226.123.101])
	by bingnet2.cc.binghamton.edu (8.11.2/8.11.2) with ESMTP id f16LFp002770
	for <freebsd-fs@freebsd.org>; Tue, 6 Feb 2001 16:15:51 -0500 (EST)
Date: Tue, 6 Feb 2001 16:15:45 -0500 (EST)
From: Zhiui Zhang <zzhang@cs.binghamton.edu>
X-Sender: zzhang@opal
To: freebsd-fs@freebsd.org
Subject: Design a journalled file system
Message-ID: <Pine.SOL.4.21.0102061544230.6584-100000@opal>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


I am considering the design of a journalled file system in FreeBSD. I
think each transaction corresponds to a file system update operation and
will therefore consist of a list of modified buffers.  The important
thing is that these buffers should not be written to disk until they have
been logged into the log area. To do so, we need to pin these buffers in
memory for a while. The concept should be simple, but I have run into a
problem that I have no idea how to solve:

If you access a lot of files quickly, some vnodes will be reused.  These
vnodes can contain buffers that are still pinned in the memory because of
the write-ahead logging constraints.  After a vnode is gone, we have
no way to recover its buffers. Note that whenever we need a new vnode, we
are in the process of creating a new file. At this point, we can not flush
the buffers to the log area.  The result is a deadlock.
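The write-ahead constraint described above fits in a single check. Here is a minimal sketch, assuming log sequence numbers (LSNs) that are byte offsets into the log. The types and function names are hypothetical and are not part of any FreeBSD interface:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long lsn_t;   /* hypothetical: byte offset into the log */

struct journal {
    lsn_t next_lsn;     /* where the next log record will be appended */
    lsn_t durable_lsn;  /* log bytes below this offset are known to be on disk */
};

struct pinned_buf {
    lsn_t log_end;      /* end offset of the record describing the latest change */
    bool  dirty;
};

/* Append a record of reclen bytes (in memory only) describing a change to bp. */
static void
journal_log_change(struct journal *j, struct pinned_buf *bp, lsn_t reclen)
{
    assert(reclen > 0);
    j->next_lsn += reclen;
    bp->log_end = j->next_lsn;
    bp->dirty = true;
}

/* Write-ahead rule: a buffer may go to its home location only after its log record is durable. */
static bool
buf_may_flush(const struct journal *j, const struct pinned_buf *bp)
{
    return !bp->dirty || bp->log_end <= j->durable_lsn;
}

/* Stand-in for forcing the in-memory log tail to disk. */
static void
journal_force(struct journal *j)
{
    j->durable_lsn = j->next_lsn;
}
```

The deadlock described above happens when vnode recycling needs to get rid of buffers for which buf_may_flush() is still false, while the log cannot be forced at that point.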

I could make copies of the buffers that are still pinned, but that incurs
a memory copy and needs extra buffer headers, which are also a scarce resource.

The design is similar to Linux's ext3fs (they do not seem to have a vnode
layer, and they use device + physical block number instead of vnode +
logical block number to index buffers, which, I guess, means that buffers
can exist after the inode is gone). I know McKusick has a paper on
journalling FFS, but I just want to know whether this design can work.
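The difference between the two indexing schemes comes down to the cache key. Here is a minimal sketch of a cache keyed by (device, physical block), the ext3-style scheme described above. The structures and functions are hypothetical, not FreeBSD's buffer cache:

```c
#include <assert.h>
#include <stddef.h>

/* A buffer's identity is where it lives on disk, not which vnode/inode owns it. */
struct phys_key {
    unsigned      dev;    /* device number */
    unsigned long blkno;  /* physical block number on that device */
};

struct cbuf {
    struct phys_key key;
    int pinned;           /* still waiting for its log record to reach disk */
    int in_use;
};

#define NCACHE 8
static struct cbuf cache[NCACHE];

static struct cbuf *
cache_lookup(unsigned dev, unsigned long blkno)
{
    for (size_t i = 0; i < NCACHE; i++)
        if (cache[i].in_use && cache[i].key.dev == dev &&
            cache[i].key.blkno == blkno)
            return &cache[i];
    return NULL;
}

static struct cbuf *
cache_insert(unsigned dev, unsigned long blkno)
{
    assert(cache_lookup(dev, blkno) == NULL);   /* no duplicate identities */
    for (size_t i = 0; i < NCACHE; i++)
        if (!cache[i].in_use) {
            cache[i] = (struct cbuf){ { dev, blkno }, 0, 1 };
            return &cache[i];
        }
    return NULL;   /* cache full */
}
```

Nothing in the key refers to an in-core file object, so recycling the vnode or inode leaves a pinned buffer reachable by (dev, blkno).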

Any ideas?  Thanks for your help!

-Zhihui



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6 13:47:31 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from gw.errno.com (node-d1d4bd7a.powerinter.net [209.212.189.122])
	by hub.freebsd.org (Postfix) with ESMTP id 944B337B491
	for <freebsd-fs@FreeBSD.ORG>; Tue,  6 Feb 2001 13:47:14 -0800 (PST)
Received: from melange (melange.errno.com [209.212.166.36])
	by gw.errno.com (8.9.0/8.9.0) with SMTP id NAA28653;
	Tue, 6 Feb 2001 13:47:12 -0800 (PST)
Message-ID: <0e9101c09086$5ca812b0$24a6d4d1@melange>
From: "Sam Leffler" <sam@errno.com>
To: "Zhiui Zhang" <zzhang@cs.binghamton.edu>,
	<freebsd-fs@FreeBSD.ORG>
References: <Pine.SOL.4.21.0102061544230.6584-100000@opal>
Subject: Re: Design a journalled file system
Date: Tue, 6 Feb 2001 13:47:11 -0800
Organization: Errno Consulting
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.3018.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.3018.1300
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

If you really want to work on another filesystem, learn about/from SGI's
XFS.  They've made a GPL'd version for Linux available for public
ftp.

    Sam




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6 13:53:36 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 90BF237B401
	for <freebsd-fs@FreeBSD.ORG>; Tue,  6 Feb 2001 13:53:19 -0800 (PST)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f16LrI626721;
	Tue, 6 Feb 2001 13:53:18 -0800 (PST)
Date: Tue, 6 Feb 2001 13:53:18 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Zhiui Zhang <zzhang@cs.binghamton.edu>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
Message-ID: <20010206135317.Z26076@fw.wintelcom.net>
References: <Pine.SOL.4.21.0102061544230.6584-100000@opal>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.SOL.4.21.0102061544230.6584-100000@opal>; from zzhang@cs.binghamton.edu on Tue, Feb 06, 2001 at 04:15:45PM -0500
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

* Zhiui Zhang <zzhang@cs.binghamton.edu> [010206 13:16] wrote:
> 
> I am considering the design of a journalled file system in FreeBSD. I
> think each transaction corresponds to a file system update operation and
> will therefore consists of a list of modified buffers.  The important
> thing is that these buffers should not be written to disk until they have
> been logged into the log area. To do so, we need to pin these buffers in
> memory for a while. The concept should be simple, but I run into a problem
> which I have no idea how to solve it:
> 
> If you access a lot of files quickly, some vnodes will be reused.  These
> vnodes can contain buffers that are still pinned in the memory because of
> the write-ahead logging constraints.  After a vnode is gone, we have
> no way to recover its buffers. Note that whenever we need a new vnode, we
> are in the process of creating a new file. At this point, we can not flush
> the buffers to the log area.  The result is a deadlock.
> 
> I could make copies of the buffers that are still pinned, but that incurs
> memory copy and need buffer headers, which is also a rare resource.
> 
> The design is similar to ext3fs of linux (they do not seem to have a vnode
> layer and they use device + physical block number instead of vnode +
> logical block number to index buffers, which, I guess, means that buffers
> can exist after the inode is gone). I know Mckusick has a paper on
> journalling FFS, but I just want to know if this design can work or not.
> 
> Any ideas?  Thanks for your help!

There are ways to reassign buffers to other vnodes: you can remove
the buffers from the vnodes at reclaim time (there has to be a hook
for this) and link them to a special vnode hanging off your mount
structure.
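Here is a sketch of the reclaim-time hand-off Alfred describes, using heavily simplified, hypothetical structures rather than the real struct buf/struct vnode/struct mount. Pinned buffers move to a per-mount vnode, and the remaining buffers go to a free list:

```c
#include <assert.h>
#include <stddef.h>

struct buf {
    struct buf *b_next;
    long        b_lblkno;   /* logical block number in the original file */
    int         b_pinned;   /* held back by write-ahead logging */
};

struct vnode {
    struct buf *v_bufs;     /* buffers attached to this vnode */
};

struct jmount {
    struct vnode *jm_logvp; /* special per-mount vnode that adopts pinned buffers */
    struct buf   *jm_free;  /* unpinned buffers released at reclaim */
};

/* Reclaim hook: hand pinned buffers to the mount's special vnode, free the rest. */
static void
reclaim_move_pinned(struct jmount *mp, struct vnode *vp)
{
    struct buf *bp, *next;

    assert(mp != NULL && vp != NULL);
    for (bp = vp->v_bufs; bp != NULL; bp = next) {
        next = bp->b_next;
        if (bp->b_pinned) {
            bp->b_next = mp->jm_logvp->v_bufs;   /* push onto the adopting vnode */
            mp->jm_logvp->v_bufs = bp;
        } else {
            bp->b_next = mp->jm_free;            /* push onto the free list */
            mp->jm_free = bp;
        }
    }
    vp->v_bufs = NULL;   /* vp no longer owns any buffers and can be recycled */
}
```

Once a buffer has moved, its b_lblkno no longer means anything relative to its new vnode. A real implementation would have to keep enough information (for example the device and physical block) to write the buffer home after its log record is on disk.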

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6 18:21:16 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from bingnet2.cc.binghamton.edu (bingnet2.cc.binghamton.edu [128.226.1.18])
	by hub.freebsd.org (Postfix) with ESMTP id 6B12937B401
	for <freebsd-fs@FreeBSD.ORG>; Tue,  6 Feb 2001 18:20:58 -0800 (PST)
Received: from opal (cs.binghamton.edu [128.226.123.101])
	by bingnet2.cc.binghamton.edu (8.11.2/8.11.2) with ESMTP id f172Kt025746;
	Tue, 6 Feb 2001 21:20:55 -0500 (EST)
Date: Tue, 6 Feb 2001 21:20:50 -0500 (EST)
From: Zhiui Zhang <zzhang@cs.binghamton.edu>
X-Sender: zzhang@opal
To: Alfred Perlstein <bright@wintelcom.net>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
In-Reply-To: <20010206135317.Z26076@fw.wintelcom.net>
Message-ID: <Pine.SOL.4.21.0102062118020.21503-100000@opal>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 6 Feb 2001, Alfred Perlstein wrote:
> 
> There's ways to reassign buffers to other vnodes, you can remove
> the buffers from the vnodes at reclaim time (there has to be a hook
> for this) and link them to a special vnode linked from your mount
> structure.

Thanks. I guess I can write a function that steals the pages from the
disappearing buffer and moves them over to the new buffer that is going
to replace it.
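Here is a minimal sketch of that page-stealing idea. Page pointers are transferred rather than copied, so no data moves. The structure and function are hypothetical and much simpler than FreeBSD's real buffers:

```c
#include <assert.h>
#include <stddef.h>

#define MAXPAGES 4

/* Hypothetical buffer that maps up to MAXPAGES pages of data. */
struct pbuf {
    void  *b_pages[MAXPAGES];
    size_t b_npages;
};

/* Move (not copy) the pages of a disappearing buffer into its replacement. */
static void
steal_pages(struct pbuf *from, struct pbuf *to)
{
    size_t i;

    assert(to->b_npages == 0);            /* replacement must start empty */
    for (i = 0; i < from->b_npages; i++) {
        to->b_pages[i] = from->b_pages[i];
        from->b_pages[i] = NULL;          /* old buffer no longer references the page */
    }
    to->b_npages = from->b_npages;
    from->b_npages = 0;
}
```

This avoids the memory copy mentioned earlier, although a second buffer header is still needed to hold the stolen pages.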

-Zhihui



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Tue Feb  6 18:51:38 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP id 5181E37B401
	for <freebsd-fs@FreeBSD.ORG>; Tue,  6 Feb 2001 18:51:21 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id 5ACB129073; Wed,  7 Feb 2001 08:51:19 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id 4BA9F29072; Wed,  7 Feb 2001 08:51:19 +0600 (ALMT)
Date: Wed, 7 Feb 2001 08:51:19 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: Bjoern Fischer <bfischer@Techfak.Uni-Bielefeld.DE>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: vnode interlock API
In-Reply-To: <20010206172150.A528@frolic.no-support.loc>
Message-ID: <Pine.BSF.4.21.0102070847080.4563-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 6 Feb 2001, Bjoern Fischer wrote:

> > #define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
> > #define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)
> 
> BTW, does this mean that -current vnode locking works sufficiently
> enough to support stacked file systems a la Eric Zadok's FiST software?

	Hmm, I don't see how this relates to stacked file systems,
but I can say that there is mostly finished generic code to support stacked
file systems. I hope to post it for review in a few weeks.

--
Boris Popov
http://www.butya.kz/~bp/



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Wed Feb  7 13:26:41 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2A49137B65D; Wed,  7 Feb 2001 13:26:18 -0800 (PST)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id OAA27535;
	Wed, 7 Feb 2001 14:23:20 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp03.primenet.com, id smtpdAAA7zaWQ1; Wed Feb  7 14:23:10 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id OAA24284;
	Wed, 7 Feb 2001 14:26:00 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102072126.OAA24284@usr08.primenet.com>
Subject: Re: vnode interlock API
To: bp@butya.kz (Boris Popov)
Date: Wed, 7 Feb 2001 21:26:00 +0000 (GMT)
Cc: freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.21.0102061638280.82511-100000@lion.butya.kz> from "Boris Popov" at Feb 06, 2001 05:00:03 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> 	So, I suggest to introduce two macro definitions which will hide
> implementation details for interlocks:
> 
> #define VI_LOCK(vp)		mtx_enter(&(vp)->v_interlock, MTX_DEF)
> #define VI_UNLOCK(vp)		mtx_exit(&(vp)->v_interlock, MTX_DEF)
> 
> 	for RELENG_4 they will look like this:
> 
> #define VI_LOCK(vp)		simple_lock(&(vp)->v_interlock)
> #define VI_UNLOCK(vp)		simple_unlock(&(vp)->v_interlock)
> 
> 	Any comments, suggestions ?

1)	Macros are good; interfaces are better.  I've consistently
	recommended that the NFS cookie interface be rewritten to
	not require cookies, even though the FreeBSD/NetBSD/OpenBSD
	differences _could_ be masked with macros.  The issue is
	one of binary vs. source compatibility.

2)	If you are going to wrap vnode handling, it would probably
	be a good idea to wrap it using the same approach that
	another OS uses, instead of being gratuitously different
	in naming.  I would suggest using the Solaris names, but I
	will admit that doing that depends heavily on the semantics
	being the same (I think they would be).  Worst case, pick an
	OS with the same semantics; if there are none, this may be
	an opportunity to learn from other OSs _why_ they don't have
	the same semantics.

3)	It seems to me that the additional parameter of MTX_DEF is
	gratuitous, and tries to stretch mutex semantics further
	than they should be stretched.  I personally would have no
	problem with the conversion of simple_{un}lock() into the
	equivalent mtx_*() calls.  Even if the MTX_DEF can not be
	murdered without a large public outcry, using this as the
	default semantic for the simple_*() equivalents isn't
	really a bad idea, in my book, and could be done with
	inline wrappers.  Best case, one could apply the WITNESS
	code to debugging 4.x problems, with some work.

4)	You need to wrap the calls with "{ ... }"; this is because
	it may be useful in the future to institute turnstile or
	single wakeup semantics, and defining the macro as a
	single statement instead of a statement block would mean
	that a potentially large amount of work would be needed
	later to cope with the change, whereas you already plan
	to touch all those spots now.  Again, the Solaris SMP
	vnode lock management macros are, I think, a good example
	(or at least they were, six years ago, when Solaris faced
	the same problem).
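Points 1, 3 and 4 can be illustrated side by side. Here is a sketch with a hypothetical, ownership-recording stand-in for the interlock. It uses the common do { ... } while (0) form of the statement block from point 4, which keeps the macro a single statement even inside an unbraced if/else. It also shows the same operations as ordinary functions, which is what would give binary rather than only source compatibility:

```c
#include <assert.h>

/* Hypothetical stand-in for the interlock; records ownership only. */
struct interlock { int held; };
struct vnode { struct interlock v_interlock; int v_flag; };

static void il_acquire(struct interlock *l) { assert(!l->held); l->held = 1; }
static void il_release(struct interlock *l) { assert(l->held);  l->held = 0; }

/* Point 4: a statement block that still behaves as exactly one statement,
 * so more steps (e.g. wakeup bookkeeping) can be added later without
 * touching any caller. */
#define VI_LOCK(vp)   do { il_acquire(&(vp)->v_interlock); } while (0)
#define VI_UNLOCK(vp) do { il_release(&(vp)->v_interlock); } while (0)

/* Points 1 and 3: the same operations as real functions; callers link
 * against a stable symbol instead of compiling in the lock implementation. */
void vi_lock(struct vnode *vp)   { il_acquire(&vp->v_interlock); }
void vi_unlock(struct vnode *vp) { il_release(&vp->v_interlock); }

static int
example_touch(struct vnode *vp, int use_macro)
{
    if (use_macro)
        VI_LOCK(vp);      /* legal before "else" only because of do/while(0) */
    else
        vi_lock(vp);
    vp->v_flag++;
    vi_unlock(vp);
    return vp->v_flag;
}
```

With a bare "{ ... }" expansion, the "VI_LOCK(vp);" before "else" would not compile. This is one reason the do/while form is the usual choice for multi-statement macros.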

I have other comments, but these are the four most important ones,
IMO, and I've been making a conscious effort to not clutter arguments
by giving more detail than people seem to want to hear before they
overflow and tune out.  8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Wed Feb  7 13:48:32 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140])
	by hub.freebsd.org (Postfix) with ESMTP id DD70437B6AD
	for <freebsd-fs@FreeBSD.ORG>; Wed,  7 Feb 2001 13:48:14 -0800 (PST)
Received: (from daemon@localhost)
	by smtp10.phx.gblx.net (8.9.3/8.9.3) id OAA17450;
	Wed, 7 Feb 2001 14:47:39 -0700
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp10.phx.gblx.net, id smtpdwci6aa; Wed Feb  7 14:47:33 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id OAA25001;
	Wed, 7 Feb 2001 14:48:05 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102072148.OAA25001@usr08.primenet.com>
Subject: Re: Design a journalled file system
To: sam@errno.com (Sam Leffler)
Date: Wed, 7 Feb 2001 21:48:05 +0000 (GMT)
Cc: zzhang@cs.binghamton.edu (Zhiui Zhang), freebsd-fs@FreeBSD.ORG
In-Reply-To: <0e9101c09086$5ca812b0$24a6d4d1@melange> from "Sam Leffler" at Feb 06, 2001 01:47:11 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> If you really want to work on another filesystem, learn about/from SGI's
> XFS.  They've made a GPL'd version for Linux version available for public
> ftp.

Unfortunately, this license means that it can not be distributed
compiled into a FreeBSD kernel, since clause 6 of the GPL will
specifically prohibit such distribution.

The upshot of this is that it can never be the default FS used
to boot FreeBSD, out of the box, nor to install by default,
since the module would have to be loaded from an FS which the
system can not understand until after it has loaded the module.

Historically, the solution that is often suggested for this
second problem is to use a simpler boot FS that the kernel
understands (Xenix, SCO UNIX, and SVR4 have all taken this
approach), but doing this makes the bootfs a single
point of failure for boot, and therefore the increased MTBF
that supposedly comes from using an advanced FS does nothing
for the overall MTBF.

In other words, the SGI XFS is an interesting curiosity, and
may or may not be a useful reference implementation for another
work, but it can never be used in a commercially usable OS, for
which source code is inconvenient or impossible to distribute
(even SGI can not take modifications made to repair bugs in
the Linux version, without having to place all of IRIX under
the GPL -- this they can not do, since IRIX contains code that
was licensed from vendors who are not anxious to have their
property given away free).

I rather suspect that the GPL was intentionally chosen by SGI
to permit them to jump on the Linux/Open Source bandwagon,
without exposing them to the risk of a commercial organization
which competes with SGI being able to benefit from the technology
being released; QNX, Windows NT, and Solaris are all obvious
candidates for this anticompetitive practice.

Conclusion:

Creating a truly free journalled FS implementation, even if it
were to end up being bidirectionally data-compatible with XFS
disks, is a worthy project.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Wed Feb  7 13:56:50 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.integratus.com (unknown [63.209.2.83])
	by hub.freebsd.org (Postfix) with SMTP id B77FF37B6AD
	for <freebsd-fs@FreeBSD.ORG>; Wed,  7 Feb 2001 13:56:32 -0800 (PST)
Received: (qmail 3611 invoked from network); 7 Feb 2001 21:56:32 -0000
Received: from kungfu.integratus.com (HELO integratus.com) (172.20.5.168)
  by tortuga1.integratus.com with SMTP; 7 Feb 2001 21:56:32 -0000
Message-ID: <3A81C490.598F7EB7@integratus.com>
Date: Wed, 07 Feb 2001 13:56:32 -0800
From: Jack Rusher <jar@integratus.com>
Organization: http://www.integratus.com/
X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Terry Lambert <tlambert@primenet.com>
Cc: Sam Leffler <sam@errno.com>,
	Zhiui Zhang <zzhang@cs.binghamton.edu>, freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
References: <200102072148.OAA25001@usr08.primenet.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Terry Lambert wrote:
> 
> Unfortunately, this license means that it can not be distributed
> compiled into a FreeBSD kernel, since clause 6 of the GPL will
> specifically prohibit such distribution.

  I have been wondering about this legal issue lately.  What is the law
with regard to implementing XFS as a KLM for FreeBSD & shipping the
source in contrib?  It won't help people who are trying to make
commercial products with embedded FreeBSD, but it might be useful for
sysadmins.

> point of failure for boot, and therefore the increased MTBF
> that supposedly comes from using an advanced FS does nothing
> for the overall MTBF.

  Mirror the boot partition with vinum?

> I rather suspect that the GPL was intentionally chosen by SGI
> to permit them to jump on the Linux/Open Source bandwagon,
> without exposing them to the risk of a commercial organization
> which competes with SGI being able to benefit from the technology

  This is unquestionably true.  I have word from some of the architects
who helped design XFS that this was exactly the reason GPL was chosen
over the BSD license.

-- 
Jack Rusher, Senior Engineer | mailto:jar@integratus.com
Integratus, Inc.             | http://www.integratus.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Wed Feb  7 14:10:17 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
	by hub.freebsd.org (Postfix) with ESMTP id CDBAC37B401
	for <freebsd-fs@FreeBSD.ORG>; Wed,  7 Feb 2001 14:09:57 -0800 (PST)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.9.3/8.9.3) id PAA13495;
	Wed, 7 Feb 2001 15:06:59 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp03.primenet.com, id smtpdAAAq5aisA; Wed Feb  7 15:06:49 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id PAA25657;
	Wed, 7 Feb 2001 15:09:43 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102072209.PAA25657@usr08.primenet.com>
Subject: Re: Design a journalled file system
To: zzhang@cs.binghamton.edu (Zhiui Zhang)
Date: Wed, 7 Feb 2001 22:09:43 +0000 (GMT)
Cc: freebsd-fs@FreeBSD.ORG
In-Reply-To: <Pine.SOL.4.21.0102061544230.6584-100000@opal> from "Zhiui Zhang" at Feb 06, 2001 04:15:45 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I am considering the design of a journalled file system in FreeBSD. I
> think each transaction corresponds to a file system update operation and
> will therefore consist of a list of modified buffers.  The important
> thing is that these buffers should not be written to disk until they have
> been logged into the log area. To do so, we need to pin these buffers in
> memory for a while. The concept should be simple, but I have run into a
> problem which I have no idea how to solve:
> 
> If you access a lot of files quickly, some vnodes will be reused.  These
> vnodes can contain buffers that are still pinned in memory because of
> the write-ahead logging constraints.  After a vnode is gone, we have
> no way to recover its buffers. Note that whenever we need a new vnode, we
> are in the process of creating a new file. At this point, we cannot flush
> the buffers to the log area.  The result is a deadlock.
> 
> I could make copies of the buffers that are still pinned, but that incurs
> a memory copy and needs buffer headers, which are also a scarce resource.
> 
> The design is similar to ext3fs of Linux (they do not seem to have a vnode
> layer and they use device + physical block number instead of vnode +
> logical block number to index buffers, which, I guess, means that buffers
> can exist after the inode is gone). I know McKusick has a paper on
> journalling FFS, but I just want to know if this design can work or not.

Soft updates provides this guarantee.  It's one approach.

If you look at the Ganger/Patt paper, it's pretty obvious that
the solution to the graph dependency problem could be generalized.

This would let you externalize hooks into the graph, so that you
could have dependencies span stacking layers, or so that you could
externalize a transaction interface to user space, or so that you
could implement a distributed cache coherency protocol, over a
network transport, on the bottom end.


In the limit, though, it means that you should think of an FS in
terms of a set of ordered metadata and data transactions, and then
simply ensure that transactions are handled in sufficient order
("sufficient" means that FFS can lose data, but never become
inconsistent; a journalled FS would not have this luxury).

For journalling, this is a slightly tougher problem, since you
must include the idea of data consistency, not just metadata
consistency, but the problem is not insoluble.

Starting from first principles, you should look at the transactions
you intend to support.  You should probably _not_ commit to a
storage paradigm (e.g. "... similar to ext3fs of Linux ... "),
until _after_ you have mapped out the operations, and what they
imply about conflict domains (e.g. several objects in one disk
block, or one page, which is what leads to much of the complexity
of the FFS soft updates implementation).

Probably the first thing you will notice is that the VOP_ABORT
semantics are horribly broken: I noticed the same thing, when
looking at implementing a writeable NTFS for Windows 95/98/2000,
using the Heidemann framework ported from FreeBSD.

I would say that you are also constrained by POSIX-guaranteed
semantics, though it would be convenient to be able to turn most
of these off to avoid vnode/data seeks.  That, however, is an
anecdotal conclusion from some recent literature (don't trust it
until you can determine what the effect will be under a
non-single-threaded FS load).


NB: I was unable to convince either Ganger or McKusick of the idea
of generalization, where on mount you register conflict resolvers
into a dependency graph, which you maintain as stacking is done and
undone, and VOPs are added and removed.  Both cited different
reasons for objecting.  Kirk objected to what he saw as a larger
in-core dependency accounting storage requirement.  IMO, Kirk's
reasons were not really correct, since any given dependency could
be expressed and resolved using the same structures.  I was unable
to provide a proof of concept due to license issues, which I very
well understand Kirk wanting to enforce at the time.  Gregory had
different objections, which I laid off to familiarity with graph
theory (you _can_ maintain a running accounting of transitive
closure over a graph, particularly one that doesn't change except
on mount or unmount), but I wouldn't dismiss either of them on
the basis of their gut feelings (I trust mine, but they trust
theirs, which is right for them to do).

That aside, even if you don't do a generalized implementation, the
approach of considering an FS in terms of transactions (events) is
still sound, and I think most modern FS researchers would agree with
the approach, even if they did not agree on implementation.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-fs  Wed Feb  7 15:23:24 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36])
	by hub.freebsd.org (Postfix) with ESMTP
	id 71B2A37B6C3; Wed,  7 Feb 2001 15:23:04 -0800 (PST)
Received: from jehovah ([24.201.144.31]) by
          VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15)
          with SMTP id G8EUAA03.L5I; Wed, 7 Feb 2001 18:22:58 -0500 
Message-ID: <002e01c0915d$326a7ec0$1f90c918@jehovah>
From: "Bosko Milekic" <bmilekic@technokratis.com>
To: "Terry Lambert" <tlambert@primenet.com>,
	"Boris Popov" <bp@butya.kz>
Cc: <freebsd-arch@FreeBSD.ORG>, <freebsd-fs@FreeBSD.ORG>
References: <200102072126.OAA24284@usr08.primenet.com>
Subject: Re: vnode interlock API
Date: Wed, 7 Feb 2001 18:25:02 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


Terry Lambert wrote:

[...]
> 3) It seems to me that the additional parameter of MTX_DEF is
> gratuitous, and tries to stretch mutex semantics further
> than they should be stretched.  I personally would have no
> problem with the conversion of simple_{un}lock() into the
> equivalent mtx_*() calls.  Even if the MTX_DEF can not be
> murdered without a large public outcry, using this as the

    Actually, it has been murdered: 

    http://people.freebsd.org/~bmilekic/code/mutex_cleanup-7.1.diff

    Presently under testing.

> the default semantic for the simple_*() equivalents isn't
> really a bad idea, in my book, and could be done with
> inline wrappers.  Best case, one could apply the WITNESS
> code to debugging 4.x problems, with some work.
 
[...]
> 
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

Regards,
Bosko.






From owner-freebsd-fs  Wed Feb  7 15:24: 0 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
	by hub.freebsd.org (Postfix) with ESMTP id 411A637B65D
	for <freebsd-fs@FreeBSD.ORG>; Wed,  7 Feb 2001 15:23:41 -0800 (PST)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.9.3/8.9.3) id QAA17710;
	Wed, 7 Feb 2001 16:18:19 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp04.primenet.com, id smtpdAAAsva4DI; Wed Feb  7 16:18:07 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id QAA27692;
	Wed, 7 Feb 2001 16:23:23 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102072323.QAA27692@usr08.primenet.com>
Subject: Re: Design a journalled file system
To: jar@integratus.com (Jack Rusher)
Date: Wed, 7 Feb 2001 23:23:17 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert),
	sam@errno.com (Sam Leffler), zzhang@cs.binghamton.edu (Zhiui Zhang),
	freebsd-fs@FreeBSD.ORG
Reply-To: freebsd-chat@FreeBSD.ORG
In-Reply-To: <3A81C490.598F7EB7@integratus.com> from "Jack Rusher" at Feb 07, 2001 01:56:32 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > Unfortunately, this license means that it can not be distributed
> > compiled into a FreeBSD kernel, since clause 6 of the GPL will
> > specifically prohibit such distribution.
> 
>   I have been wondering about this legal issue lately.  What is the law
> with regards to implementing XFS as a KLM for FreeBSD & shipping the
> source in contrib?  It won't help people who are trying to make
> commercial products with embedded FreeBSD, but it might be useful for
> sysadmins.

You won't be able to boot from it, unless you compile your own
kernel.  This was pretty much the Soft Updates status, until
recently.

The problem with clause 6 of the GPL is that it prohibits any
additional restrictions, and requiring the distribution of
another license, even if it does not otherwise conflict, is
a restriction on what can be done with the code.  Without
that other license, the right granted to you to use the code
in question doesn't exist, since it is the license which was
the origin of the grant.

Like Matt Dillon and Best Internet did with the Soft Updates
code, a local administrator could use it, but it could not
be distributed in a usable form.

Actually, this brings up a separate sticky legal point,
which is how the assets of Best Internet were transferred
when it was sold, since I assume that the machines that had
Soft Updates on them kept Soft Updates on them.  I suppose
that the new owners could have rebuilt the kernels on all
the machines, getting identical kernels, after first booting
to a non-Soft Updates kernel for the transfer of legal
possession.


Distribution of a binary kernel module would really depend on
whether you could get away with treating a kernel as a library,
under the GPL allowing the linking of GPL'ed programs against
system libraries.  You have to wonder if a kernel module is a
program or just a program component, with the kernel being
the program.  BeOS side-steps this for non-boot drivers by
running the driver in a user space process, so it's provably
a program.


Anyway, that's the kind of hoop-jumping that you _could_ do
to get around the problem (maybe).  I have no idea what the
transfer of ownership clauses in the GPL would do if a company
were to IPO, for example, or what the concept of "publicly
held" would mean in that context (since anyone who holds the
ownership of the software can demand the source, and the source
itself is not legal to distribute, under the conflicting
licenses).

Not really my problem, though, since I tend to try to avoid
just this sort of entanglement.  So did IBM, when I was
working for them.  8-).


[ ... boot MTBF ... ]

>   Mirror the boot partition with vinum?

I'm not sure this works yet.  Hardware RAID mirroring certainly
would, since it'd have to deal with the BIOS boot device issue.


> > I rather suspect that the GPL was intentionally chosen by SGI
> > to permit them to jump on the Linux/Open Source bandwagon,
> > without exposing them to the risk of a commercial organization
> > which competes with SGI being able to benefit from the technology
> 
>   This is unquestionably true.  I have word from some of the architects
> who helped design XFS that this was exactly the reason GPL was chosen
> over the BSD license.

I had a pretty long discussion with their V.P. of engineering,
who made the decision (they have a number of "V.P. of engineering"
lying around).  He didn't come out and say the same thing, and I
really didn't attribute it to that, since it means that any bug
fixes are GPL-code derived, and therefore also GPL.  That would
mean that they really don't expect any useful work to come out of
the Linux community, or that they expected people to just sign
over rights to anything interesting, which I think would be a bit
naive, to say the least.

FYI: Followups set to -chat...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-fs  Wed Feb  7 15:41: 0 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from bingnet2.cc.binghamton.edu (bingnet2.cc.binghamton.edu [128.226.1.18])
	by hub.freebsd.org (Postfix) with ESMTP id D787B37B503
	for <freebsd-fs@FreeBSD.ORG>; Wed,  7 Feb 2001 15:40:40 -0800 (PST)
Received: from onyx (onyx.cs.binghamton.edu [128.226.140.171])
	by bingnet2.cc.binghamton.edu (8.11.2/8.11.2) with ESMTP id f17NeWI21997;
	Wed, 7 Feb 2001 18:40:32 -0500 (EST)
Date: Wed, 7 Feb 2001 18:40:21 -0500 (EST)
From: Zhiui Zhang <zzhang@cs.binghamton.edu>
X-Sender: zzhang@onyx
To: Terry Lambert <tlambert@primenet.com>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
In-Reply-To: <200102072209.PAA25657@usr08.primenet.com>
Message-ID: <Pine.SOL.4.21.0102071833210.3918-100000@onyx>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


Thanks for your email! Even if I think I have a fairly good understanding
of the FFS code (not soft-update) by actually studying/modifying the code,
I still have a long way to go to understand the bigger picture which you
have described.

-Zhihui

On Wed, 7 Feb 2001, Terry Lambert wrote:

> [ ... ]





From owner-freebsd-fs  Wed Feb  7 23: 3:49 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp1b.mail.yahoo.com (smtp3.mail.yahoo.com [128.11.68.135])
	by hub.freebsd.org (Postfix) with SMTP id 0BA5B37B491
	for <fs@freebsd.org>; Wed,  7 Feb 2001 23:03:28 -0800 (PST)
Received: from nat-198-95-226-208.netapp.com (HELO fdevijvelap) (198.95.226.208)
  by smtp.mail.vip.suc.yahoo.com with SMTP; 8 Feb 2001 08:10:21 -0000
X-Apparently-From: <fdevijve@yahoo.com>
Message-ID: <05cd01c0919c$c77b0db0$1fc9a8c0@europe.netapp.com>
From: "fab" <fdevijve@yahoo.com>
To: "Matt Dillon" <dillon@earth.backplane.com>,
	"Mike Smith" <msmith@freebsd.org>
Cc: "Michael C . Wu" <keichii@iteration.net>,
	"Mitch Collinsworth" <mitch@ccmr.cornell.edu>, <hackers@FreeBSD.ORG>,
	<fs@FreeBSD.ORG>
References: <200102052052.f15KqOe00985@mass.dis.org>
Subject: Re: Extremely large (70TB) File system/server planning 
Date: Thu, 8 Feb 2001 07:36:30 +0100
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hi all,

It's true that filers can't exceed 6TB, but you can easily get
(pretty good) performance out of them.

If you go with an EMC or IBM box, you will have to manage things
that are not your job (I/O, for example).

I think that NetApp can be a very simple solution (where others sell
complexity).

Thanks

    Fab.


----- Original Message -----
From: Mike Smith <msmith@freebsd.org>
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Michael C . Wu <keichii@iteration.net>; Mitch Collinsworth
<mitch@ccmr.cornell.edu>; <hackers@FreeBSD.ORG>; <fs@FreeBSD.ORG>
Sent: Monday, February 05, 2001 9:52 PM
Subject: Re: Extremely large (70TB) File system/server planning


> >
> > :| > The files are accessed approximately 3 or 4 times a day on average.
> > :| > Older files are archived for reference purpose and may never
> > :| > be accessed after a week.
> > :|
> > :| Ok, this is a start.  Now is the 70 TB the size of the active files?
> > :| Or does that also include the older archived files that may never be
> > :| accessed again?
> > :70TB is the size of the sum of all files, access or no access.
> > :(They still want to maintain accessibility even though the chances
> > :are slim.)
> ...
> >     This doesn't sound like something you can just throw together with
> >     off-the-shelf PCs and still have something reliable to show for it.
> >     You need a big honking RAID system - maybe a NetApp, maybe something
> >     else.  You have to look at the filesystem and file size limitations
> >     of the unit and the client(s).
>
> You can't do this with a NetApp either; they max out at about 6TB now
> (going up to around 12 or so soon).  You might want to talk to EMC and/or
> IBM, both of whom have *extremely* large filers.
>
> Your friend may also want to look at Traakan, who have a novel product in
> this space.
>
> --
> ... every activity meets with opposition, everyone who acts has his
> rivals and unfortunately opponents also.  But not because people want
> to be opponents, rather because the tasks and relationships force
> people to take different points of view.  [Dr. Fritz Todt]
>            V I C T O R Y   N O T   V E N G E A N C E
>







From owner-freebsd-fs  Thu Feb  8 23: 6: 8 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from lips.borg.umn.edu (lips.borg.umn.edu [160.94.232.50])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4692B37B491; Thu,  8 Feb 2001 23:05:40 -0800 (PST)
Received: from thebarn.com (nic-31-c12-219.mn.mediaone.net [24.31.12.219])
	by lips.borg.umn.edu (8.11.2/8.10.1) with ESMTP id f1975Zb30917;
	Fri, 9 Feb 2001 01:05:36 -0600 (CST)
Message-ID: <3A8396B9.CA8C09E4@thebarn.com>
Date: Fri, 09 Feb 2001 01:05:29 -0600
From: Russell Cattelan <cattelan@thebarn.com>
X-Mailer: Mozilla 4.74 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-chat@FreeBSD.ORG
Cc: Jack Rusher <jar@integratus.com>,
	Terry Lambert <tlambert@primenet.com>, Sam Leffler <sam@errno.com>,
	Zhiui Zhang <zzhang@cs.binghamton.edu>, freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
References: <200102072323.QAA27692@usr08.primenet.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Terry Lambert wrote:

>

{...}

>
> > > I rather suspect that the GPL was intentionally chosen by SGI
> > > to permit them to jump on the Linux/Open Source bandwagon,
> > > without exposing them to the risk of a commercial organization
> > > which competes with SGI being able to benefit from the technology
> >
> >   This is unquestionably true.  I have word from some of the architects
> > who helped design XFS that this was exactly the reason GPL was chosen
> > over the BSD license.
>
> I had a pretty long discussion with their V.P. of engineering,
> who made the decision (they have a number of "V.P. of engineering"
> lying around).  He didn't come out and say the same thing, and I
> really didn't attribute it to that, since it means that any bug
> fixes are GPL-code derived, and therefore also GPL.  That would
> mean that they really don't expect any useful work to come out of
> the Linux community, or that they expected people to just sign
> over rights to anything interesting, which I think would be a bit
> naive, to say the least.

I'm not sure who you talked with, but it really is that simple.

The reason the GPL was chosen for XFS: it's the license Linux is
using, and since the port is being done for Linux, it makes sense.

SGI is also doing work with the XFree code; that work is being
released under the X license (which is also an anti-GPL license).

SGI is basically matching its license to that of whatever project
it is contributing to.

This comes from the lawyer who is doing all the open source work.

I have stated this in the past, but I will bring it up again.

If sufficient momentum can be generated toward an fbsd port of XFS,
it may be possible to go to the lawyers and have another license
drawn up.

But unless the BSD community can show they are serious about XFS
being ported, it would be a waste of time to ask for something that
SGI has very little business interest in doing.

Note that Darwin might be a big win in terms of making a business
case for another platform.

The license shouldn't be that big of an issue.

Lots of fbsd uses GPL'ed code... gcc, for example.

Let's get to the point where XFS is in such demand on fbsd that we
can get a petition going, if necessary, to have the license updated.

BTW, if anybody is interested, a few of us have started looking at
actually doing the port.

Not much has been done at this point... basically battling through
header file cleanup.

Oh, one other comment:

The only time SGI may ask for a copyright reassignment is if the
contributed code affects the filesystem compatibility between IRIX
and Linux. This would have to be a major contribution before
something like this would be an issue, and some negotiation will
most certainly be involved.

Up to this point all bug fixes have been Linux-related only, so it
really isn't an issue.

This isn't SGI trying to be an ass... rather, SGI trying to provide
the most compatible FS it can within the constraints of many legal
issues.

--
Russell Cattelan
cattelan@thebarn.com







From owner-freebsd-fs  Thu Feb  8 23:13:11 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from lips.borg.umn.edu (lips.borg.umn.edu [160.94.232.50])
	by hub.freebsd.org (Postfix) with ESMTP id 1C66637B401
	for <freebsd-fs@FreeBSD.ORG>; Thu,  8 Feb 2001 23:12:53 -0800 (PST)
Received: from thebarn.com (nic-31-c12-219.mn.mediaone.net [24.31.12.219])
	by lips.borg.umn.edu (8.11.2/8.10.1) with ESMTP id f197Cpb30997;
	Fri, 9 Feb 2001 01:12:51 -0600 (CST)
Message-ID: <3A83986E.55789E59@thebarn.com>
Date: Fri, 09 Feb 2001 01:12:46 -0600
From: Russell Cattelan <cattelan@thebarn.com>
X-Mailer: Mozilla 4.74 [en] (X11; U; Linux 2.2.12 i386)
X-Accept-Language: en
MIME-Version: 1.0
To: Zhiui Zhang <zzhang@cs.binghamton.edu>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
References: <Pine.SOL.4.21.0102061544230.6584-100000@opal>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Zhiui Zhang wrote:

> I am considering the design of a journalled file system in FreeBSD. I
> think each transaction corresponds to a file system update operation and
> will therefore consist of a list of modified buffers.  The important
> thing is that these buffers should not be written to disk until they have
> been logged into the log area. To do so, we need to pin these buffers in
> memory for a while. The concept should be simple, but I have run into a
> problem which I have no idea how to solve:
>
> If you access a lot of files quickly, some vnodes will be reused.  These
> vnodes can contain buffers that are still pinned in memory because of
> the write-ahead logging constraints.  After a vnode is gone, we have
> no way to recover its buffers. Note that whenever we need a new vnode, we
> are in the process of creating a new file. At this point, we cannot flush
> the buffers to the log area.  The result is a deadlock.

XFS:
All pinned buffers are kept on a queue to be flushed by a
daemon that walks the queue looking for buffers that
have recently become unlocked and unpinned.


>
>
> I could make copies of the buffers that are still pinned, but that incurs
> a memory copy and needs buffer headers, which are also a scarce resource.
>
> The design is similar to ext3fs of Linux (they do not seem to have a vnode
> layer and they use device + physical block number instead of vnode +
> logical block number to index buffers, which, I guess, means that buffers
> can exist after the inode is gone). I know McKusick has a paper on
> journalling FFS, but I just want to know if this design can work or not.

Yup.  All metadata buffers use an absolute device offset.

>
> Any ideas?  Thanks for your help!
>
> -Zhihui

--
Russell Cattelan
cattelan@thebarn.com







From owner-freebsd-fs  Fri Feb  9  0:57:26 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6D41F37B503; Fri,  9 Feb 2001 00:56:51 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id BAA18553;
	Fri, 9 Feb 2001 01:51:58 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp05.primenet.com, id smtpdAAA5RaOkK; Fri Feb  9 01:51:48 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id BAA08304;
	Fri, 9 Feb 2001 01:56:29 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102090856.BAA08304@usr08.primenet.com>
Subject: Re: Design a journalled file system
To: cattelan@thebarn.com (Russell Cattelan)
Date: Fri, 9 Feb 2001 08:56:29 +0000 (GMT)
Cc: freebsd-chat@FreeBSD.ORG, jar@integratus.com (Jack Rusher),
	tlambert@primenet.com (Terry Lambert), sam@errno.com (Sam Leffler),
	zzhang@cs.binghamton.edu (Zhiui Zhang), freebsd-fs@FreeBSD.ORG
In-Reply-To: <3A8396B9.CA8C09E4@thebarn.com> from "Russell Cattelan" at Feb 09, 2001 01:05:29 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

OK, this is not a license war.  I will lay it on the line.

I am offering to do a preliminary port of the XFS code,
potentially to the point of minimally a read-only mount, and
perhaps much further, depending on the effort required.

The resulting code will have some nasty strings, based on
me assuming your comments are correct, and wanting some
guarantees on that, on my part.  The strings go away when
your claims to SGI's actions are met.

Below is my reply to your message, including the philosophical
basis for the strings, a description of the strings, and the
details of my offer.

This offer is good for a starting date before 01 March 2001.

--

> I'm not sure who you talked with?  but it really it that simple.

Vijay.  The V.P. of Engineering at SGI who negotiated the
release of the code.  I will quote one of his statements,
made to me in email:

] Can't you just relicense FreeBSD [under the GPL]?


> The reason the GPL was chosen for XFS.
> It's the license Linux is using, and since
> the port is being done for Linux it makes sense.

One would have expected a dual license.  Alternately, one would
think they would use the LGPL, which would let people link it into
their kernels, as long as they gave source (which BSD does)
or otherwise permitted relinking.

In other words, the GPL is not really an optimal license, if
they wanted wide use AND specific Linux license compatibility.
I concluded from their choice that they were not going for
wide use, but instead wanted the marketing benefit of being
associated with Linux (lots of press, etc.).


> SGI is also doing work with the XFree code, the work
> is being released under the X license (which is also
> an anti GPL license).

The BSD and MIT licenses predate the GPL, so careful with the
word "anti" there...


> SGI is basically matching license for licensee to
> whatever project they are contributing to.
> This from the lawyer that is doing all the open source work.

Rather than to the use to which the software is put; that's a bit
naive, then, again.


> I have stated this in the past but I will bring it up again.
> If sufficient momentum can be generated toward an fbsd port
> of XFS, it may be possible to go to the lawyers and have a another
> license drawn up.

If we had it in writing that the code would be released under
a license usable by the BSD kernel, preferably "matching license
for license", as you state, then we would commit to do the work.

The problem we have is that the code under the current license
is useless to us, and unless we can be ensured that the code we
write to glue it in won't end up also being useless to us, there
is really no reason to commit the effort.


> But unless the bsd community can show they are serious about
> XFS being ported it would be a waste of time to ask for
> something that SGI has very little business interesting in doing.

So if we were to do a port, then SGI would have a business interest,
and would relicense the code?  Can we have that in writing?


> Note Darwin might be a big win in terms of making a business case
> for another platform.

Darwin support would be automatic, with a FreeBSD port.  Darwin
can use FreeBSD FS code, unmodified.


> The license shouldn't be that big of an issue.

It shouldn't, but it is.  I would have been ecstatic to use XFS
in the Whistle InterJet, as a means of getting rid of the need
for a UPS; as a technology for doing exactly that, it's superior
to Soft Updates (Soft Updates has other valuable attributes, but
that was the one we were interested in obtaining).  There is not a
chance in hell of IBM shipping a product based on code without a
license grant in perpetuity already locked in a vault.


> Lots of fbsd uses GPL'ed code... hmm gcc for example.

FreeBSD _utilizes_ this code, it does not _use_ it.  The gcc
code can be diked out of a FreeBSD system, without crippling
the utility of the system.  In an embedded product, that code
_is_ diked out.  There is no gcc code linked into the FreeBSD
kernel.


> Let get to the point were XFS is in such demand on fbsd
> we can get a petition going if necessary to have the license
> updated.

Demand is very different; it is an aspect of marketing.  How
much demand do you want, and where do you want it directed?  I
believe that it would be a trivial exercise to generate as much
demand as you require.


> BTW if anybody is interested a few of us have started looking
> at actually doing the port.  Not much has been done at this
> point... basically battling through header file cleanup.

If you have your head wrapped around it already, file system
code is really very trivial, particularly if you have code that
already works in one environment, and are merely porting it.

I'll tell you what: give me a pointer to the code without the
Linux modifications, so that I won't inadvertently include code
that is derived from GPL'ed code, and I will create a FreeBSD
port of the code, with all code additions, which will compile
and link successfully in a FreeBSD kernel, in a matter of a few
days.  I will additionally require an image of an XFS FS on a
floppy disk, which I can use for compatibility testing.  There
should be one file with an example of each thing the FS is
capable of representing, including a directory, a directory
with a subdirectory, a file, and a directory with two files;
the files should be short, but if immediate files exist, one
should be long enough to trigger indirection.  It would be most
useful if the image were zeroed before it was created, so I am
able to distinguish XFS written data from "blank floppy" contents
(and to aid compression of the image).

I will provide my code for FTP, which will be licensed to
explicitly prohibit all but development use, with a license
which will transform itself to the three clause Berkeley
license, if the XFS code which it's designed to work with
is also released under a Berkeley-style license, and a release
from patent claims in the covered code.

In other words, the code I provide will be useless to everyone
but FS researchers, unless the SGI license on the XFS components
it must be linked with change to permit BSD to use the code as a
boot FS, and further, permit commercial use by not hiding submerged
patent infringement lawsuits which will be sprung on the unwary,
as soon as someone with deep pockets uses the code.

Call me distrustful, but I am fully capable of delivering in a
very short time frame, so I'm pretty much the only game.


> Ohh one other comment:
> The only time SGI may ask for a copy write reassignment is if
> the contributed code affects the filesystem compatibility
> between irix and linux. This would have to be a major
> contribution before something like this would be an issue, and
> some negotiation will most certainly be involved.

You're damn straight there will be: SGI will be begging the
author to assign rights to a derivative work of SGI's own
code.  If that author is philosophically adamant about the
GPL, the assignment of rights will never happen, unless the
author also lacks personal integrity, and SGI is willing to
buy them out of their philosophical stubbornness, or pay
their own engineers to recreate the code.


> Up to to this point all bug fixes have been linux related only
> so it really isn't an issue.

I maintain it probably never will be.  Ask Vijay for my
arguments in this regard; they boil down to the level of
effort and complexity involved in FS hacking.  It takes a
professional, someone with academic rigor, to do useful work.

Consider that the only minds capable of adding Soft Updates
technology to XFS, without a huge capital expenditure, exist
_only_ in the BSD community.


> This isn't SGI trying to be an ass... rather SGI trying to
> provide the most compatible FS it can within the constrains
> of many legal issues.

A library style license of the Mozilla bent would have been
able to accomplish this rather easily, without losing SGI
rights to (putative) improvements, and without limiting the
compatibility of the license to nothing but Linux.  Linux
could archive it and treat it as a statically linked library
used by the kernel or a kernel module.

The effect on BSD would have been to require it to do what
it does already, and for systems vendors to provide an "ld -r"'ed
kernel and XFS source code.  A pain in the ass, but livable
for most commercial users and embedded systems vendors.

I can't believe SGI's lawyers didn't know precisely what they
were giving away, and what they weren't.

--

So, are you going to point me at the pure (convertible to
another license, since it contains only SGI contributions)
SGI XFS code, and an image of a sample FS that I can write
to a floppy for testing purposes?

Meanwhile, I think the FreeBSD community should continue to
pursue their own JFS, under a useful license that could then
trigger commercial support for the programming required...

That's how the BSD community gets professional programmers
to do complex and unpleasant tasks, while other communities
never get the unpleasant tasks (e.g. Soft Updates [Whistle/IBM],
fully unified VM and buffer cache [Oracle], etc.) done at all,
after all.  Marketing is a poor coin for getting long term
work done; it's too ephemeral for a long term investment to
be worthwhile.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-fs  Fri Feb  9  8:31:39 2001
Delivered-To: freebsd-fs@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 753)
	id F16E237B97D; Fri,  9 Feb 2001 08:08:01 -0800 (PST)
Date: Fri, 9 Feb 2001 08:08:01 -0800
From: Adrian Chadd <adrian@FreeBSD.org>
To: tlambert@primenet.com
Cc: Russell Cattelan <cattelan@thebarn.com>,
	freebsd-chat@FreeBSD.ORG, Jack Rusher <jar@integratus.com>,
	Terry Lambert <tlambert@primenet.com>, Sam Leffler <sam@errno.com>,
	Zhiui Zhang <zzhang@cs.binghamton.edu>, freebsd-fs@FreeBSD.ORG
Subject: Re: XFS
Message-ID: <20010209080801.A56926@hub.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


Terry said:

> I am offering to do a preliminary port of the XFS code,
> potentially to the point of minimally a read-only mount, and
> perhaps much further, depending on the effort required.

.. and I'm already (only initially) trudging my way through the
linux XFS code and slowly fixing it up.

I've hit a snag - the lacking mount interface we have - which
I'm also slowly reworking to be more flexible and suited to
the XFS requirements.

So Terry, if you'd like to help, let's sort out the mount interface,
help me finish bits of the userland interface, and then we can
work on getting the XFS kernel code in.

.. I might say that from what I hear, it might be easier to port
XFS to FreeBSD based on the original XFS code before it was
Linux-ified, but I'm willing to walk through the Linux code.



Adrian





From owner-freebsd-fs  Fri Feb  9  9:12:26 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94])
	by hub.freebsd.org (Postfix) with ESMTP
	id 133B837B9B9; Fri,  9 Feb 2001 08:22:01 -0800 (PST)
Received: by relay.butya.kz (Postfix, from userid 1000)
	id E9B652863E; Fri,  9 Feb 2001 22:21:56 +0600 (ALMT)
Received: from localhost (localhost [127.0.0.1])
	by relay.butya.kz (Postfix) with ESMTP
	id CA656285D3; Fri,  9 Feb 2001 22:21:56 +0600 (ALMT)
Date: Fri, 9 Feb 2001 22:21:56 +0600 (ALMT)
From: Boris Popov <bp@butya.kz>
To: freebsd-fs@freebsd.org
Cc: freebsd-hackers@freebsd.org
Subject: Re: smbfs-1.3.3 released
In-Reply-To: <Pine.BSF.4.21.0101281356020.30001-100000@lion.butya.kz>
Message-ID: <Pine.BSF.4.21.0102092217160.31739-100000@lion.butya.kz>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sun, 28 Jan 2001, Boris Popov wrote:

> 	Well, next version of smbfs for FreeBSD released today. It
> includes minor bug fixes and significantly reworked connection engine.

	As usual, major rewrites tend to introduce some bugs. So, I've
released 1.3.5 as an update:

09.02.2001      1.3.5
    - The user and server names were swapped in the "TreeConnect"
      request (fixed by Jonathan Hanna).
    - The smb requester could cause a panic if there were no free mbufs - fixed.
    - It is possible to use smbfs with devfs now, but it wasn't tested under
      SMP. Also note that device permissions will be wrong, because devfs
      does not allow passing of credentials to the cloning function.
    - The nsmbX device moved from the /dev/net directory to the /dev directory.

31.01.2001      1.3.4
    - Maintenance: sync with changes in the recent -current

 	An updated version can be downloaded from
ftp://ftp.butya.kz/pub/smbfs/smbfs.tar.gz

--
Boris Popov
http://www.butya.kz/~bp/





From owner-freebsd-fs  Fri Feb  9  9:25:28 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from bingnet2.cc.binghamton.edu (bingnet2.cc.binghamton.edu [128.226.1.18])
	by hub.freebsd.org (Postfix) with ESMTP id 1C00437E0A4
	for <freebsd-fs@FreeBSD.ORG>; Fri,  9 Feb 2001 09:25:10 -0800 (PST)
Received: from onyx (onyx.cs.binghamton.edu [128.226.140.171])
	by bingnet2.cc.binghamton.edu (8.11.2/8.11.2) with ESMTP id f19HP8c17870;
	Fri, 9 Feb 2001 12:25:08 -0500 (EST)
Date: Fri, 9 Feb 2001 12:24:54 -0500 (EST)
From: Zhiui Zhang <zzhang@cs.binghamton.edu>
X-Sender: zzhang@onyx
To: Russell Cattelan <cattelan@thebarn.com>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
In-Reply-To: <3A83986E.55789E59@thebarn.com>
Message-ID: <Pine.SOL.4.21.0102091214440.4738-100000@onyx>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


I guess that this will involve either memory copying or changing the
buffer header directly. Linux seems to address buffers directly via
physical (not logical) block number, so there is no need to change the
buffer header. Plus, Linux has a reference count to prevent a buffer from
disappearing (being brelse()'ed).
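A minimal sketch of the reference-count idea, assuming the journal takes its own reference on every buffer it has logged, so a release by the file system alone cannot free it. The names (struct rbuf, buf_get, buf_put) are hypothetical and are not the Linux buffer-cache API.

```c
#include <stdbool.h>

/* Hypothetical buffer carrying only a reference count. */
struct rbuf {
    int refcnt;
};

/* Take a reference (file system or journal). */
static void buf_get(struct rbuf *bp)
{
    bp->refcnt++;
}

/* Drop one reference; returns true when the last one went away and the
 * buffer may now be reclaimed. */
static bool buf_put(struct rbuf *bp)
{
    return --bp->refcnt == 0;
}
```

With this scheme the file system's release and the journal's release are independent: whichever comes last makes the buffer reclaimable.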

Another difficulty is that if several transactions are in progress at the
same time, we must remember which metadata buffers are modified by which
transactions. When we copy/rename the buffer, we must inform those
transactions the fact that we did the copy/rename.  The buffers modified
by one transaction must be flushed at the same time.
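A sketch of that bookkeeping, under the assumption that a buffer is pinned once per open transaction that modified it and unpinned when that transaction commits. The types and functions (struct txn, txn_add, txn_commit, TXN_MAXBUFS) are hypothetical, and the log I/O itself is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define TXN_MAXBUFS 16   /* hypothetical per-transaction limit */

struct jbuf {
    int blkno;
    int pincount;        /* one pin per uncommitted transaction touching it */
};

struct txn {
    struct jbuf *bufs[TXN_MAXBUFS];
    size_t       nbufs;
    bool         committed;
};

/* Record that transaction t modified bp.  Each buffer is pinned once per
 * transaction, however many times t touches it.  Returns false if the
 * transaction's buffer list is full. */
static bool txn_add(struct txn *t, struct jbuf *bp)
{
    for (size_t i = 0; i < t->nbufs; i++)
        if (t->bufs[i] == bp)
            return true;
    if (t->nbufs == TXN_MAXBUFS)
        return false;
    t->bufs[t->nbufs++] = bp;
    bp->pincount++;
    return true;
}

/* Commit: after all of t's buffers are in the log (no I/O simulated
 * here), unpin them together.  A buffer also modified by another open
 * transaction stays pinned until that transaction commits as well. */
static void txn_commit(struct txn *t)
{
    for (size_t i = 0; i < t->nbufs; i++)
        t->bufs[i]->pincount--;
    t->committed = true;
}
```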

BTW, Linux GFS code seems to allow only ONE transaction in progress at any time.

-Zhihui

On Fri, 9 Feb 2001, Russell Cattelan wrote:

> Zhiui Zhang wrote:
> 
> > I am considering the design of a journalled file system in FreeBSD. I
> > think each transaction corresponds to a file system update operation and
> > will therefore consists of a list of modified buffers.  The important
> > thing is that these buffers should not be written to disk until they have
> > been logged into the log area. To do so, we need to pin these buffers in
> > memory for a while. The concept should be simple, but I run into a problem
> > which I have no idea how to solve it:
> >
> > If you access a lot of files quickly, some vnodes will be reused.  These
> > vnodes can contain buffers that are still pinned in the memory because of
> > the write-ahead logging constraints.  After a vnode is gone, we have
> > no way to recover its buffers. Note that whenever we need a new vnode, we
> > are in the process of creating a new file. At this point, we can not flush
> > the buffers to the log area.  The result is a deadlock.
> 
> XFS:
> All pinned buffers are keep on a queue to be flushed by a
> daemon that walks the queue looking for buffer that
> have recently become unlocked and unpinned.
> 
> 
> >
> >
> > I could make copies of the buffers that are still pinned, but that incurs
> > memory copy and need buffer headers, which is also a rare resource.
> >
> > The design is similar to ext3fs of linux (they do not seem to have a vnode
> > layer and they use device + physical block number instead of vnode +
> > logical block number to index buffers, which, I guess, means that buffers
> > can exist after the inode is gone). I know Mckusick has a paper on
> 
> Yup.  All meta data buffer use  and absolute device offset.
> 
> 
> > journalling FFS, but I just want to know if this design can work or not.
> >
> > Any ideas?  Thanks for your help!
> >
> > -Zhihui
> >
> 
> --
> Russell Cattelan
> cattelan@thebarn.com
> 
> 
> 
> 





From owner-freebsd-fs  Fri Feb  9 10:43:22 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from sgi.com (sgi.SGI.COM [192.48.153.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id B566537B401; Fri,  9 Feb 2001 10:43:01 -0800 (PST)
Received: from ledzep.americas.sgi.com (relay.cray.com [137.38.226.97]) 
	by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id KAA05895; Fri, 9 Feb 2001 10:42:51 -0800 (PST)
	mail_from (cattelan@thebarn.com)
Received: from gibble.americas.sgi.com (gibble.americas.sgi.com [128.162.195.80]) by ledzep.americas.sgi.com (SGI-SGI-8.9.3/americas-smart-nospam1.1) with ESMTP id MAA25243; Fri, 9 Feb 2001 12:42:50 -0600 (CST)
Received: from thebarn.com (localhost [127.0.0.1])
	by gibble.americas.sgi.com (8.11.0/8.11.0) with ESMTP id f19Ifo020453;
	Fri, 9 Feb 2001 12:41:50 -0600
Message-ID: <3A8439ED.57011A40@thebarn.com>
Date: Fri, 09 Feb 2001 12:41:50 -0600
From: Russell Cattelan <cattelan@thebarn.com>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1-XFS i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Adrian Chadd <adrian@FreeBSD.ORG>
Cc: tlambert@primenet.com, freebsd-fs@FreeBSD.ORG
Subject: Re: XFS
References: <20010209080801.A56926@hub.freebsd.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Adrian Chadd wrote:

> Terry said:
>
> > I am offering to do a preliminary port of the XFS code,
> > potentially to the point of minimally a read-only mount, and
> > perhaps much further, depending on the effort required.
>
> .. and I'm already (only initially) trudging my way through the
> linux XFS code and slowly fixing it up.
>
> I've hit a sticker - the lacking mount interface we have - which
> I'm also slowly reworking to be more flexible and suited to
> the XFS requirements.
>
> So Terry, if you'd like to help, lets sort out the mount interface,
> help me finish bits of the userland interface, and then we can
> work on getting the XFS kernel code in.
>
> .. i might say that from what I hear, it might be easier to port
> XFS to FreeBSD based on the original XFS code before it was
> Linux-ified, but I'm willing to walk through the linux code.

I can go back in time and dig up any of the old interface code.
It will have to be used only as a reference, since it may have old
license issues; most of it was clean, but a couple of places had problems.
The VFS and vnode stuff was clean, based on the fact that the BSD code
is out there.

Note: we put a layer over the top of the XFS vfs/vnode interface,
so most of the interface is intact, and it should be a matter of
stripping off the linvfs_ layer.
CXFS needs stackable FSs; the Linux VFS layer doesn't have
any concept of this, so we needed to keep the vfs/vnode stuff.
Behaviors will have to be added... this shouldn't be too much of a
problem.

Right now the vnode is part of the Linux inode structure... all the vnode
members were left in place... the only thing that was pushed up was
the count, but this was done with a macro, so this should be a trivial
conversion.
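A sketch of the kind of macro conversion described above, assuming the vnode is embedded in the host OS inode and its reference count lives only in the inode. The names (struct host_inode, struct xvnode, VTOHOST, VN_COUNT) are hypothetical, not the actual linvfs or XFS definitions.

```c
#include <stddef.h>

/* Hypothetical vnode: its other members are left in place, but there is
 * no count field; the count was "pushed up" into the host inode. */
struct xvnode {
    int v_type;
};

struct host_inode {
    int           i_count; /* host OS reference count */
    struct xvnode vnode;   /* embedded vnode */
};

/* Recover the enclosing inode from a vnode pointer. */
#define VTOHOST(vp) \
    ((struct host_inode *)((char *)(vp) - offsetof(struct host_inode, vnode)))

/* All XFS-side code reaches the count through this one macro, so a port
 * only has to change this definition. */
#define VN_COUNT(vp) (VTOHOST(vp)->i_count)
```

In a FreeBSD port, where the vnode is a first-class object with its own use count, the macro would presumably be redefined to use that field directly instead.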

>

I wish I had more time to work on this stuff, but the linux port has
a lot of work on the todo list.

But please keep asking questions; I really would like to see XFS on a
decent OS.

>
> Adrian

--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.







From owner-freebsd-fs  Fri Feb  9 11:19:12 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.webmonster.de (datasink.webmonster.de [194.162.162.209])
	by hub.freebsd.org (Postfix) with SMTP id AB45D37B6A2
	for <fs@FreeBSD.ORG>; Fri,  9 Feb 2001 11:18:51 -0800 (PST)
Received: (qmail 84247 invoked by uid 1000); 9 Feb 2001 19:18:49 -0000
Date: Fri, 9 Feb 2001 20:18:49 +0100
From: "Karsten W. Rohrbach" <karsten@rohrbach.de>
To: Mike Smith <msmith@freebsd.org>
Cc: Matt Dillon <dillon@earth.backplane.com>,
	"Michael C . Wu" <keichii@iteration.net>,
	Mitch Collinsworth <mitch@ccmr.cornell.edu>, hackers@FreeBSD.ORG,
	fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning
Message-ID: <20010209201849.B48420@rohrbach.de>
Reply-To: karsten@rohrbach.de
References: <200102051750.f15HoZ021657@earth.backplane.com> <200102052052.f15KqOe00985@mass.dis.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <200102052052.f15KqOe00985@mass.dis.org>; from msmith@freebsd.org on Mon, Feb 05, 2001 at 12:52:24PM -0800
X-Arbitrary-Number-Of-The-Day: 42
X-Sender: karsten@rohrbach.de
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Mike Smith(msmith@freebsd.org)@Mon, Feb 05, 2001 at 12:52:24PM -0800:
> 
> You can't do this with a NetApp either; they max out at about 6TB now 
> (going up to around 12 or so soon).  You might want to talk to EMC and/or 
> IBM, both of whom have *extremely* large filers.
from my experience with filers (we have both, country and western here
- e.g. netapp f740/760 and emc^2 symmetrix/connectrix) i can only say
that emc is a pile of sh** - no pun intended. actually the boxes work
okay, but you need a shitload of datamover boxes from emc to achieve
performance similar to netapp's 760 series (up to 12 data movers with
2gig of ram each). emc goes brute force, netapp uses its brains.

when it comes to ibm, as far as i understand you have to hook up their
filers to rs/6000 (aix) or s/370 or s/390 systems since they are "only"
fibrechannel or ficon attached raid subsystems, so the client platform
is responsible for handling all the filesystem stuff.

you might also check out lsi logic's filer products, i think they
support 12tb via nas.

> 
> Your friend may also want to look at Traakan, who have a novel product in 
> this space.
i checked out their website which says "under construction"
strange...


/k

> 
> -- 
> ... every activity meets with opposition, everyone who acts has his
> rivals and unfortunately opponents also.  But not because people want
> to be opponents, rather because the tasks and relationships force
> people to take different points of view.  [Dr. Fritz Todt]
>            V I C T O R Y   N O T   V E N G E A N C E
> 
> 
> 
> 

-- 
> Hackers know all the right MOVs.
KR433/KR11-RIPE -- http://www.webmonster.de -- ftp://ftp.webmonster.de





From owner-freebsd-fs  Fri Feb  9 11:23:56 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mail.webmonster.de (datasink.webmonster.de [194.162.162.209])
	by hub.freebsd.org (Postfix) with SMTP id A233137B6A6
	for <freebsd-fs@freebsd.org>; Fri,  9 Feb 2001 11:23:37 -0800 (PST)
Received: (qmail 84389 invoked by uid 1000); 9 Feb 2001 19:23:36 -0000
Date: Fri, 9 Feb 2001 20:23:36 +0100
From: "Karsten W. Rohrbach" <karsten@rohrbach.de>
To: Zhiui Zhang <zzhang@cs.binghamton.edu>
Cc: freebsd-fs@freebsd.org
Subject: Re: Design a journalled file system
Message-ID: <20010209202336.D48420@rohrbach.de>
Reply-To: karsten@rohrbach.de
References: <Pine.SOL.4.21.0102061544230.6584-100000@opal>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <Pine.SOL.4.21.0102061544230.6584-100000@opal>; from zzhang@cs.binghamton.edu on Tue, Feb 06, 2001 at 04:15:45PM -0500
X-Arbitrary-Number-Of-The-Day: 42
X-Sender: karsten@rohrbach.de
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

there was a post last year to either -fs or -hackers that described the
use of a single disk or block device for logging all writes and
committing them afterwards. i think it was presented at usenix last year
or in 99. can't find it now - please check the mailing list archives.
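A toy sketch of the separate-log-device scheme described above, assuming fixed-size blocks and in-memory arrays standing in for both devices. The names (log_write, log_checkpoint, LOGSLOTS, home, logdev) are hypothetical; it shows only the ordering (log first, home location later), not any particular published design.

```c
#include <stddef.h>
#include <string.h>

#define BLKSZ    16  /* toy block size */
#define NBLKS    8   /* blocks on the toy "home" disk */
#define LOGSLOTS 8   /* records the toy log device can hold */

struct logrec { size_t blkno; char data[BLKSZ]; };

static char          home[NBLKS][BLKSZ];   /* the main file system disk */
static struct logrec logdev[LOGSLOTS];     /* the dedicated log device */
static size_t        loghead;              /* next free log slot */

/* Step 1: every write goes to the log device first, as a sequential
 * append.  Returns -1 if the log is full (caller must checkpoint) or the
 * block number is out of range. */
static int log_write(size_t blkno, const char *data)
{
    if (loghead == LOGSLOTS || blkno >= NBLKS)
        return -1;
    logdev[loghead].blkno = blkno;
    memcpy(logdev[loghead].data, data, BLKSZ);
    loghead++;
    return 0;
}

/* Step 2: later, commit the logged blocks to their home locations in log
 * order, then reclaim the log.  Replaying the same records again after a
 * crash produces the same result, which is what makes recovery safe. */
static void log_checkpoint(void)
{
    for (size_t i = 0; i < loghead; i++)
        memcpy(home[logdev[i].blkno], logdev[i].data, BLKSZ);
    loghead = 0;
}
```

Two writes to the same block both land in the log, and replaying in log order leaves the later one in place.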
/k

Zhiui Zhang(zzhang@cs.binghamton.edu)@Tue, Feb 06, 2001 at 04:15:45PM -0500:
> 
> I am considering the design of a journalled file system in FreeBSD. I
> think each transaction corresponds to a file system update operation and
> will therefore consists of a list of modified buffers.  The important
> thing is that these buffers should not be written to disk until they have
> been logged into the log area. To do so, we need to pin these buffers in
> memory for a while. The concept should be simple, but I run into a problem
> which I have no idea how to solve it:
> 
> If you access a lot of files quickly, some vnodes will be reused.  These
> vnodes can contain buffers that are still pinned in the memory because of
> the write-ahead logging constraints.  After a vnode is gone, we have
> no way to recover its buffers. Note that whenever we need a new vnode, we
> are in the process of creating a new file. At this point, we can not flush
> the buffers to the log area.  The result is a deadlock.
> 
> I could make copies of the buffers that are still pinned, but that incurs
> memory copy and need buffer headers, which is also a rare resource.
> 
> The design is similar to ext3fs of linux (they do not seem to have a vnode
> layer and they use device + physical block number instead of vnode +
> logical block number to index buffers, which, I guess, means that buffers
> can exist after the inode is gone). I know Mckusick has a paper on
> journalling FFS, but I just want to know if this design can work or not.
> 
> Any ideas?  Thanks for your help!
> 
> -Zhihui
> 
> 
> 

-- 
> Booze is the answer. I don't remember the question.
KR433/KR11-RIPE -- http://www.webmonster.de -- ftp://ftp.webmonster.de





From owner-freebsd-fs  Fri Feb  9 11:39: 8 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from deliverator.sgi.com (deliverator.sgi.com [204.94.214.10])
	by hub.freebsd.org (Postfix) with ESMTP
	id EC6F937B6EE; Fri,  9 Feb 2001 11:38:44 -0800 (PST)
Received: from ledzep.americas.sgi.com (ledzep.americas.sgi.com [137.38.226.97]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id LAA10650; Fri, 9 Feb 2001 11:37:32 -0800 (PST)
	mail_from (cattelan@thebarn.com)
Received: from gibble.americas.sgi.com (gibble.americas.sgi.com [128.162.195.80]) by ledzep.americas.sgi.com (SGI-SGI-8.9.3/americas-smart-nospam1.1) with ESMTP id NAA26851; Fri, 9 Feb 2001 13:38:32 -0600 (CST)
Received: from thebarn.com (localhost [127.0.0.1])
	by gibble.americas.sgi.com (8.11.0/8.11.0) with ESMTP id f19JbV020646;
	Fri, 9 Feb 2001 13:37:31 -0600
Message-ID: <3A8446FA.DCD17C7E@thebarn.com>
Date: Fri, 09 Feb 2001 13:37:30 -0600
From: Russell Cattelan <cattelan@thebarn.com>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1-XFS i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Terry Lambert <tlambert@primenet.com>
Cc: freebsd-chat@FreeBSD.ORG, Jack Rusher <jar@integratus.com>,
	Sam Leffler <sam@errno.com>, Zhiui Zhang <zzhang@cs.binghamton.edu>,
	freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
References: <200102090856.BAA08304@usr08.primenet.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Terry Lambert wrote:

> OK, this is not a license war.  I will lay it on the line.
>

Ok, I did get a response from the lawyer...
as is typical of lawyer talk, he didn't give much
of a response either way, but I think
he is open to discussion.

Ok, somebody from the BSD camp should provide
an example of an acceptable license.
If I can present something other than abstract concepts,
more progress can be made.

The one major requirement is that somebody like Sun or
IBM can't pick up the code and start commercializing it.
And no, I'm not talking about restricting a commercial product
that includes XFS, but about restricting somebody from making XFS
a commercial product unto itself.


--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.







From owner-freebsd-fs  Fri Feb  9 11:42: 7 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from deliverator.sgi.com (deliverator.sgi.com [204.94.214.10])
	by hub.freebsd.org (Postfix) with ESMTP
	id D5B3437B6AE; Fri,  9 Feb 2001 11:41:21 -0800 (PST)
Received: from ledzep.americas.sgi.com (ledzep.americas.sgi.com [137.38.226.97]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id KAA04146; Fri, 9 Feb 2001 10:09:15 -0800 (PST)
	mail_from (cattelan@thebarn.com)
Received: from gibble.americas.sgi.com (gibble.americas.sgi.com [128.162.195.80]) by ledzep.americas.sgi.com (SGI-SGI-8.9.3/americas-smart-nospam1.1) with ESMTP id MAA33083; Fri, 9 Feb 2001 12:10:15 -0600 (CST)
Received: from thebarn.com (localhost [127.0.0.1])
	by gibble.americas.sgi.com (8.11.0/8.11.0) with ESMTP id f19I9E020316;
	Fri, 9 Feb 2001 12:09:14 -0600
Message-ID: <3A843249.D93D5952@thebarn.com>
Date: Fri, 09 Feb 2001 12:09:14 -0600
From: Russell Cattelan <cattelan@thebarn.com>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1-XFS i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Terry Lambert <tlambert@primenet.com>
Cc: freebsd-chat@FreeBSD.ORG, Jack Rusher <jar@integratus.com>,
	Sam Leffler <sam@errno.com>, Zhiui Zhang <zzhang@cs.binghamton.edu>,
	freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
References: <200102090856.BAA08304@usr08.primenet.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Terry Lambert wrote:

> OK, this is not a license war.  I will lay it on the line.
>
> I am offering to do a preliminary port of the XFS code,
> potentially to the point of at least a read-only mount, and
> perhaps much further, depending on the effort required.
>
> The resulting code will have some nasty strings attached,
> because I am assuming your comments are correct and I want
> some guarantees of that on my part.  The strings go away when
> your claims about SGI's actions are met.
>
> Below is my reply to your message, including the philosophical
> basis for the strings, a description of the strings, and the
> details of my offer.
>
> This offer is good for a starting date before 01 March 2001.
>
> --
>
> > I'm not sure who you talked with, but it really is that simple.
>
> Vijay.  The V.P. of Engineering at SGI who negotiated the
> release of the code.  I will quote one of his statements,
> made to me in email:
>
> ] Can't you just relicense FreeBSD [under the GPL]?
>
> > The reason the GPL was chosen for XFS:
> > it's the license Linux is using, and since
> > the port is being done for Linux, it makes sense.
>
> One would think they would use a dual license.  Alternatively,
> one would think they would use the LGPL, which would let people
> link it into their kernels, as long as they gave source (which
> BSD does) or otherwise permitted relinking.
>
> In other words, the GPL is not really an optimal license, if
> they wanted wide use AND specific Linux license compatibility.
> I concluded from their choice that they were not going for
> wide use, but instead wanted the marketing benefit of being
> associated with Linux (lots of press, etc.).
>
> > SGI is also doing work on the XFree code; that work
> > is being released under the X license (which is also
> > an anti-GPL license).
>
> The BSD and MIT licenses predate the GPL, so careful with the
> word "anti" there...

Well yes... but my point was that it is a more open license, and
the XFree project has stated it wants to keep it that way.

>
>
> > SGI is basically matching license for license to
> > whatever project they are contributing to.
> > This comes from the lawyer who is doing all the open source work.
>
> Rather than the use to which the software is put; that's a bit
> naive, then, again.
>
> > I have stated this in the past but I will bring it up again.
> > If sufficient momentum can be generated toward an fbsd port
> > of XFS, it may be possible to go to the lawyers and have another
> > license drawn up.
>
> If we had it in writing that the code would be released under
> a license usable by the BSD kernel, preferably "matching license
> for license", as you state, then we would commit to do the work.
>
> The problem we have is that the code under the current license
> is useless to us, and unless we can be ensured that the code we
> write to glue it in won't end up also being useless to us, there
> is really no reason to commit the effort.
>
> > But unless the bsd community can show they are serious about
> > XFS being ported it would be a waste of time to ask for
> > something that SGI has very little business interest in doing.
>
> So if we were to do a port, then SGI would have a business interest,
> and would relicense the code?  Can we have that in writing?
>
> > Note Darwin might be a big win in terms of making a business case
> > for another platform.
>
> Darwin support would be automatic, with a FreeBSD port.  Darwin
> can use FreeBSD FS code, unmodified.

Oh? I got the impression the VM system is quite different.
VFS and vnodes may map quite effortlessly, but that's not the
part I'm concerned about: 95% of the work for the Linux port
has been in the I/O path.

>
>
> > Let's get to the point where XFS is in such demand on fbsd
> > that we can get a petition going, if necessary, to have the
> > license updated.
>
> Demand is very different; it is an aspect of marketing.  How
> much demand do you want, and where do you want it directed?  I
> believe that it would be a trivial exercise to generate as much
> demand as you require.

I need something that says "hey, look, people really want to use
this."
The half a dozen or so emails I've gotten requesting it aren't enough
to present to the lawyers as proof that people really, really want this.

I can't promise anything, but I will send a note to the lawyer
and see what kind of suggestion SGI would be open to.

Would the LGPL satisfy things?
It might be the easiest to propose, since it is close to the GPL
(something they already understand). Otherwise,
provide an example of a license I can present.

>

This won't be an easy task, since the general attitude I will probably
encounter is "why should we care? We're doing Linux, not BSD."
But I will try.


>
>
> > BTW if anybody is interested a few of us have started looking
> > at actually doing the port.  Not much has been done at this
> > point... basically battling through header file cleanup.
>
> If you have your head wrapped around it already, file system
> code is really very trivial, particularly if you have code that
> already works in one environment, and are merely porting it.
>
> I'll tell you what: give me a pointer to the code without the
> Linux modifications, so that I won't inadvertently include code
> that is derived from GPL'ed code, and I will create a FreeBSD
> port of the code, with all code additions, which will compile
> and link successfully in a FreeBSD kernel, in a matter of a few
> days.  I will additionally require an image of an XFS FS on a
> floppy disk, which I can use for compatibility testing.  There
> should be one file with an example of each thing the FS is
> capable of representing, including a directory, a directory
> with a subdirectory, a file, and a directory with two files;
> the files should be short, but if immediate files exist, one
> should be long enough to trigger indirection.  It would be most
> useful if the image were zeroed before it was created, so I am
> able to distinguish XFS-written data from "blank floppy" contents
> (and to aid compression of the image).

Hmm, XFS can't run on a floppy; it's too small.
Adrian Chad is working on the userland stuff now.
Once mkfs is running, an XFS file system can be written to a file,
and by using a proto file the image can be prepopulated.

>
> I will provide my code for FTP, which will be licensed to
> explicitly prohibit all but development use, with a license
> which will transform itself to the three-clause Berkeley
> license if the XFS code which it's designed to work with
> is also released under a Berkeley-style license, and a release
> from patent claims in the covered code.
>
> In other words, the code I provide will be useless to everyone
> but FS researchers, unless the SGI license on the XFS components
> it must be linked with changes to permit BSD to use the code as a
> boot FS, and further, permit commercial use by not hiding submerged
> patent infringement lawsuits which will be sprung on the unwary,
> as soon as someone with deep pockets uses the code.
>
> Call me distrustful, but I am fully capable of delivering in a
> very short time frame, so I'm pretty much the only game.
>
> > Oh, one other comment:
> > the only time SGI may ask for a copyright reassignment is if
> > the contributed code affects the filesystem compatibility
> > between irix and linux. This would have to be a major
> > contribution before something like this would be an issue, and
> > some negotiation will most certainly be involved.
>
> You're damn straight there will be: SGI will be begging the
> author to assign rights to a derivative work of SGI's own
> code.  If that author is philosophically adamant about the
> GPL, the assignment of rights will never happen, unless the
> author also lacks personal integrity, and SGI is willing to
> buy them out of their philosophical stubbornness, or pay
> their own engineers to recreate the code.
>
> > Up to this point all bug fixes have been Linux-related only
> > so it really isn't an issue.
>
> I maintain it probably never will be.  Ask Vijay for my
> arguments in this regard; they boil down to the level of
> effort and complexity involved in FS hacking.  It takes a
> professional, someone with academic rigor, to do useful work.
>
> Consider that the only minds capable of adding Soft Updates
> technology to XFS, without a huge capital expenditure, are
> existent _only_ in the BSD community.
>
> > This isn't SGI trying to be an ass... rather SGI trying to
> > provide the most compatible FS it can within the constraints
> > of many legal issues.
>
> A library style license of the Mozilla bent would have been
> able to accomplish this rather easily, without losing SGI
> rights to (putative) improvements, and without limiting the
> compatibility of the license to nothing but Linux.  Linux
> could archive it and treat it as a statically linked library
> used by the kernel or a kernel module.
>
> The effect on BSD would have been to require it to do what
> it does already, and for systems vendors to provide an "ld -r"'ed
> kernel and XFS source code.  A pain in the ass, but livable
> for most commercial users and embedded systems vendors.
>
> I can't believe SGI's lawyers didn't know precisely what they
> were giving away, and what they weren't.

This Open Source thing is new to the closed world;
they are struggling to understand how best to protect themselves
yet work with the community.


>
>
> --
>
> So, are you going to point me at the pure (convertible to
> another license, since it contains only SGI contributions)
> SGI XFS code, and an image of a sample FS that I can write
> to a floppy for testing purposes?

I'll try to generate a tree from GPL release day 1. March 2000.
Otherwise, simply look in the CVS tree for the
tag GPL-ENCUMBRANCE; it was put on all the
XFS code.

>
>
> Meanwhile, I think the FreeBSD community should continue to
> pursue their own JFS, under a useful license that could then
> trigger commercial support for the programming required...

Granted, but given the number of people SGI has doing the initial
XFS work, and the 3 years it took them just to get the FS off the
ground, I don't think we'll have anything real soon.

> That's how the BSD community gets professional programmers
> to do complex and unpleasant tasks, while other communities
> never get the unpleasant tasks (e.g. Soft Updates [Whistle/IBM],
> fully unified VM and buffer cache [Oracle], etc.) done at all,
> after all.  Marketing is a poor coin for getting long term
> work done; it's too ephemeral for a long term investment to
> be worthwhile.
>
>                                         Terry Lambert
>                                         terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.







From owner-freebsd-fs  Fri Feb  9 11:59:30 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from urban.iinet.net.au (urban.iinet.net.au [203.59.24.231])
	by hub.freebsd.org (Postfix) with ESMTP
	id 64C5937B69B; Fri,  9 Feb 2001 11:59:08 -0800 (PST)
Received: from muzak.iinet.net.au (muzak.iinet.net.au [203.59.24.237])
	by urban.iinet.net.au (8.8.7/8.8.7) with ESMTP id DAA31350;
	Sat, 10 Feb 2001 03:58:59 +0800
Received: from elischer.org (reggae-13-225.nv.iinet.net.au [203.59.79.225])
	by muzak.iinet.net.au (8.8.5/8.8.5) with ESMTP id DAA03325;
	Sat, 10 Feb 2001 03:56:25 +0800
Message-ID: <3A844BFF.D2C68053@elischer.org>
Date: Fri, 09 Feb 2001 11:58:55 -0800
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Russell Cattelan <cattelan@thebarn.com>
Cc: Terry Lambert <tlambert@primenet.com>, freebsd-chat@FreeBSD.ORG,
	Jack Rusher <jar@integratus.com>, Sam Leffler <sam@errno.com>,
	Zhiui Zhang <zzhang@cs.binghamton.edu>, freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
References: <200102090856.BAA08304@usr08.primenet.com> <3A843249.D93D5952@thebarn.com>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Russell Cattelan wrote:
> 
> Terry Lambert wrote:
> 
>
> > Darwin support would be automatic, with a FreeBSD port.  Darwin
> > can use FreeBSD FS code, unmodified.

"Unmodified" is a bit of a hyperbole.
Let's say "there's probably a close mapping due to common ancestors."

> 
> Oh? I got the impression the VM system is quite different.
> VFS and vnodes may map quite effortlessly, but that's not the
> part I'm concerned about: 95% of the work for the Linux port
> has been in the I/O path.

Remember that Darwin is based on Mach, and that FreeBSD is based on 4.4BSD,
which used the Mach VM, so we have a common ancestor in the VM systems too.


> 

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v




From owner-freebsd-fs  Fri Feb  9 12:20:48 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from mass.dis.org (c228380-a.sfmissn1.sfba.home.com [24.20.90.44])
	by hub.freebsd.org (Postfix) with ESMTP
	id AA24537B6A2; Fri,  9 Feb 2001 12:20:28 -0800 (PST)
Received: from mass.dis.org (localhost [127.0.0.1])
	by mass.dis.org (8.11.1/8.11.1) with ESMTP id f19KMDH00585;
	Fri, 9 Feb 2001 12:22:14 -0800 (PST)
	(envelope-from msmith@mass.dis.org)
Message-Id: <200102092022.f19KMDH00585@mass.dis.org>
X-Mailer: exmh version 2.1.1 10/15/1999
To: karsten@rohrbach.de
Cc: hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning 
In-reply-to: Your message of "Fri, 09 Feb 2001 20:18:49 +0100."
             <20010209201849.B48420@rohrbach.de> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 09 Feb 2001 12:22:13 -0800
From: Mike Smith <msmith@freebsd.org>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> 
> when it comes to IBM, as far as I understand you have to hook up their
> filers to RS/6000 (AIX) or S/370 or S/390 systems, since they are "only"
> Fibre Channel or FICON attached RAID subsystems, so the client platform
> is responsible for handling all the filesystem stuff.

Hrrm.  The last box I looked at included a pair of RS/6000s in the 
cabinet, and they were touting it as a NAS, but I wasn't paying so much 
attention then.

> > Your friend may also want to look at Traakan, who have a novel product in 
> > this space.
> I checked out their website, which says "under construction".
> Strange...

Definitely; they had some neat stuff up there a week or two ago...

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E






From owner-freebsd-fs  Fri Feb  9 12:34:38 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from orthanc.ab.ca (207-167-15-66.dsl.worldgate.ca [207.167.15.66])
	by hub.freebsd.org (Postfix) with ESMTP
	id F03BE37B6A8; Fri,  9 Feb 2001 12:34:17 -0800 (PST)
Received: from orthanc.ab.ca (localhost [127.0.0.1])
	by orthanc.ab.ca (8.11.1/8.11.1) with ESMTP id f19KYGi01493;
	Fri, 9 Feb 2001 13:34:16 -0700 (MST)
	(envelope-from lyndon@orthanc.ab.ca)
Message-Id: <200102092034.f19KYGi01493@orthanc.ab.ca>
To: Mike Smith <msmith@FreeBSD.ORG>
Cc: karsten@rohrbach.de, hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: Extremely large (70TB) File system/server planning 
In-reply-to: Your message of "Fri, 09 Feb 2001 12:22:13 PST."
             <200102092022.f19KMDH00585@mass.dis.org> 
Date: Fri, 09 Feb 2001 13:34:16 -0700
From: Lyndon Nerenberg <lyndon@orthanc.ab.ca>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Another company to look at is Yottayotta (www.yottayotta.com).
They just announced their first products last November, and there
isn't much hard product info online yet. For the arena they're
targeting, though, 70TB would be an entry-level system.

--lyndon

