From: Russell Cattelan
Date: Mon, 12 Feb 2001 13:37:24 -0600
Subject: Re: Design a journalled file system

Zhiui Zhang wrote:

> I guess that this will involve either memory copying or changing the
> buffer header directly. Linux seems to address buffers directly via
> physical (not logical) block number, so there is no need to change the
> buffer header. Plus, Linux has a reference count to prevent a buffer from
> disappearing (brelse()'ed).

Yes this is true.

>
> Another difficulty is that if several transactions are in progress at the
> same time, we must remember which metadata buffers are modified by which
> transactions. When we copy/rename the buffer, we must inform those
> transactions the fact that we did the copy/rename. The buffers modified
> by one transaction must be flushed at the same time.

Hmm I'm not sure what the problem is here.
A transaction log entry will log all changes necessary to complete
that transaction, even if it involves multiple meta data objects, which it
almost always does.
In the event of a crash and subsequent replay of the log: the recovery code
will make sure all the meta data on the disk is consistent with the log.
If one meta data write happened but the other one didn't, the recovery
code only updates the one that didn't complete.

What is the size of the disk block container on bsd buf_t's?
If they are 64bit we shouldn't have a problem... simply use absolute disk
addressing for meta data items.
Why would we need to copy a meta data buf_t?

>
> BTW, Linux GFS code seems to allow ONE transaction in progress at any time.
>
> -Zhihui
>
> On Fri, 9 Feb 2001, Russell Cattelan wrote:
>
> > Zhiui Zhang wrote:
> >
> > > I am considering the design of a journalled file system in FreeBSD. I
> > > think each transaction corresponds to a file system update operation and
> > > will therefore consist of a list of modified buffers. The important
> > > thing is that these buffers should not be written to disk until they have
> > > been logged into the log area. To do so, we need to pin these buffers in
> > > memory for a while. The concept should be simple, but I run into a problem
> > > which I have no idea how to solve:
> > >
> > > If you access a lot of files quickly, some vnodes will be reused. These
> > > vnodes can contain buffers that are still pinned in the memory because of
> > > the write-ahead logging constraints. After a vnode is gone, we have
> > > no way to recover its buffers. Note that whenever we need a new vnode, we
> > > are in the process of creating a new file. At this point, we can not flush
> > > the buffers to the log area.
> > > The result is a deadlock.
> >
> > XFS:
> > All pinned buffers are kept on a queue to be flushed by a
> > daemon that walks the queue looking for buffers that
> > have recently become unlocked and unpinned.
> >
> > > I could make copies of the buffers that are still pinned, but that incurs
> > > memory copy and needs buffer headers, which are also a rare resource.
> > >
> > > The design is similar to ext3fs of linux (they do not seem to have a vnode
> > > layer and they use device + physical block number instead of vnode +
> > > logical block number to index buffers, which, I guess, means that buffers
> > > can exist after the inode is gone). I know McKusick has a paper on
> >
> > Yup. All meta data buffers use an absolute device offset.
> >
> > > journalling FFS, but I just want to know if this design can work or not.
> > >
> > > Any ideas? Thanks for your help!
> > >
> > > -Zhihui
> > >
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > with "unsubscribe freebsd-fs" in the body of the message
> >
> > --
> > Russell Cattelan
> > cattelan@thebarn.com

--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.
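The XFS approach described above (pinned buffers kept on a queue and written back by a daemon once they are unlocked and unpinned) can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not XFS or FreeBSD code: `struct meta_buf`, `flush_pass()` and the LSN fields are hypothetical names, and "pinned" is modelled as "the log record for the buffer's newest change is not yet on stable storage". Buffers are keyed by absolute device block number, so they can stay queued after the vnode that used them has been recycled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-core metadata buffer, keyed by absolute device block
 * number rather than (vnode, logical block), so it can outlive the vnode. */
struct meta_buf {
    uint64_t dev_blkno;     /* absolute block number on the device */
    uint64_t last_lsn;      /* LSN of the newest log record covering it */
    bool     locked;        /* someone is modifying it right now */
    bool     dirty;
    struct meta_buf *next;  /* link on the pinned/dirty queue */
};

/* Pinned while the log record describing its latest change has not yet
 * reached stable storage (the write-ahead rule). */
static bool meta_buf_pinned(const struct meta_buf *bp, uint64_t durable_lsn)
{
    return bp->last_lsn > durable_lsn;
}

/* Stand-in for the real in-place disk write; here it only clears the flag. */
static void write_in_place(struct meta_buf *bp)
{
    bp->dirty = false;
}

/* One pass of the flush daemon: write back every buffer that is neither
 * locked nor pinned, unlink it from the queue, and return how many were
 * written.  Buffers that are still locked or pinned stay queued. */
static size_t flush_pass(struct meta_buf **queue, uint64_t durable_lsn)
{
    size_t written = 0;
    struct meta_buf **pp = queue;

    while (*pp != NULL) {
        struct meta_buf *bp = *pp;
        if (!bp->locked && !meta_buf_pinned(bp, durable_lsn)) {
            assert(bp->dirty);
            write_in_place(bp);
            *pp = bp->next;          /* unlink from the queue */
            bp->next = NULL;
            written++;
        } else {
            pp = &bp->next;          /* leave it for a later pass */
        }
    }
    return written;
}
```

In this shape the code that recycles a vnode never has to flush anything itself; it only drops its reference, and the daemon writes the buffer later, which is one way to sidestep the deadlock described in the original question.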
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From: Zhiui Zhang
Date: Mon, 12 Feb 2001 15:34:54 -0500 (EST)
Subject: Re: Design a journalled file system

On Mon, 12 Feb 2001, Russell Cattelan wrote:

> > Another difficulty is that if several transactions are in progress at the
> > same time, we must remember which metadata buffers are modified by which
> > transactions. When we copy/rename the buffer, we must inform those
> > transactions the fact that we did the copy/rename. The buffers modified
> > by one transaction must be flushed at the same time.

Thanks for your reply. I mean if a transaction locks down all the metadata
(e.g., bitmap blocks) it modified until it commits, then there is no
problem (but this reduces concurrency). Otherwise, the same metadata
blocks can contain modifications done by more than one transaction. I do
not know how XFS solves this problem. Since XFS uses B+ trees, I guess
that locking can be done in a hierarchical way easily to avoid deadlock.
But in FFS, the bitmap blocks have no relationship with each other. Locking
the bitmap blocks in FFS in arbitrary order can cause deadlock, I guess.

IBM JFS seems to use an incore log implemented as page cache. XFS has
pagebuf.
I expect that is something similar to IBM's page cache.

> Hmm I'm not sure what the problem is here.
> A transaction log entry will log all changes necessary to complete
> that transaction, even if it involves multiple meta data objects, which it
> almost always does.
> In the event of a crash and subsequent replay of the log: the recovery code
> will make sure all the meta data on the disk is consistent with the log.
> If one meta data write happened but the other one didn't, the recovery
> code only updates the one that didn't complete.
>
> What is the size of the disk block container on bsd buf_t's?
> If they are 64bit we shouldn't have a problem... simply use absolute disk
> addressing for meta data items.
> Why would we need to copy a meta data buf_t?
>

In sys/buf.h of FreeBSD, it has:

    daddr_t b_lblkno;   /* Logical block number. */
    daddr_t b_blkno;    /* Underlying physical block number. */

Both are 32-bit integers. I am not sure why they are not 64-bit. Maybe it has
something to do with the merged buffer cache.
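To make the size ceiling of a 32-bit block number concrete: if the field counts 512-byte units, it can address 2^32 × 512 bytes = 2 TiB when treated as unsigned, and half that (1 TiB) when it is a signed type, because negative block numbers are not usable. The helper below is a hypothetical illustration of that arithmetic, not FreeBSD code.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable by a block-number field of `bits` bits, where each
 * block number counts `unit` bytes.  A signed field loses one bit, since
 * only non-negative block numbers are meaningful. */
static uint64_t addressable_bytes(unsigned bits, int is_signed, uint64_t unit)
{
    assert(bits > 0 && bits < 64);   /* keep the shift within uint64_t */
    unsigned usable = is_signed ? bits - 1 : bits;
    return ((uint64_t)1 << usable) * unit;
}
```

With these definitions `addressable_bytes(32, 0, 512)` is 2^41 bytes (2 TiB) and `addressable_bytes(32, 1, 512)` is 2^40 bytes (1 TiB); which figure applies depends on how `daddr_t` is declared on a given system.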
-Zhihui

From: Russell Cattelan
Date: Mon, 12 Feb 2001 14:50:26 -0600
Subject: Re: Design a journalled file system

Zhiui Zhang wrote:

> On Mon, 12 Feb 2001, Russell Cattelan wrote:
>
> > > Another difficulty is that if several transactions are in progress at the
> > > same time, we must remember which metadata buffers are modified by which
> > > transactions. When we copy/rename the buffer, we must inform those
> > > transactions the fact that we did the copy/rename. The buffers modified
> > > by one transaction must be flushed at the same time.
>
> Thanks for your reply.
> I mean if a transaction locks down all the metadata
> (e.g., bitmap blocks) it modified until it commits, then there is no
> problem (but this reduces concurrency). Otherwise, the same metadata
> blocks can contain modifications done by more than one transaction.

This really isn't a problem... meta data buffers have to be "pinned" but not
necessarily locked. A meta data buffer can be modified many times without
having to be written out to disk. Take for example the super block: it will
get flushed out to disk occasionally, but since it is being modified so often
most changes never get flushed. A log of each of those changes will
be in every transaction that touched the super block, but the super
block doesn't have to be written out every time.
The primary goal is to have a consistent file system, not to be able
to roll back every change that happens.

> I do
> not know how XFS solves this problem. Since XFS uses B+ trees, I guess
> that locking can be done in a hierarchical way easily to avoid deadlock.
> But in FFS, the bitmap blocks have no relationship with each other. Locking
> the bitmap blocks in FFS in arbitrary order can cause deadlock, I guess.
>
> IBM JFS seems to use an incore log implemented as page cache. XFS has
> pagebuf. I expect that is something similar to IBM's page cache.
>
> > Hmm I'm not sure what the problem is here.
> > A transaction log entry will log all changes necessary to complete
> > that transaction, even if it involves multiple meta data objects, which it
> > almost always does.
> > In the event of a crash and subsequent replay of the log: the recovery code
> > will make sure all the meta data on the disk is consistent with the log.
> > If one meta data write happened but the other one didn't, the recovery
> > code only updates the one that didn't complete.
> >
> > What is the size of the disk block container on bsd buf_t's?
> > If they are 64bit we shouldn't have a problem... simply use absolute disk
> > addressing for meta data items.
> > Why would we need to copy a meta data buf_t?
> >
>
> In sys/buf.h of FreeBSD, it has:
>
>     daddr_t b_lblkno;   /* Logical block number. */
>     daddr_t b_blkno;    /* Underlying physical block number. */
>
> Both are 32-bit integers. I am not sure why they are not 64-bit. Maybe it has
> something to do with the merged buffer cache.

Ok good, so we have a spot to store the absolute block number... good.
Assuming these are in units of 512 bytes, this will work up until 2TB.
Linux has the same 2TB limit problem right now...

>
> -Zhihui

--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.

From: Zhiui Zhang
Date: Mon, 12 Feb 2001 16:21:33 -0500 (EST)
Subject: Re: Design a journalled file system

On Mon, 12 Feb 2001, Russell Cattelan wrote:

> Zhiui Zhang wrote:
>
> > On Mon, 12 Feb 2001, Russell Cattelan wrote:
> >
> > > > Another difficulty is that if several transactions are in progress at the
> > > > same time, we must remember which metadata buffers are modified by which
> > > > transactions. When we copy/rename the buffer, we must inform those
> > > > transactions the fact that we did the copy/rename.
> > > > The buffers modified
> > > > by one transaction must be flushed at the same time.
> >
> > Thanks for your reply. I mean if a transaction locks down all the metadata
> > (e.g., bitmap blocks) it modified until it commits, then there is no
> > problem (but this reduces concurrency). Otherwise, the same metadata
> > blocks can contain modifications done by more than one transaction.
>
> This really isn't a problem... meta data buffers have to be "pinned" but not
> necessarily locked. A meta data buffer can be modified many times without
> having to be written out to disk. Take for example the super block: it will
> get flushed out to disk occasionally, but since it is being modified so often
> most changes never get flushed. A log of each of those changes will
> be in every transaction that touched the super block, but the super
> block doesn't have to be written out every time.
> The primary goal is to have a consistent file system, not to be able
> to roll back every change that happens.
>

It seems to me that I have failed to explain my point again. So an example
may help. Suppose I have a bitmap block buffer. One transaction allocates
some blocks from it, the other transaction frees some blocks into it. If
the bitmap block buffer is not locked for the duration of a transaction,
then it could contain modifications made by both transactions. The atomicity
is violated unless you can make the two transactions merge into one later.
On the other hand, if it is locked for a transaction and that transaction
blocks for some other I/O, then performance will suffer (no one can use
the bitmap block buffer for a while).
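One common way around the conflict in this example is logical (operation) logging: each transaction logs only the bits it changed, not an image of the whole bitmap block. The sketch below is a hypothetical illustration: a 64-bit word stands in for a bitmap block, `struct bit_rec` and `replay_logical()` are invented names, and it assumes the in-place block on disk never contains uncommitted changes (a no-steal policy), so recovery only has to redo committed records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy "bitmap block": 64 allocation bits held in one word (hypothetical). */
typedef uint64_t bitmap_block;

/* Logical log record: one bit changed by one transaction. */
struct bit_rec {
    unsigned txid;  /* transaction that made the change */
    unsigned bit;   /* 0..63 */
    int      set;   /* 1 = allocate (set bit), 0 = free (clear bit) */
};

/* Apply a single logical change to a block image. */
static void apply_rec(bitmap_block *blk, const struct bit_rec *r)
{
    assert(r->bit < 64);
    if (r->set)
        *blk |= (uint64_t)1 << r->bit;
    else
        *blk &= ~((uint64_t)1 << r->bit);
}

/* Recovery: start from the on-disk image and redo, in log order, only the
 * records of transactions marked committed (committed[txid] != 0).
 * Changes made by uncommitted transactions never reach the result. */
static bitmap_block replay_logical(bitmap_block on_disk,
                                   const struct bit_rec *recs, size_t n,
                                   const unsigned char *committed)
{
    for (size_t i = 0; i < n; i++)
        if (committed[recs[i].txid])
            apply_rec(&on_disk, &recs[i]);
    return on_disk;
}
```

For instance, if the block starts with bit 2 allocated, T1 allocates bit 0 and commits, and T2 frees bit 2 but has not committed at the crash, the shared in-core block holds 0x1 (so a physical image logged for T1 would also carry T2's free), while redoing only T1's logical record yields 0x5.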
-Zhihui

From: Terry Lambert
Date: Mon, 12 Feb 2001 23:06:44 +0000 (GMT)
Subject: Re: Design a journalled file system

> It seems to me that I have failed to explain my point again. So an example
> may help. Suppose I have a bitmap block buffer. One transaction allocates
> some blocks from it, the other transaction frees some blocks into it. If
> the bitmap block buffer is not locked for the duration of a transaction,
> then it could contain modifications made by both transactions. The atomicity
> is violated unless you can make the two transactions merge into one later.
> On the other hand, if it is locked for a transaction and that transaction
> blocks for some other I/O, then performance will suffer (no one can use
> the bitmap block buffer for a while).
Russell is right, for XFS, and for most Journalled FS's, where the
validity marking on the journal entry (as being the most recent)
is the most important thing. All transactions are written as if
by way of a write-through cache of the modification data.

In other words, in his world, there's no such conflict between
concurrent operations.

Per a previous post, Soft Updates is all about "unless you can
make the two transactions merge into one later".

Specifically, if you have a disk block, it's 512b. An inode on
disk is 128b. This means 4 inodes per block.

Similarly, a directory entry block is 512b. A given block will
contain between 1 and 16 directory entries, each of which may
be in the process of being manipulated.

And so on.

Soft Updates keeps a list of modifications to conflicted blocks,
in core, and actually makes a copy of the conflicted block, and
backs out transaction state, when committing partial transactions.
It does this by maintaining a state conflict domain dependency
list (which is why Soft Updates are sometimes called Soft
Dependencies instead).


Practically, for a design, you can generally reduce the domains
of conflict by increasing your object sizes to 512b. This lets
you have things like ACL and immediate file support in inodes,
which you can then bill as a feature.

For the directory entry blocks, the conflict is already somewhat
mitigated by the fact that, for anyone iterating the directory, you
make a copy of the block -- it is a snapshot, not the actual
directory contents you are iterating. The NFS "cookie" code
for iteration restart is really a kludge; it could have just as
easily worked around the difference between on disk and wire and
user space directory entries within a given block, by separating
the code into a "copy FS sided unit into snapshot" and "copy data
from snapshot into representation buffer" VOPs (I've suggested
this many times, and provided the code several times).
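The copy-and-roll-back behaviour attributed to Soft Updates above can be illustrated at the granularity of the example given: a 512-byte block holding four 128-byte inodes. This is a simplified sketch with hypothetical names (`struct inode_blk`, `modify_slot()`, `safe_image()`, `resolve_slot()`), not the FreeBSD Soft Updates code, which tracks dependencies per object and per field with much richer state. Here every modified slot is assumed to carry an unmet dependency until `resolve_slot()` is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLK_SIZE   512
#define INODE_SIZE 128
#define NSLOTS     (BLK_SIZE / INODE_SIZE)   /* 4 inodes per 512-byte block */

/* Hypothetical in-core inode block with per-slot rollback state. */
struct inode_blk {
    unsigned char data[BLK_SIZE];            /* current in-core contents */
    bool          pending[NSLOTS];           /* slot has an unmet dependency */
    unsigned char old[NSLOTS][INODE_SIZE];   /* last safe contents of slot */
};

/* Update slot i.  The first update after the slot was last safe saves the
 * safe contents; later updates keep that oldest safe copy. */
static void modify_slot(struct inode_blk *b, int i, const unsigned char *newval)
{
    assert(i >= 0 && i < NSLOTS);
    if (!b->pending[i]) {
        memcpy(b->old[i], b->data + i * INODE_SIZE, INODE_SIZE);
        b->pending[i] = true;
    }
    memcpy(b->data + i * INODE_SIZE, newval, INODE_SIZE);
}

/* Build the image that is safe to write now: a copy of the block with each
 * slot that has a pending dependency rolled back.  The in-core block keeps
 * all newer changes, so other users are never blocked. */
static void safe_image(const struct inode_blk *b, unsigned char out[BLK_SIZE])
{
    memcpy(out, b->data, BLK_SIZE);
    for (int i = 0; i < NSLOTS; i++)
        if (b->pending[i])
            memcpy(out + i * INODE_SIZE, b->old[i], INODE_SIZE);
}

/* Called when the write that slot i depends on has completed. */
static void resolve_slot(struct inode_blk *b, int i)
{
    assert(i >= 0 && i < NSLOTS);
    b->pending[i] = false;
}
```

The point of writing a rolled-back copy is that the block never has to be locked for the lifetime of a dependency: unrelated inodes in the same block can still reach disk, while the unsafe slot goes out in its old form until its dependency is met.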
The bottom line is that bitmaps only matter if you implement
using bitmaps. For inescapable conflicts (like the "last
modified" or "time of last update" in superblock data, which
you must have for recovery following a crash), the easiest method
to work around the problem is to log superblocks as well, and
then iterate to the "most recent valid" during recovery.

Ideally, you probably _do_ want to incorporate Soft Updates
technology, since it lets you avoid artificial stalls when you
enter into an unavoidable conflict (XFS stalls and drains at
those points), but it's not immediately necessary (just don't
design against it as a future optimization).

I really, really urge FS designers to go back to first principles
when examining problems, and to consider FSs as transactions to
be applied to persistent state data as a result of events. If
you do that, then protecting the integrity of the persistent
state becomes obvious and easy.


Actually, this really brings home the license point for XFS,
since it should be obvious that it could benefit from soft
updates, which it won't get without paying something (like
access to its sources in a useful fashion for the BSD community).

Yes, I'm still looking for a commercial license that prohibits
making XFS a stand-alone product, but still allows it to be used
in a commercial setting. The Sun License on the original SLPv1,
but fails to grant in perpetuity. It may be that SGI's lawyer
will have to do lawyering to work out one that satisfies them.

Hopefully SGI will learn the HP JetSend and the Sun JINI and the
Net/1 & Net/2 TCP/IP lesson: if you want something to be standard,
you can't control it, and if you control it, it won't be standard.

Note: my March 1st offer stands. I have yet to hear how to get
the unencumbered (SGI-only) GPL code... the clock's ticking.


Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
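The suggestion to log superblocks and, during recovery, iterate to the "most recent valid" copy might look like the sketch below. The record layout, the sequence number and the use of an FNV-1a checksum are assumptions chosen for illustration; any checksum strong enough to detect a torn write would serve, and `struct sb_copy` and `most_recent_valid()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical logged superblock copy. */
struct sb_copy {
    uint64_t seq;        /* increases with every logged superblock */
    uint64_t last_mod;   /* e.g. "time of last update" */
    uint32_t cksum;      /* checksum over seq and last_mod */
};

/* Illustrative checksum: FNV-1a over the bytes of the two fields. */
static uint32_t sb_cksum(const struct sb_copy *s)
{
    const uint64_t fields[2] = { s->seq, s->last_mod };
    const unsigned char *p = (const unsigned char *)fields;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof fields; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Scan the logged copies and return the index of the most recent one whose
 * checksum verifies, or -1 if none does. */
static int most_recent_valid(const struct sb_copy *copies, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (copies[i].cksum != sb_cksum(&copies[i]))
            continue;                       /* torn or corrupt copy: skip */
        if (best < 0 || copies[i].seq > copies[best].seq)
            best = (int)i;
    }
    return best;
}
```

If the newest copy was only partly written when the system crashed, its checksum fails and recovery falls back to the previous copy, which is what makes logging the superblock safe even though it changes on nearly every transaction.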
From: Zhiui Zhang
Date: Mon, 12 Feb 2001 19:28:06 -0500 (EST)
Subject: Re: Design a journalled file system

On Mon, 12 Feb 2001, Terry Lambert wrote:

> > It seems to me that I have failed to explain my point again. So an example
> > may help. Suppose I have a bitmap block buffer. One transaction allocates
> > some blocks from it, the other transaction frees some blocks into it. If
> > the bitmap block buffer is not locked for the duration of a transaction,
> > then it could contain modifications made by both transactions. The atomicity
> > is violated unless you can make the two transactions merge into one later.
> > On the other hand, if it is locked for a transaction and that transaction
> > blocks for some other I/O, then performance will suffer (no one can use
> > the bitmap block buffer for a while).
>
> Russell is right, for XFS, and for most Journalled FS's, where the
> validity marking on the journal entry (as being the most recent)
> is the most important thing. All transactions are written as if
> by way of a write-through cache of the modification data.
>
> In other words, in his world, there's no such conflict between
> concurrent operations.

I think I got a better understanding this time. Each transaction's log entry
only logs changes *made by itself* using logical logging (instead of
physical logging; in physical logging, the entire bitmap block will be
logged, potentially including modifications made by others). From time to
time, the filesystem will force a sync operation that writes the metadata
in-place to free log space. All transactions must be finished to reach
such a sync point. IBM JFS seems to do logging this way.

-Zhihui

> [snip]

From: Russell Cattelan
Date: Mon, 12 Feb 2001 18:49:40 -0600
Subject: Re: Design a journalled file system

Zhiui Zhang wrote:

> On Mon, 12 Feb 2001, Terry Lambert wrote:
>
> > > It seems to me that I have failed to explain my point again. So an example
> > > may help. Suppose I have a bitmap block buffer. One transaction allocates
> > > some blocks from it, the other transaction frees some blocks into it. If
> > > the bitmap block buffer is not locked for the duration of a transaction,
> > > then it could contain modifications made by both transactions. The atomicity
> > > is violated unless you can make the two transactions merge into one later.
> > > On the other hand, if it is locked for a transaction and that transaction
> > > blocks for some other I/O, then performance will suffer (no one can use
> > > the bitmap block buffer for a while).
> >
> > Russell is right, for XFS, and for most Journalled FS's, where the
> > validity marking on the journal entry (as being the most recent)
> > is the most important thing. All transactions are written as if
> > by way of a write-through cache of the modification data.
> >
> > In other words, in his world, there's no such conflict between
> > concurrent operations.
>
> I think I got a better understanding this time. Each transaction's log entry
> only logs changes *made by itself* using logical logging (instead of
> physical logging; in physical logging, the entire bitmap block will be
> logged, potentially including modifications made by others). From time to
> time, the filesystem will force a sync operation that writes the metadata
> in-place to free log space. All transactions must be finished to reach
> such a sync point. IBM JFS seems to do logging this way.

Yes... From what I understand of JFS it does logging very similarly to XFS.

Note: In addition to periodic FS syncing, meta data can be pushed out to
disk during high transaction activity to free up log space, something
called "tail pushing".

XFS logging has been designed and optimized for speed.
The log will guarantee the FS is consistent after a crash; it does not
guarantee that no data/transactions have been lost.
Synchronous transactions can be turned on to decrease the amount of lost
information, but it significantly hurts performance. 99% of the time the
log is just an expensive slow overhead; async logging reduces the
performance hit but does allow for transactions to be lost.

Always a compromise :-)

>
> -Zhihui
>
> [snip]
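The "tail pushing" mentioned above can be sketched for a circular log: the tail is the oldest log record still needed for recovery, i.e. the oldest not-yet-written change among the dirty metadata buffers, and when a new transaction cannot reserve enough log space, the buffers holding the tail are written back so the tail can advance. Names and structures here are hypothetical, LSNs are assumed to start at 1 (0 marks a clean buffer), and the write-ahead check that the log itself is on disk before a buffer is written back is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical circular log, measured in log blocks. */
struct log_space {
    uint64_t size;   /* total log capacity */
    uint64_t head;   /* next LSN to be written */
    uint64_t tail;   /* oldest LSN still needed for recovery */
};

/* Per dirty metadata buffer: LSN of the oldest log record describing a
 * change not yet written in place (0 = clean). */
struct dirty_buf {
    uint64_t first_lsn;
};

/* Stand-in for writing a metadata buffer in place. */
static void write_back(struct dirty_buf *b) { b->first_lsn = 0; }

/* Tail = smallest first_lsn among dirty buffers, or head if all are clean. */
static uint64_t compute_tail(const struct log_space *l,
                             const struct dirty_buf *bufs, size_t n)
{
    uint64_t t = l->head;
    for (size_t i = 0; i < n; i++)
        if (bufs[i].first_lsn != 0 && bufs[i].first_lsn < t)
            t = bufs[i].first_lsn;
    return t;
}

/* Reserve `need` blocks of log space for a new transaction, pushing the
 * tail (writing back the buffers that pin it) until enough space is free. */
static void reserve_log(struct log_space *l, struct dirty_buf *bufs,
                        size_t n, uint64_t need)
{
    assert(need <= l->size);
    l->tail = compute_tail(l, bufs, n);
    while (l->size - (l->head - l->tail) < need) {
        for (size_t i = 0; i < n; i++)
            if (bufs[i].first_lsn == l->tail)
                write_back(&bufs[i]);       /* push the tail */
        l->tail = compute_tail(l, bufs, n);
    }
    l->head += need;
}
```

The loop always makes progress: while the tail is behind the head, some dirty buffer holds exactly the tail LSN, and writing it back moves the tail forward; once everything is clean the whole log is free.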
The NFS "cookie" code > > for iteration restart is really a kludge; it could just as > > easily have worked around the differences between on-disk, wire, and > > user-space directory entries within a given block by separating > > the code into "copy FS sided unit into snapshot" and "copy data > > from snapshot into representation buffer" VOPs (I've suggested > > this many times, and provided the code several times). > > > > The bottom line is that bitmaps only matter if you implement > > using bitmaps. For inescapable conflicts (like the "last > > modified" or "time of last update" in superblock data, which > > you must have for recovery following a crash), the easiest method > > to work around the problem is to log superblocks as well, and > > then iterate to the "most recent valid" one during recovery. > > > > Ideally, you probably _do_ want to incorporate Soft Updates > > technology, since it lets you avoid artificial stalls when you > > enter into an unavoidable conflict (XFS stalls and drains at > > those points), but it's not immediately necessary (just don't > > design against it as a future optimization). > > > > I really, really urge FS designers to go back to first principles > > when examining problems, and to consider FSs as transactions to > > be applied to persistent state data as a result of events. If > > you do that, then protecting the integrity of the persistent > > state becomes obvious and easy. > > > > > > Actually, this really brings home the license point for XFS, > > since it should be obvious that it could benefit from soft > > updates, which it won't get without paying something (like > > access to its sources in a useful fashion for the BSD community). > > > > Yes, I'm still looking for a commercial license that prohibits > > making XFS a stand-alone product, but still allows it to be used > > in a commercial setting. The Sun License on the original SLPv1, > > but fails to grant in perpetuity.
It may be that SGI's lawyers > > will have to do lawyering to work out one that satisfies them. > > > > Hopefully SGI will learn the HP JetSend and the Sun JINI and the > > Net/1 & Net/2 TCP/IP lesson: if you want something to be standard, > > you can't control it, and if you control it, it won't be standard. > > > > Note: my March 1st offer stands. I have yet to hear how to get > > the unencumbered (SGI-only) GPL code... the clock's ticking. > > > > > > Terry Lambert > > terry@lambert.org -- Russell Cattelan -- Digital Elves inc. -- Currently on loan to SGI Linux XFS core developer. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Feb 13 2:10:49 2001 Delivered-To: freebsd-fs@freebsd.org Received: from roaming.cacheboy.net (node16292.a2000.nl [24.132.98.146]) by hub.freebsd.org (Postfix) with ESMTP id D0D7F37B491 for ; Tue, 13 Feb 2001 02:10:45 -0800 (PST) Received: (from adrian@localhost) by roaming.cacheboy.net (8.11.1/8.11.1) id f1DA7Yk11503; Tue, 13 Feb 2001 11:07:34 +0100 (CET) (envelope-from adrian) Date: Tue, 13 Feb 2001 11:07:34 +0100 From: Adrian Chadd To: Terry Lambert Cc: cattelan@thebarn.com, freebsd-fs@freebsd.org Subject: Re: Design a journalled file system Message-ID: <20010213110734.A11487@roaming.cacheboy.net> References: <200102122306.QAA11325@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200102122306.QAA11325@usr08.primenet.com>; from tlambert@primenet.com on Mon, Feb 12, 2001 at 11:06:44PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Mon, Feb 12, 2001, Terry Lambert wrote: [snip] > Note: my March 1st offer stands. I have yet to hear how to get > the unencumbered (SGI-only) GPL code... the clock's ticking. And I, as the other standing XFS-freebsd hacker, would also appreciate this if it's possible.
Adrian -- Adrian Chadd "Programming is like sex: One mistake and you have to support for a lifetime." -- rec.humor.funny To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Feb 13 7:52:54 2001 Delivered-To: freebsd-fs@freebsd.org Received: from vega.dmnshq.net (vega.dmnshq.net [194.19.34.94]) by hub.freebsd.org (Postfix) with ESMTP id 84F6937B491; Tue, 13 Feb 2001 07:52:44 -0800 (PST) Received: (from eivind@localhost) by vega.dmnshq.net (8.11.1/8.9.3) id f1DFpxP76498; Tue, 13 Feb 2001 16:51:59 +0100 (CET) (envelope-from eivind) Date: Tue, 13 Feb 2001 16:51:59 +0100 From: Eivind Eklund To: Terry Lambert Cc: Boris Popov , freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG Subject: Re: vnode interlock API Message-ID: <20010213165159.A76093@thinksec.com> References: <200102072126.OAA24284@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200102072126.OAA24284@usr08.primenet.com>; from tlambert@primenet.com on Wed, Feb 07, 2001 at 09:26:00PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Feb 07, 2001 at 09:26:00PM +0000, Terry Lambert wrote: > > So, I suggest to introduce two macro definitions which will hide > > implementation details for interlocks: > > > > #define VI_LOCK(vp) mtx_enter(&(vp)->v_interlock, MTX_DEF) > > #define VI_UNLOCK(vp) mtx_exit(&(vp)->v_interlock, MTX_DEF) > > > > for RELENG_4 they will look like this: > > > > #define VI_LOCK(vp) simple_lock(&(vp)->v_interlock) > > #define VI_UNLOCK(vp) simple_unlock(&(vp)->v_interlock) > > > > Any comments, suggestions ? > > 4) You need to wrap the calls with "{ ... 
}"; this is because > it may be useful in the future to institute turnstile or > single wakeup semantics, and converting the macro into a > single statement instead of a statement block would mean > a potentially large amount of work to cope > with the change later, whereas you already seem to > need to touch all those spots now. This is not an issue. You can get a block that behaves as a single statement by doing do { ... } while (0), and this is the recommended way of writing blocks in macros (so that the macros behave like single statements instead of blocks). Please do NOT introduce pure statement block wrapped macros. They make for strange semantics, and we are trying to get rid of them. Thanks. Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Feb 13 23:39:19 2001 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id DA0EE37B491; Tue, 13 Feb 2001 23:39:15 -0800 (PST) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id AAA27779; Wed, 14 Feb 2001 00:36:12 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp03.primenet.com, id smtpdAAAn1a4m2; Wed Feb 14 00:36:04 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id AAA20903; Wed, 14 Feb 2001 00:38:54 -0700 (MST) From: Terry Lambert Message-Id: <200102140738.AAA20903@usr08.primenet.com> Subject: Re: vnode interlock API To: eivind@FreeBSD.ORG (Eivind Eklund) Date: Wed, 14 Feb 2001 07:38:23 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), bp@butya.kz (Boris Popov), freebsd-arch@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG In-Reply-To: <20010213165159.A76093@thinksec.com> from "Eivind Eklund" at Feb 13, 2001 04:51:59 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding:
7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > This is not an issue. You can get a block that behaves as a single statement > by doing do { ... } while (0), and this is the recommended way of writing > blocks in macros (so the macros behave like single statements instead of > blocks.) > > Please do NOT introduce pure statement block wrapped macros. They make for > strange semantics, and we are trying to get rid of them. The point was to block them; I personally prefer the "do { ... } while (0)" semantics. The important thing is to allow the use of multiple statements, of course, so your correction is welcome, but it's the ability to add statements later that matters here. This wasn't being done. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Feb 15 13:53:25 2001 Delivered-To: freebsd-fs@freebsd.org Received: from pneumatic-tube.sgi.com (pneumatic-tube.sgi.com [204.94.214.22]) by hub.freebsd.org (Postfix) with ESMTP id 9653137B401 for ; Thu, 15 Feb 2001 13:53:20 -0800 (PST) Received: from ledzep.americas.sgi.com (ledzep.americas.sgi.com [137.38.226.97]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id OAA01775; Thu, 15 Feb 2001 14:01:19 -0800 (PST) mail_from (cattelan@thebarn.com) Received: from gibble.americas.sgi.com (gibble.americas.sgi.com [128.162.195.80]) by ledzep.americas.sgi.com (SGI-SGI-8.9.3/americas-smart-nospam1.1) with ESMTP id PAA90492; Thu, 15 Feb 2001 15:51:55 -0600 (CST) Received: from thebarn.com (localhost [127.0.0.1]) by gibble.americas.sgi.com (8.11.2/8.11.2) with ESMTP id f1FLote10994; Thu, 15 Feb 2001 21:50:55 GMT Message-ID: <3A8C4F3E.43F4653C@thebarn.com> Date: Thu, 15 Feb 2001 21:50:54 +0000 From: Russell Cattelan X-Mailer: Mozilla 4.76 [en] (X11; U; Linux
2.4.1-XFS i686) X-Accept-Language: en MIME-Version: 1.0 To: Terry Lambert Cc: Zhiui Zhang , freebsd-fs@FreeBSD.ORG Subject: Re: Design a journalled file system References: <200102122306.QAA11325@usr08.primenet.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Terry Lambert wrote: > > Yes, I'm still looking for a commercial license that prohibits > making XFS a stand-alone product, but still allows it to be used > in a commercial setting. The Sun License on the original SLPv1, > but fails to grant in perpetuity. It may be that SGI's lawyers > will have to do lawyering to work out one that satisfies them. I've talked a bit more with some of the upper management here at SGI. Nothing concrete, but the basic feeling concerns finding a license that would satisfy the Free aspect of the BSD community without giving direct competitors (e.g. Sun) the ability to build and ship a product that would then compete with IRIX boxes. I suspect the easiest thing is going to be a license that removes the viral aspect of the GPL but still requires permission from SGI for any commercial application. > > > Hopefully SGI will learn the HP JetSend and the Sun JINI and the > Net/1 & Net/2 TCP/IP lesson: if you want something to be standard, > you can't control it, and if you control it, it won't be standard. > It's going to take time... Opening up XFS was a big step; I suspect most managers in the company don't fully understand what has happened or what could happen. Slow chipping at the old ideas.... > > > > Note: my March 1st offer stands. I have yet to hear how to get > the unencumbered (SGI-only) GPL code... the clock's ticking. > > > Terry Lambert > terry@lambert.org > --- > Any opinions in this posting are my own and not those of my present > or previous employers. -- Russell Cattelan -- Digital Elves inc. -- Currently on loan to SGI Linux XFS core developer.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message