Date: Mon, 29 Apr 1996 11:51:56 -0700
From: Mitchell Erblich <merblich@ossi.com>
To: freebsd-fs@FreeBSD.ORG, pvh@leftside.its.uct.ac.za
Subject: Re: Compressing filesystem: Technical issues
Message-ID: <199604291851.LAA07027@guacamole.ossi.com>
Peter et al.,

I would take into consideration what type of file would typically be compressed, and what the benefits are versus the tradeoffs. Disks are already too slow; doesn't the overhead of uncompressing blocks on demand, in a random-access pattern, add delay to every access of the fs object?

However, I will proceed on the assumption that this approach may have some merit. I am unfamiliar with the NetWare implementation, so I will not draw comparisons with it. Since I am not the designer/architect of this idea, I will make some obvious assumptions: the file shouldn't be a directory, a symbolic link, a fragment, anonymous memory on a swapfs, etc.

Note: fragments are pieces of separate files that can be merged so that they share a single block.

1) Depending on how fragments are used within the fs, and weighing the overhead of the compression/decompression algorithm against its possible benefits, I would also eliminate text or binary files that are greater than some as yet unknown size, where the space saved could not be used as a fragment within the block, since that space cannot be used anyway.

2) This code would have to keep track of free fs blocks, since a typical compression algorithm will most likely have to allocate new blocks before the old ones are freed.

3) Assuming that the original blocks were allocated somewhat contiguously, the algorithm may trade fs object access speed for fs object size, and it may cause a larger number of the objects within the fs to incur a seek between EACH block access.

4) The compression algorithm must NOT modify the fs object's modification time.

5) Accesses to a compressed fs object may or may not cause the entire fs object to become uncompressed.

6) Any fs object within this fs should probably get a new magic number, so that code that is NOT aware of compressed objects cannot use them, and there should be a series of tests of what happens when such accesses do occur.
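Point 1 comes down to rounding: space is handed out in whole fragments, so compression frees space only when the compressed data needs fewer fragments than the original did. A minimal Python sketch, assuming 1 KB fragments (a common FFS setting; a real implementation would read the fragment size from the superblock). The names `frags_needed`, `frags_saved` and `worth_compressing` are hypothetical, not existing ufs functions:

```python
# Assumed fragment size; the real value comes from the fs superblock.
FRAG_SIZE = 1024


def frags_needed(nbytes: int) -> int:
    """Fragments needed to store nbytes, rounded up to whole fragments."""
    return -(-nbytes // FRAG_SIZE)


def frags_saved(orig_size: int, comp_size: int) -> int:
    """Fragments freed by storing comp_size bytes instead of orig_size."""
    return max(0, frags_needed(orig_size) - frags_needed(comp_size))


def worth_compressing(orig_size: int, comp_size: int, min_frags: int = 1) -> bool:
    """Hypothetical policy: compress only if at least min_frags fragments
    are actually freed; a smaller saving is just unusable slack."""
    return frags_saved(orig_size, comp_size) >= min_frags


if __name__ == "__main__":
    print(frags_saved(10_000, 9_500))   # 0: both round up to 10 fragments
    print(frags_saved(10_000, 3_000))   # 7: 10 fragments shrink to 3
    print(worth_compressing(700, 300))  # False: one fragment either way
```

Raising `min_frags` (for example to a full block's worth of fragments) is one way to fold the CPU and seek costs from points 1 and 3 into the decision.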
Because of the above, unless the fs object is allocated contiguously (to eliminate seeks, considering possible interleaving and such) and can be reallocated contiguously, first for its compressed object and then for its uncompressed object, it should not be done. Even when this can happen, it is questionable whether the unused portion of a block is necessarily bad. Suppose that some time after the compression the fs object is appended to; in some cases this unused, uncompressed portion of the block can then be used without allocating a new block.

And last but not least, the COST of SCSI and other drives is decreasing rapidly on a dollar-per-MB basis, which diminishes the usefulness of such an fs.

Finally, I think a better approach is to decrease access time for fs objects that go through double and possibly triple indirect blocks. One way I am exploring this is pre-contiguous allocation for large fs objects, together with variable block sizing. I am currently attempting to implement, at home, fs blocks of 256k in size and larger.

Mitchell Erblich : merblich@ossi.com
Senior Software Engineer

PS: I speak for myself and not my company.
--------------------------------------------------------------------
> From owner-freebsd-fs@freefall.freebsd.org Thu Apr 25 17:01 PDT 1996
> Date: Fri, 26 Apr 1996 00:30:07 +0200 (SAT)
> From: Peter van Heusden <pvh@leftside.its.uct.ac.za>
> X-Sender: pvh@leftside
> To: freebsd-fs@FreeBSD.ORG
> Subject: Compressing filesystem: Technical issues
> MIME-Version: 1.0
> X-Loop: FreeBSD.org
>
> I'm slowly getting started on the issue of writing a compressing
> filesystem for BSD. The situation thus far:
>
> 1) I'm thinking of a model much like the Netware 4.x one, where a file is
> compressed if it has not been 'touched' (ie. read or written) in a
> certain time (e.g. a week). It is then decompressed on being 'touched'.
>
> 2) I think the correct approach is to base the filesystem on the existing
> ufs code, and just add a flag which can sit in the i_flag field of the
> inode which states whether this file is compressed or not. On a
> successful read or write (i.e. one where data has actually been moved
> to/from disk successfully) the to_be_compressed flag can be cleared.
>
> 3) I am as yet uncertain about some of the design of the mark and sweep
> process which would do the compressing. My current thinking is that this
> would be a daemon spawned at mount time which would cycle through the
> inodes (in numerical order) doing the mark 'n sweep thing using a new
> filesystem specific ioctl. An unmount would have to gracefully kill the
> daemon process, of course. I'm currently not certain where to put the
> temporary data during compression... in memory? In a filesystem?
>
> 4) I'll have to think up a good compression strategy which allows
> recovery from corruption, etc etc.
>
> Anyway, in my mind, issue 3, the process to do the compressing, is the
> one I am having the most problems with. Any suggestions on the design of
> something like this would be appreciated.
>
> Thanks,
> Peter
>
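Peter's points 1 through 3 describe a policy: compress a file once it has gone untouched for a set time, clear the flag on a successful access, and have a daemon sweep the inodes in numerical order. The Python simulation below sketches only that policy, under loudly stated assumptions: `Inode`, `IN_COMPRESSED`, `touch` and `sweep` are hypothetical stand-ins for the in-kernel inode, an unused `i_flag` bit and the proposed ioctl-driven daemon, and no real compression or block allocation takes place.

```python
from dataclasses import dataclass

# "A certain time (e.g. a week)" from the proposal; tunable.
IDLE_SECONDS = 7 * 24 * 60 * 60

# Hypothetical i_flag bit meaning "this file's data is compressed".
IN_COMPRESSED = 0x1


@dataclass
class Inode:
    number: int
    last_touch: float  # time of the last successful read or write
    flags: int = 0


def touch(ip: Inode, now: float) -> None:
    """A successful read or write: record it and clear the compressed
    flag (a real fs would decompress the data at this point)."""
    ip.last_touch = now
    ip.flags &= ~IN_COMPRESSED


def sweep(inodes: list[Inode], now: float) -> list[int]:
    """One daemon pass: visit inodes in numerical order and compress
    those idle for at least IDLE_SECONDS. Returns the inode numbers
    compressed in this pass. last_touch is deliberately left alone."""
    done = []
    for ip in sorted(inodes, key=lambda i: i.number):
        if ip.flags & IN_COMPRESSED:
            continue  # already compressed on an earlier pass
        if now - ip.last_touch >= IDLE_SECONDS:
            # Real work would go here: write the compressed copy to
            # newly allocated blocks, then free the originals.
            ip.flags |= IN_COMPRESSED
            done.append(ip.number)
    return done


if __name__ == "__main__":
    now = 1_000_000_000.0
    day = 24 * 60 * 60
    fs = [Inode(3, now - 8 * day), Inode(2, now - day), Inode(5, now - 30 * day)]
    print(sweep(fs, now))  # [3, 5]
    touch(fs[0], now)      # inode 3 is read again
    print(sweep(fs, now))  # []
```

The sweep never updates `last_touch`, in line with Mitchell's point 4 that compressing must not change the modification time, and the comment inside the loop reflects his point 2 that new blocks are needed before the old ones can be freed.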
