Date: Tue, 12 Aug 1997 01:51:18 -0700 (PDT)
From: Simon Shapiro <Shimon@i-Connect.Net>
To: FreeBSD-Hackers@FreeBSD.org
Subject: DBFS - A new Filesystem for FreeBSD - Proposal
Message-ID: <XFMail.970812015118.Shimon@i-Connect.Net>
Hi Y'all,

I am building a special filesystem, purpose-built for RDBMS service. In case someone else wants to use it, I will briefly describe it below. Either way, at the end there are questions I would like your opinion on.

Known Features:

* No subdirectories
* No (or very simplistic) permissions
* Very simple and linear block allocation
* Able to span devices
* Able to use resources (devices) on other, remote systems
* Able to share the filesystem with another processor across the physical medium
* Very fast directory and resource search
* Able to perform either buffered or raw I/O on the same file
* Able to preallocate storage to a file, so files are contiguous on a device
* Able to pre-specify the number and size of extents a file can acquire beyond its initial allocation

If you have opinions so far, let me know. Remember, this filesystem is NOT designed to compete with, replace, augment, or complement existing filesystems. It is designed specifically as a storage manager for RDBMS engines.

Questions:

a. How do I provide the proper semantics for file creation in the Unix context? The Unix semantics for filesystems are that files are a stream of bytes, managed via a buffered I/O subsystem, etc. There is no place to specify size upon creation, extent policy, etc.

b. How do I provide for buffered and non-buffered (raw) I/O on the same file?

We have considered several options:

a. Put the whole thing in a userspace library. This is what virtually every RDBMS vendor has done. It has the advantage of being easily portable, and it makes all of the above very easy to accomplish, as the semantics can be arbitrarily specified in the API without any consideration for existing Unix semantics. We do not like this option as it interferes with the distributed and shared aspects and is very costly in the area of locking and synchronization (too many syscalls, too many semaphores, etc.).

b. Put the whole thing in a filesystem and forget any feature that violates Unix file I/O semantics. We like this one the least as this is a ``Sodom Bed'' (Genesis, shorten the legs of the too tall and stretch he who is too short).

c. Put the whole thing in the kernel and do all access from a library that does all the I/O via ioctl syscalls to the device. This is a hack that might work but will force lots of copyin and copyout, where raw I/O would have done much better.

d. Extend existing system calls to accept the extra arguments we need. This is dangerous in our humble opinion as it will tamper with unbroken things and violate more standards than we have fingers to count them on.

e. Add new syscalls to take care of what we need, exactly the way we need it. Without too many details, as an invitation to discuss, one can envision these:

    int dbopen(const char *path, size_t initial_size, size_t extents, size_t extent_size);

We really do not need permissions and modes, but could have them if necessary. Sizes are expressed in blocks, native to that fs instance.

    ssize_t dbread(int fd, void *buff, ssize_t offset, size_t blocks, int flags);

Flags can be DBIO_RAW, which forces a read from disk rather than a buffered read. Yes, we are aware of the double jeopardy in reading unbuffered where buffered copies exist. We either have NO buffered I/O at all, or do our own buffering in user space, or you come up with a better solution :-) Also notice that we eliminate lseek in favor of specifying the seek as a number of blocks from the beginning of the file.

    ssize_t dbwrite(int fd, const void *buff, ssize_t offset, size_t blocks, int flags);

Again, flags can be DBFS_RAW, which forces a flushed/synchronous write.

We are currently implementing this code and an alpha version should be running in the next week or so. We would like it to be as acceptable as possible to you gals and guys, so let me know what you think.

Simon
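
To make the block-addressed interface above concrete, here is a minimal user-space sketch that emulates the three proposed calls on an ordinary file with open, ftruncate, pread, pwrite and fsync. It is not the DBFS code and says nothing about how the kernel side would work. The 512-byte DBFS_BLOCK_SIZE, the numeric value of DBIO_RAW, the choice to return a count of whole blocks, and the use of fsync as a stand-in for a raw write are all assumptions made for illustration. ftruncate only sets the file length, so the contiguous preallocation and extent policy are not modelled.

```c
/* Hypothetical user-space emulation of the proposed calls; not DBFS itself. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define DBFS_BLOCK_SIZE 512  /* assumed; the proposal says "native to that fs instance" */
#define DBIO_RAW        0x1  /* assumed flag value */

/* Create the file and set its length to initial_size blocks. The extent
 * arguments are accepted but ignored: a plain file has no extent policy. */
int dbopen(const char *path, size_t initial_size, size_t extents,
           size_t extent_size)
{
    int fd, saved;

    (void)extents;
    (void)extent_size;
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)initial_size * DBFS_BLOCK_SIZE) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Read `blocks` blocks starting `offset` blocks from the start of the file.
 * Returns the number of whole blocks read (assumption), or -1 on error. */
ssize_t dbread(int fd, void *buff, ssize_t offset, size_t blocks, int flags)
{
    ssize_t n;

    (void)flags;  /* DBIO_RAW has no portable user-space equivalent here */
    assert(buff != NULL);
    if (offset < 0) {
        errno = EINVAL;
        return -1;
    }
    n = pread(fd, buff, blocks * DBFS_BLOCK_SIZE,
              (off_t)offset * DBFS_BLOCK_SIZE);
    return n < 0 ? -1 : n / DBFS_BLOCK_SIZE;
}

/* Write `blocks` blocks at block `offset`. With DBIO_RAW the data is
 * flushed with fsync before returning, as a stand-in for a raw write. */
ssize_t dbwrite(int fd, const void *buff, ssize_t offset, size_t blocks,
                int flags)
{
    ssize_t n;

    assert(buff != NULL);
    if (offset < 0) {
        errno = EINVAL;
        return -1;
    }
    n = pwrite(fd, buff, blocks * DBFS_BLOCK_SIZE,
               (off_t)offset * DBFS_BLOCK_SIZE);
    if (n < 0)
        return -1;
    if ((flags & DBIO_RAW) && fsync(fd) < 0)
        return -1;
    return n / DBFS_BLOCK_SIZE;
}
```

Because every transfer names its own block offset, no file position is shared between callers and lseek never comes into it, which is the property the proposal is after.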