From: Robert Watson
To: Gabriele Modena
Cc: freebsd-hackers@freebsd.org
Date: Thu, 2 Apr 2009 18:26:05 +0100 (BST)
Subject: Re: GSoC: Semantic File System

On Thu, 2 Apr 2009, Gabriele Modena wrote:

> On Sun, Mar 22, 2009 at 6:52 PM, Robert Watson wrote:
>> We are certainly not uninterested in projects along these lines, but I
>> think the trick will be creating a convincing proposal that argues that (a)
>> you can do the work in a summer, (b) there's a compelling usage case for
>> including the results in FreeBSD, and (c) find a mentor who can supervise
>> you in this project.
>
> Thanks, I will keep it on mind when writing the proposal. How do you suggest
> to proceed for finding a mentor?
>
> By the way, this is a project that I'm very probably going to carry on even
> without GSoC support (even though that would be very useful).

Well, I think the first step is to write the proposal, and we can see about
shopping it around for a potential mentor.

>> What sort of semantic file system do you have in mind? How would you feel
>> about a middle-ground project along the lines of Mac OS X Spotlight or
>> similar efficient userspace indexing of a file system based on feedback
>> from the file system about what has changed, or something BeOS-like, in
>> which indexing takes place for extended attributes rather than for
>> contents?
>
> In this moment I am considering also an userspace approach similar to
> Spotlight/Beagles, but I don't know how I could propose this as a FreeBSD
> GSoC project.

I think that would make a fine GSoC proposal. Keep in mind that one of the
premises of Spotlight is the fsevents kernel feature, and fseventsd, which
allow Spotlight to subscribe to changes in trees and kick off reindexing as
required.

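As a rough illustration of the subscribe-and-reindex pattern just described
(this is not the fsevents API), here is a minimal Python sketch. The names
`ChangeFeed`, `Reindexer`, `post()` and the example paths are hypothetical,
invented for this sketch; a real implementation would receive events from the
kernel rather than from explicit `post()` calls.

```python
from pathlib import PurePosixPath


class ChangeFeed:
    """Toy stand-in for a kernel change-notification source (hypothetical)."""

    def __init__(self):
        self._subscribers = []  # list of (tree root, callback) pairs

    def subscribe(self, root, callback):
        """Ask to be told about changes anywhere under `root`."""
        self._subscribers.append((PurePosixPath(root), callback))

    def post(self, path):
        """Simulate the kernel reporting that `path` has changed."""
        changed = PurePosixPath(path)
        for root, callback in self._subscribers:
            # Deliver only to subscribers whose tree contains the path.
            if changed == root or root in changed.parents:
                callback(str(changed))


class Reindexer:
    """Collects changed paths so they can be reindexed in a batch."""

    def __init__(self):
        self.pending = []

    def on_change(self, path):
        # Coalesce repeated changes to the same file into one entry.
        if path not in self.pending:
            self.pending.append(path)

    def drain(self):
        """Return the paths awaiting reindexing and clear the queue."""
        work, self.pending = self.pending, []
        return work


feed = ChangeFeed()
indexer = Reindexer()
feed.subscribe("/home/alice", indexer.on_change)
feed.post("/home/alice/notes.txt")
feed.post("/tmp/scratch")            # outside the subscribed tree, ignored
feed.post("/home/alice/notes.txt")   # duplicate, coalesced
print(indexer.drain())
```

Coalescing in `on_change` is one possible design choice: the indexer only
needs to know that a file changed, not how many times.
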
Porting the fsevents API to FreeBSD is fairly straightforward, with one
exception: HFS+ offers a much more reliable notion of vnode->path mapping, but
it would be interesting to see how well our current vnode->path mapping
mechanisms would suffice in practice (since a lot of the edge cases that don't
work well with our mapping system are exactly that -- edge cases).

Between kernel and userspace parts there's quite a bit to do, but one
possibility would be to borrow the parts we need from Mac OS X and similar
systems. For example, do a literal port of the fsevents mechanism from XNU,
provide our own implementation that provides a similar API, or provide a new
mechanism that meets fseventsd's semantic requirements for monitoring.

> What I have in mind at the moment would be an indexing based on contents
> rather than extended fs attributes. I did not know about the BeOS semantics
> capabilities, I will surely have a look at that.

I'm probably blending reality with imagination here, but my vague recollection
is that the model was a slightly different blend of user vs. application
involvement in indexing. For systems like Spotlight, there are no
kernel-maintained indexes; the kernel simply provides a change list so that
the userspace indexer can go through and apply file type-specific indexers to
all files that have changed. So, for example, there are indexers for Word
files, plain text files, PDFs, and so on.

In the BeOS model, or my reinterpretation based on something I read a long
time ago and then presumably had dreams about, the split is a bit different:
the file system maintains indexes of extended attributes, which are written by
applications in order to expose searchable material. For example, a mail
application might write out each message as a file, and attach a series of
extended attributes, such as subject line, date, author, etc. These extended
attributes are then indexed automatically by the file system in order to allow
queries to be evaluated.

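To make the attribute-indexing split concrete, here is a minimal Python
sketch of an index keyed on (attribute, value) pairs, using the mail example
above. It is an in-memory toy, not BFS's actual design: the class
`AttributeIndex`, its methods, and the keyword-argument query form are all
assumptions made for illustration.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy index of extended attributes (hypothetical, in-memory only)."""

    def __init__(self):
        self._index = defaultdict(set)   # (attr name, value) -> set of paths
        self._attrs = defaultdict(dict)  # path -> {attr name: value}

    def set_attr(self, path, name, value):
        """Record an attribute written by an application and index it."""
        attrs = self._attrs[path]
        if name in attrs:
            # Drop the stale index entry before recording the new value.
            self._index[(name, attrs[name])].discard(path)
        attrs[name] = value
        self._index[(name, value)].add(path)

    def query(self, **criteria):
        """Return the paths whose attributes match every criterion."""
        matches = [self._index.get(item, set()) for item in criteria.items()]
        if not matches:
            return set()
        return set.intersection(*matches)


index = AttributeIndex()
# A mail application writes one file per message and tags it.
index.set_attr("/mail/1", "author", "alice")
index.set_attr("/mail/1", "subject", "GSoC")
index.set_attr("/mail/2", "author", "alice")
index.set_attr("/mail/2", "subject", "lunch")
print(index.query(author="alice", subject="GSoC"))
```

The index is updated as a side effect of writing the attribute, so a query
never has to scan file contents. That is the essential difference from the
change-list model, where indexing happens later in a separate process.
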
I don't recall how queries and results are expressed, and in particular,
whether the queries are processed by the file system (possibly exposed via
special APIs or the name space) or userspace (accessing special files
maintained by the kernel that are the indexes).

It's also worth observing that one of the authors of BFS was Dominic
Giampaolo, who now works on Apple's HFS+, and implemented fsevents there as
part of their Spotlight project.

Robert N M Watson
Computer Laboratory
University of Cambridge