From owner-p4-projects@FreeBSD.ORG Sat Sep 17 21:37:19 2005
Date: Sat, 17 Sep 2005 21:37:17 GMT
Message-Id: <200509172137.j8HLbHxZ064937@repoman.freebsd.org>
From: soc-chenk
To: Perforce Change Reviews
Subject: PERFORCE change 83797 for review
List-Id: p4 projects tree changes

http://perforce.freebsd.org/chv.cgi?CH=83797

Change 83797 by soc-chenk@soc-chenk_leavemealone on 2005/09/17 21:37:03

	Fixed mount/unmount related problems

	Submitted by: soc-chenk

Affected files ...

.. //depot/projects/soc2005/fuse4bsd2/Changelog#7 edit
.. //depot/projects/soc2005/fuse4bsd2/IMPLEMENTATION_NOTES#5 edit
.. //depot/projects/soc2005/fuse4bsd2/README.html#4 edit
.. //depot/projects/soc2005/fuse4bsd2/fuse_module/fuse.c#6 edit
.. //depot/projects/soc2005/fuse4bsd2/fuse_module/fuse.h#4 edit
.. //depot/projects/soc2005/fuse4bsd2/fuselib/fuselib-2.4.0-pre2.diff#2 edit
.. //depot/projects/soc2005/fuse4bsd2/mount_fusefs/Makefile#3 edit
.. //depot/projects/soc2005/fuse4bsd2/mount_fusefs/mount_fusefs.8#1 add
.. //depot/projects/soc2005/fuse4bsd2/mount_fusefs/mount_fusefs.c#5 edit

Differences ...

==== //depot/projects/soc2005/fuse4bsd2/Changelog#7 (text+ko) ====

@@ -1,3 +1,17 @@
+Sat Sep 17 23:24:31 CEST 2005 at node: creo.hu, nick: csaba
+	* Fixed mount/unmount related problems
+
+	  Fixed reference leaks which prevented non-forced unmount
+
+	  Fixed handling of dirty buffers
+
+	  Refactored mount/unmount code
+
+	  Improved user interface
+	  (no dummy mountpoint argument is needed for daemon anymore)
+
+	  Yet again cleaned up README.html
+
 Thu Sep 15 00:01:08 CEST 2005 at node: creo.hu, nick: csaba
 	* deleted reference to "configure tarball"
 
==== //depot/projects/soc2005/fuse4bsd2/IMPLEMENTATION_NOTES#5 (text+ko) ====

@@ -58,7 +58,7 @@
 later). That is, these differences are related to the level of maturity
 of the two implementations, and as such, these are the less interesting
 ones. Let's see those differences of the modules which stem from the
-differences of the two OS.
+differences of the two OS and/or preferences of the implementors.
 
 TOC:
 0) VFS API
@@ -106,7 +106,7 @@
 
 In FreeBSD, there are several fuse devices. When a Fuse daemon is
 started, it attaches itself to one -- either completely on its own or
-using an already exsiting file descriptor, but it doesn't call any
+using an already existing file descriptor, but it doesn't call any
 helper program and doesn't do anything mount related. Mounting is done
 by an external utility. This means that there must exist a global
 namespace in which the mounter can specify the daemon to mount. As one
@@ -129,8 +129,8 @@
 (struct) file via which a given file is opened/read/written. But this
 access is fragile, and in case of devices, it's readily broken by devfs
 in which devices live, as devfs bastardizes the (struct) files for its
-own purposes (for those who don't know: FreeBSD implements a wisely
-designed in-kernel device filesystem).
+own purposes. (For those who don't know: FreeBSD implements a wisely
+designed in-kernel device filesystem.)
 
 So devfs takes away acces from the file structures, but on the other
 hand, it makes the current implementation possible: it provides a
@@ -166,13 +166,15 @@
 * 1b -- security
 
 Fuse is a dedicatedly Promethean filesystem: it aims to the bring the
-power of mounting to ordinary users. In Linux, ordinary users are
-usually allowed to do mounts via the "user(s)" option of fstab. This is
-a fairly static mechanism, so to provide the possibility of
-non-privileged mounts, Linux Fuse rolls his own: the above mentioned
-device opener/mounter utility is written in a way so that it can safely
-bear the suid bit, and then the custom permission handling logic is
-stuffed into this utility, too.
+power of interaction via a custom filesystem interface to ordinary
+users. Practically, this boils down to doing customized non-privileged
+mounts. In Linux, ordinary users are usually allowed to do mounts via
+the "user(s)" option of fstab. This is a fairly static mechanism, so to
+be able to do the customized non-privileged mounts as it's required,
+Linux Fuse rolls his own: the above mentioned device opener/mounter
+utility is written in a way so that it can safely bear the suid bit, and
+then the appropriate permission handling logic is stuffed into this
+utility, too.
 
 In FreeBSD no heroic action is needed. No setuid mounting is needed --
 unlike Linux, there are no "user(s)" option in fstab, and mount(8)
@@ -229,9 +231,7 @@
 ("devfs rule add path 'fuse*' mode 666"). Hey, conscious admins, hear my
 word, I hereby claim thou shalt not fear to do so. And, concerning
 paranoiac admins, these defaults save them from a heart
-attack upon seeing world writable entries under /dev (though they might
-better go and suffer a heart attack when kldstat reports that my module
-is loaded...).
+attack upon seeing world writable entries under /dev.
 
 * 1c -- dealing with the "allow other" misery
 
@@ -250,7 +250,7 @@
 
 There is a mount option, "allow_other", which removes this limitation.
 Of course, if anyone could use this option, that would pretty much defeat
-the whole purpose of its existence. So by default, only root can use this.
+the very purpose of its existence. So by default, only root can use this.
 However, the final decision is made by the setuid dispatcher; and his
 decision is based upon settings in the respective config file
 /etc/fuse.conf.
@@ -267,7 +267,7 @@
 option, useable only by root. But we don't make exceptions:
 "allow_other" can be used only by root, period.
 
-Yet we have our own ways to not be so draconian. We have an explicit
+Yet we have our own ways of being not too draconian. We have an explicit
 global unique userspace identifier of daemons in work.
 
 This allows the introduction of shared daemons. When the first (primary)
@@ -334,12 +334,12 @@
 
 There is the so-called "strategy" vnode method, which is used to
 transfer data between the "storage" (the daemon in our case) and the
-vmio buffers; this is the central component of buffered I/O in BSD. It
-takes only two parameters: the vnode we operate on, and the buffer
-object we read into from or write to the storage (to read or to write:
-this info is kept with the buffer). With Fuse, what are we to say to the
-daemon, when the strategy is invoked? We need a "key", a suitable
-filehandle identifier to perform the I/O request -- where to get one?
+vmio buffers; this is the engine of buffered I/O in BSD. It takes only
+two parameters: the vnode we operate on, and the buffer object we read
+into from or write to the storage (to read or to write: this info is
+kept with the buffer). With Fuse, what are we to say to the daemon, when
+the strategy is invoked? We need a "key", a suitable filehandle
+identifier to perform the I/O request -- where to get one?
 
 The situation is easy when reading or writing regular files: these
 operations can easily be arranged in a way that they will be file aware.
@@ -361,7 +361,7 @@
 
 Releasing it immediately (that is, strategy releases it before
 return) is pretty unefficient: the file should be re-opened at each
-turn of a lengty read-in.
+turn of a lengthy read-in.
 
 Just simply forgetting about it and polluting the daemon with worn-out
 filehandles is neither a good idea. Some kind of resource management should
@@ -379,7 +379,7 @@
 run?
 
 There is a neat built-in gc mechanism: it's invoked when vnodes become
-unused. The usual effect of this method is disassociating the vnode
+unused. The intended effect of this method is disassociating the vnode
 from its file (node), and putting it back to the pool of free vnodes.
 We don't do that, as then we would lose the number of lookups (which is
 needed for Fuse to operate correctly). Yet it's a pretty fine time
@@ -434,7 +434,7 @@
 come.
 
 When trying to implement fsync for Fuse, once again we bump into the
-basic difference: Linux fsync (flush) file based, FreeBSD fsync is vnode
+basic difference: Linux fsync (flush) is file based, FreeBSD fsync is vnode
 based. Here I can imagine that file basedness has a significance to the
 userspace: eg., it is possible that sshfs runs different sftp connection
 threads for different filehandles, and syncing the data stuffed into one
@@ -449,19 +449,6 @@
 would be too much pain (and we can't [yet] send/wait for many messages
 once, in a batch).
 
-Note that in FreeBSD there is a different operation for syncing buffer
-objects and vnodes; but usually this difference is invisible, as by
-default the buffer syncing function just calls the syncer of the vnode
-(for which there is a useable default provided, too). That is, traditional
-file system authors deal only with the vnode fsync function, and in most
-cases, they don't even do that, just accept the default.
-
-In Fuse, a no-op function is given as the buf syncer (as we insist on
-all writes being synchronous, although there are still technical
-problems in this regard), and the vnode's fsync operation works as
-described above, in a way which has nothing to do with in-kernel
-buffers.
-
 4) Messaging
 
 Here I give a brief comparison of the ways of implementing messaging
@@ -525,7 +512,7 @@
 * In Linux, the unique field of the request is filled with a really
   unique value upon being taken out of the pool (ie., number of
   take-out). In FreeBSD, unique values are owned by the ticket itself
-  (it's not changed during ticket's lifetime), so unique values give
+  (it's not changed during the ticket's lifetime), so unique values give
   information about the number of messaging sessions going in parallell
   (there is a secondary field for each ticket which stores the number of
   take-outs that ticket went through, but that's rarely used).
@@ -541,10 +528,10 @@
   required structs, and frontend methods for tickets set them to an
   appropriate value (to the appropriate point in the ticket's appropriate
   buffer). In some of the more complex cases this means a bit of manual pointer
-  arithmetic; for those of the complex patterns which are not unique
-  (mknod/creat/link), further, specific frontend methods are used.
-  In general, I didn't feel that this approach yields too much tedious
-  repetition when setting up a ticket.
+  arithmetic; for those of the complex patterns which occur repeatedly
+  (mknod/creat/link), further, specific frontend methods are used (to note,
+  in Linux, too). In general, I didn't feel that this approach yields too
+  much tedious repetition when setting up a ticket.
 
 Interrupt handling: in Linux, when a syscall is interrupted, the
 corresponding request is "backgrounded". It's put into another queue, and
@@ -624,7 +611,7 @@
   eg.: "shouldn't we bail out here because we are mounted read only?";
   check whether a directory is tried to be moved into a subdirectory of
   itself when doing a rename; check whether vnodes are from the same
-  filesystem when doing hard linking, and so on. Sometimes it's trivial
+  filesystem when creating hard links, and so on. Sometimes it's trivial
   to do these (just shouldn't be forgotten about), sometimes not so
   much...

==== //depot/projects/soc2005/fuse4bsd2/README.html#4 (text+ko) ====

@@ -35,7 +35,7 @@
 The module was written for and tested with CURRENT, aka FreeBSD-7.0. I'd guess it will work fine with RELENG 6 too, but currently it's not usable with 5.x (or lower) versions.

-Waht can be considered as a public homepage for the project is [WWW]http://wikitest.freebsd.org/moin.cgi/FuseFilesystem; for updates, further info go there. Get in contact with me via the soc-chenk email addrees of the FreeBSD organization (freebsd.org).
+What can be considered as a public homepage for the project is [WWW]http://wikitest.freebsd.org/moin.cgi/FuseFilesystem; for updates, further info go there. Get in contact with me via the soc-chenk email addrees of the FreeBSD organization (freebsd.org).

Installation

@@ -98,32 +98,27 @@

  • Apply the patch with -

     patch -Np1 < fuselib<date>.diff
    +
     patch -Np1 < ../fuse4bsd/fuselib/fuselib-2.4.0-pre2.diff

  • - Assuming that you have my code at ../fuse4bsd, do + Do

     cp ../fuse4bsd/fuse_module/fuse_kernel.h include/ &&
    - cp ../fuse4bsd/fuse_module/linux_compat.h include/
    (the first command replaces fusermount.c with a trimmed down version without the mount support code, and the other two dynamically customize the header file defining the kernel-userland interface; as these are needed in the module as well, they are handled separately from the userspace patch). + cp ../fuse4bsd/fuse_module/linux_compat.h include/ (these commands dynamically customize the header file defining the kernel-userland interface; as these are needed in the module as well, they are handled separately from the userspace patch).

  • - We will do a non-privileged install (I'd say that's easier than set up a jail), I'll use ~/meta/fuse-2.4.0-pre2 as the prefix. Configure fuse with -

     ./configure --prefix ~/meta/fuse-2.4.0-pre2 --bindir=/tmp --disable-kernel-module MOUNT_FUSE_PATH=/tmp
    + We will do a non-privileged install (I'd say that's easier than set up a jail), I'll use ~/meta/fuse-2.4.0-pre2 as the prefix. Type the following commands: +
     mkdir junk &&
    + ./configure --prefix=$HOME/meta/fuse-2.4.0-pre2 --bindir=`pwd`/junk --disable-kernel-module MOUNT_FUSE_PATH=`pwd`/junk &&
    + make &&
    + ln -s /usr/bin/true junk/chown &&
    + ln -s /usr/bin/true junk/mknod &&
    + env PATH=`pwd`/junk:$PATH make install

  • -
  • -

    - Now type -

     make &&
    - ln -s /usr/bin/true chown &&
    - ln -s /usr/bin/true mknod &&
    - ln -s /usr/bin/true chmod &&
    - env PATH=`pwd`:$PATH make install
    -

    -
  •
@@ -166,78 +161,30 @@
 Go to sshfs' directory. First prepare the mount:
    mkdir -p ~/fuse &&
     export LD_LIBRARY_PATH=~/meta/fuse-2.4.0-pre2/lib/
    and also make sure that mount_fusefs (of FreeBSD Fuse) is in your path. Then do: -
    mount_fusefs auto ~/fuse ./sshfs foo@bar.baz: ""
    +
    mount_fusefs auto ~/fuse ./sshfs foo@bar.baz:

    If you want the daemon print the messages she gots from the kernel, you can append the -d flag to the end of the mount command (standard Fuse flag, as debug). However, this can be a little annoying, as the daemon will go to background, but will also muck up the terminal with its reports (unlike when you use -d under Linux). To keep it foreground, you can do the following: -

    env FUSE_DEV_NAME=/dev/fuse0 ./sshfs foo@bar.baz: "" -d
    Then open an other terminal, and type there: +
    env FUSE_DEV_NAME=/dev/fuse0 ./sshfs foo@bar.baz: -d
    Then open another terminal, and type there:
    mount_fusefs /dev/fuse0 ~/fuse
    (If /dev/fuse0 happens to be busy, use any other free Fuse device /dev/fuseN; most free of all is the one who doesn't exist.)

    -Finally, you will have to umount the filesystem by umount -f ~/fuse. You can't omit -f as of now.
    +Finally, you will have to umount the filesystem by umount ~/fuse.

    For more details, see the man page (mount_fusefs(8)).

    Bugs

    - -
      - -
    - You have to use forced unmount for a Fuse filesystem. The reason for this is as follows:
    +See the respective section of mount_fusefs(8).

      -

      - With traditional filesystems, relations between fs entities are permanently stored in some background storage; but when one uses the filesystem, the fs hierarchy is built up and maintained by the kernel, and this what's transported to the userspace so that we get the usual "filesystem feeling". -

      -

      - Now with Fuse there is no permanent background storage. What takes the role of the background storage is the Fuse daemon's memory, that's where the primary instance of the fs hierarchy reigns. In this case, the above mentioned in-kernel hierarchy is a "mirror" of the hierarchy as seen by the daemon. This mirroring is required to be kept exact. This implies the following: if we want to maintain a given file's identity for the whole lifetime of the filesystem, we have to keep its in-kernel counterpart (a vnode), too. -

      -

      - This is in contrast with traditional filesystems: there, if the use count of a vnode falls to zero, then it's inactivated (file-specific data gets thrown away, vnode is put to the free list of the filesystem, ready for recirculation). If the file the vnode used to refer to is asked for again, no problem, a vnode is pulled from the free list and the file data attributes will be filled from the disk, yielding a vnode which is undistinguishable to the previous one. -

      -

      - In Fuse, if we inactivated the vnode upon use count falling to zero, then the file on the "storage" would vanish, too. In case of "less synthetic" filesystems as sshfs this wouldn't be catastrophic, but even then, creating/destroying/re-creating files would occupy a large part of the daemon's resources, and inodes would be wildly changing. In case of a "more synthetic" filesystem this might deeply disturb the filesystem's functionality. -

      -

      - So we don't let unused vnodes to be inactivated. And active vnodes prevent normal unmount. -

      -

      - The minor problem with this is the aesthetical one, The bigger one is that the above described "innocent" unmount blockage becomes undistinguishable from the real ones, when there is a serious reason for the kernel to not to let you unmount the fs. Eg., it might happen that there are open files on the filesystem. You forcibly umount it, but the reference to the open file is kept at the process using it. When that process tries to do something with the open file, that will most likely result in a panic. And the forced unmount itself can result in a panic if there are dirty buffers... which shouldn't exist. -

      -
    • -
    • -

      - Dirty buffers are considered to be non-existent yet they can strike in. -

      -

      - All writes are forced to be synchronous. That is, when data is written to a file of a Fuse filesystem, it is filtered through the buffer cache system, but unlike traditional filesystems, it's immediately gets written to the "storage". Again, the daemon's state has to be kept in sync with the kernel's, and the writing can't really be considered valid until the daemon accepts it -- there is no FUSE_IS_THIS_WRITE_LEGAL? rpc, only FUSE_WRITE. Moreover, writing data to "storage" is fast -- it's just pushing some buffers to userspace. What might be not so cheap is to push that data to its own background storage by the daemon, and thus daemons might maintain their own userspace buffer cache systems. That is, with Fuse dirty buffers are a userspace phenomenon, not an in-kernel one. -

      -

      - But if an I/O intensive write takes place, and it's interrupted, the above schema breaks -- there might be buffers written into the kernel which couldn't be passed to the daemon because of the interrupt. These will be marked as dirty, against all intention. And this makes forced unmount dangerous. -

      -
    • -
    • -

      - Solutions. The unmount implementation should be refined so it can throw away zero-usecount vnodes one by one, and that it can devalidate and throw away dirty buffers (with traditional filesystems, throwing away dirty buffers sounds to be a nonsense, as it means data loss, but with Fuse they are just a byproduct of an abortion; this reversed semantics might make it harder to do the refinement, I don't yet know). -

      -
    • - -
    - -

    TODO

    • - Fix umount -

      -
    • -
    • -

      - Backport to 5.x
      + Backport to 5.x, if it can be done without a major rewrite

    •

==== //depot/projects/soc2005/fuse4bsd2/fuse_module/fuse.c#6 (text+ko) ====

@@ -47,10 +47,6 @@
 #define __static static
 #endif
 
-#ifndef ROOTLESS_SHARES
-#define ROOTLESS_SHARES 1
-#endif
-
 MALLOC_DEFINE(M_FUSEMSG, "fuse messaging",
     "buffer for fuse messaging related things");
 
@@ -145,6 +141,7 @@
     struct fuse_ticket *tick, uint64_t nid,
     enum fuse_opcode op, size_t blen, struct thread* td, struct ucred *cred);
 
+__static __inline struct fuse_gate *fusedev_get_gate(struct cdev *fdev);
 __static __inline struct sx *fusedev_get_lock(struct cdev *fdev);
 __static __inline struct fuse_data *fusedev_get_data(struct cdev *fdev);
 
@@ -479,9 +476,6 @@
 	data->freeticket_counter = 0;
 	data->daemoncred = crhold(cred);
 
-	/* sx_init(&data->shareslock, "lock for fuse shares consistency"); */
-	LIST_INIT(&data->fuse_shares_head);
-
 	return (data);
 }
 
@@ -872,16 +866,22 @@
 	    ihead->nodeid);
 }
 
+__static __inline struct fuse_gate *
+fusedev_get_gate(struct cdev *fdev)
+{
+	return (fdev->si_drv1);
+}
+
 __static __inline struct sx *
 fusedev_get_lock(struct cdev *fdev)
 {
-	return ((struct sx *)fdev->si_drv2);
+	return (&fusedev_get_gate(fdev)->slock);
 }
 
 __static __inline struct fuse_data *
 fusedev_get_data(struct cdev *fdev)
 {
-	return ((struct fuse_data *)fdev->si_drv1);
+	return (fusedev_get_gate(fdev)->fdata);
 }
 
 /********************
@@ -931,7 +931,7 @@
 
 /****************************
  *
- * >>> Dummy fuse device op defs
+ * >>> Fuse device op defs
  *
  ****************************/
 
@@ -953,22 +953,24 @@
 static int
 fusedev_open(struct cdev *dev, int oflags, int devtype, struct thread *td)
 {
-	struct fuse_data *data;
 	struct sx *slock;
+	struct fuse_gate *fgate;
 
 	if (dev->si_usecount > 1)
 		return (EBUSY);
 
 	FUSEREF;
 
-	data = fdata_alloc(td->td_ucred);
-
 	slock = fusedev_get_lock(dev);
+	fgate = fusedev_get_gate(dev);
 	sx_xlock(slock);
-	dev->si_drv1 = data;
+	if (fgate->mp) {
+		sx_xunlock(slock);
+		return (EBUSY);
+	}
+	fgate->fdata = fdata_alloc(td->td_ucred);
 	sx_xunlock(slock);
-
 	DEBUG("Opened device \"fuse\" (that of minor %d) successfully on thread %d.\n",
 	    minor(dev), td->td_tid);
 	return(0);
@@ -978,12 +980,14 @@
 fusedev_close(struct cdev *dev, int fflag, int devtype, struct thread *p)
 {
 	struct fuse_data *data;
+	struct fuse_gate *fgate;
 	struct sx *slock;
 
-	data = dev->si_drv1;
+	data = fusedev_get_data(dev);
 	slock = fusedev_get_lock(dev);
 	sx_xlock(slock);
-	dev->si_drv1 = NULL;
+	fgate = fusedev_get_gate(dev);
+	fgate->fdata = NULL;
 	sx_xunlock(slock);
 	fdata_destroy(data);
 
@@ -1006,7 +1010,7 @@
 	struct fuse_data *data;
 	struct fuse_msg_node *fmsgn;
 
-	data = dev->si_drv1;
+	data = fusedev_get_data(dev);
 
 	fuprintf("fuse device being read on thread %d\n", uio->uio_td->td_tid);
 
@@ -1122,7 +1126,7 @@
 	if ((err = fuse_ohead_audit(ohead, uio)))
 		goto drophead;
 
-	data = dev->si_drv1;
+	data = fusedev_get_data(dev);
 
 	/* Pass stuff over to callback if there is one installed */
 
@@ -1501,7 +1505,6 @@
 static vop_bmap_t fuse_bmap;
 static vop_print_t fuse_print;
 
-static b_sync_t fuse_bufsync;
 static b_strategy_t fuse_bufstrategy;
 
 static struct vfsops fuse_vfsops = {
@@ -1556,11 +1559,11 @@
 	.bop_name = "Fuse",
 	.bop_strategy = fuse_bufstrategy,
 	.bop_write = bufwrite,
-	.bop_sync = fuse_bufsync,
+	.bop_sync = bufsync,
 };
 
 MALLOC_DEFINE(M_FUSEFS, "fuse filesystem", "buffer for fuse vfs layer");
-MALLOC_DEFINE(M_FUSEFH, "fuse filesystem", "buffer for fuse filehandles");
+MALLOC_DEFINE(M_FUSEFH, "fuse filehandles", "buffer for fuse filehandles");
 
 static fuse_buffeater_t fuse_std_buffeater;
 static fuse_buffeater_t fuse_dir_buffeater;
@@ -1601,22 +1604,18 @@
 {
 	int err = 0;
 	int len, sharecount = 0;
-	int sharing = 0;
 	char *fspec;
 	struct vnode *devvp;
 	struct vfsoptlist *opts;
 	struct nameidata nd, *ndp = &nd;
 	struct cdev *fdev;
 	struct sx *slock;
+	struct fuse_gate *fgate;
 	struct fuse_data *data;
 	struct fuse_mnt_data *fmnt;
 	struct vnode *rvp;
 	struct fuse_vnode_data *fvdat;
 
-#define SHAREDMOUNT 0x1
-#define PRIVMOUNT 0x2
-#define IS_SHARED(sh) (!((sh) & PRIVMOUNT))
-
 	if (mp->mnt_flag & MNT_UPDATE) {
 		uprintf("fuse: updating mounts is not supported\n");
 		return (EOPNOTSUPP);
@@ -1647,11 +1646,6 @@
 	if (!fspec || fspec[len - 1] != '\0')
 		return (EINVAL);
 
-/*
-	vfs_flagopt(opts, "shared", sharing, SHAREDMOUNT);
- */
-	vfs_flagopt(opts, "private", &sharing, PRIVMOUNT);
-
 	FUSEREF;
 
 	/*
@@ -1672,6 +1666,7 @@
 	}
 
 	fdev = devvp->v_rdev;
+	/* dev_ref(fdev); */
 	/*
 	 * according to coda code, no extra lock is needed --
 	 * although in sys/vnode.h this field is marked "v"
@@ -1687,12 +1682,17 @@
 
 	MALLOC(fmnt, struct fuse_mnt_data *, sizeof(*fmnt), M_FUSEFS,
 	    M_WAITOK| M_ZERO);
+	fmnt->fdev = fdev;
+	fmnt->mp = mp;
 
+	vfs_flagopt(opts, "private", &fmnt->mntopts, FUSEFS_PRIVATE);
+	vfs_flagopt(opts, "neglect_shares", &fmnt->mntopts,
+	    FUSEFS_NEGLECT_SHARES);
 	vfs_flagopt(opts, "allow_other", &fmnt->mntopts,
 	    FUSEFS_DAEMON_CAN_SPY);
 	if (fmnt->mntopts & FUSEFS_DAEMON_CAN_SPY && suser(td)) {
 		uprintf("only root can use \"allow_other\"\n");
-		free(fmnt, M_FUSEFS);
+		FREE(fmnt, M_FUSEFS);
 		err = EPERM;
 		goto out;
 	}
@@ -1716,26 +1716,37 @@
 		uprintf("fuse daemon found, but has been backlisted\n");
 	}
 
+	fgate = fusedev_get_gate(fdev);
 	if (!err) {
-		if (data->mp) {
-			if (! (data->dataflag & FDAT_SHARED &&
-			    IS_SHARED(sharing)))
-				/*
-				 * device is owned and either us or owner
-				 * insits on a private mount
-				 */
-				goto deny;
+		if (fgate->mp) {
+			fmnt->master = fgate->mp->mnt_data;
+			fmnt->mntopts |= FUSEFS_SECONDARY;
+			if (fmnt->master->mntopts & FUSEFS_BUSY)
+				/*
+				 * Umount attempt is going on
+				 */
+				err = EBUSY;
+			if (fmnt->master->mntopts & FUSEFS_PRIVATE)
+				/*
+				 * device is owned and owner doesn't
+				 * wanna share it with us
+				 */
+				err = EPERM;
+			if (fmnt->mntopts & ~FUSEFS_SECONDARY)
+				/*
+				 * Secondary mounts not allowed to have
+				 * options (basicly, that would be
+				 * useless though harmless, just let's
+				 * be explicit about it)
+				 */
+				err = EINVAL;
 		} else {
 			if (suser(td) &&
 			    td->td_ucred->cr_uid != data->daemoncred->cr_uid)
 				/* we are not allowed to do the first mount */
-				goto deny;
+				err = EPERM;
 		}
-		goto allow;
-deny:
-		err = EPERM;
 	}
-allow:
 
 	if (err) {
 		sx_xunlock(slock);
@@ -1743,19 +1754,15 @@
 		goto out;
 	}
 
-	if (data->mp) {
-		struct fuse_share *fsh;
-		MALLOC(fsh, struct fuse_share *, sizeof(*fsh),
-		    M_FUSEFS, M_WAITOK);
-		fsh->uid = td->td_ucred->cr_uid;
-		fsh->master = data->mp;
-		LIST_INSERT_HEAD(&data->fuse_shares_head, fsh,
-		    fuse_shares_link);
-		fmnt->share = fsh;
-		LIST_FOREACH(fsh, &data->fuse_shares_head,
-		    fuse_shares_link)
-			sharecount++;
+	if (fmnt->mntopts & FUSEFS_SECONDARY) {
+		struct fuse_mnt_data *x_fmnt;
+
+		LIST_INSERT_HEAD(&fmnt->master->slaves_head, fmnt, slaves_link);
+		LIST_FOREACH(x_fmnt, &fmnt->master->slaves_head, slaves_link)
+			sharecount++;
 	} else {
+		LIST_INIT(&fmnt->slaves_head);
+
 		/* Now handshaking with daemon */
 
 		if ((err = fuse_send_init(data, td))) {
@@ -1770,26 +1777,21 @@
 			FREE(fmnt, M_FUSEFS);
 			goto out;
 		}
-		if (IS_SHARED(sharing))
-			data->dataflag |= FDAT_SHARED;
 	}
 
 	/* We need this here as this slot is used by getnewvnode() */
 	mp->mnt_stat.f_iosize = PAGE_SIZE;
 
 	mp->mnt_data = fmnt;
-#if ROOTLESS_SHARES
-	if (fmnt->share)
+
+	/* code stolen from portalfs */
+
+	if (fmnt->mntopts & FUSEFS_SECONDARY)
 		goto rootdone;
-#endif
 
-	/* code stolen from portalfs */
+	MALLOC(fvdat, struct fuse_vnode_data *, sizeof(*fvdat), M_FUSEFS,
+	    M_WAITOK | M_ZERO);
 
-	if (data->mp)
-		fvdat = ((struct fuse_mnt_data *)data->mp->mnt_data)->rvp->v_data;
-	else
-		MALLOC(fvdat, struct fuse_vnode_data *, sizeof(*fvdat),
-		    M_FUSEFS, M_WAITOK | M_ZERO);
 #if __FreeBSD_version >= 600000
 	err = getnewvnode("fuse", mp, &fuse_vnops, &rvp);
 #else
@@ -1797,11 +1799,10 @@
 #endif
 
 	if (err) {
-		if (data->mp) {
-			fdata_kick_set(data);
-			FREE(fmnt, M_FUSEFS);
-			FREE(fvdat, M_FUSEFS);
-		}
+		fdata_kick_set(data);
+		FREE(fmnt, M_FUSEFS);
+		FREE(fvdat, M_FUSEFS);
+
 		sx_xunlock(slock);
 		goto out;
 	}
@@ -1816,12 +1817,10 @@
 	fuse_vnode_init(rvp, fvdat, VDIR);
 	rvp->v_vflag |= VV_ROOT;
 
-#if ROOTLESS_SHARES
 rootdone:
-#endif
 
-	if (! data->mp) {
-		data->mp = mp;
+	if (! (fmnt->mntopts & FUSEFS_SECONDARY)) {
+		fgate->mp = mp;
 #if ! REALTIME_TRACK_UNPRIVPROCDBG
 		fmnt->mntopts &= ~FUSEFS_UNPRIVPROCDBG;
 		fmnt->mntopts |= get_unprivileged_proc_debug(td) ? FUSEFS_UNPRIVPROCDBG : 0;
@@ -1834,7 +1833,7 @@
 	mp->mnt_flag |= MNT_LOCAL;
 
 	copystr(fspec, mp->mnt_stat.f_mntfromname, MNAMELEN - 1, &len);
-	if (fmnt->share && len >= 1) {
+	if (fmnt->mntopts & FUSEFS_SECONDARY && len >= 1) {
 		/*
 		 * I've considered using s1, s2,... for shares, instead
 		 * #1, #2,... as s* is more conventional...
@@ -1887,45 +1886,66 @@
 		flags |= FORCECLOSE;
 
 	fmnt = mp->mnt_data;
+	slock = fusedev_get_lock(fmnt->fdev);
+
+	if (! (fmnt->mntopts & FUSEFS_SECONDARY)) {
+#if _DEBUG
+		struct vnode *vp, *nvp;
+#endif
 
-	slock = fusedev_get_lock(fmnt->fdev);
-	sx_xlock(slock);
-	if (! (data = fusedev_get_data(fmnt->fdev)))
-		goto out;
-	if (fmnt->share)
-		LIST_REMOVE(fmnt->share, fuse_shares_link);
-	else {
-		if (! LIST_EMPTY(&data->fuse_shares_head)) {
-			sx_xunlock(slock);
+		sx_slock(slock);
+		if (! (mntflags & MNT_FORCE ||
+		    fmnt->mntopts & FUSEFS_NEGLECT_SHARES ||
+		    LIST_EMPTY(&fmnt->slaves_head))) {
+			sx_sunlock(slock);
 			return (EBUSY);
 		}
-		fdata_kick_set(data);
+		/* setting flag protecting lock upgrade */
+		fmnt->mntopts |= FUSEFS_BUSY;
+		sx_sunlock(slock);
+#if _DEBUG
+		MNT_ILOCK(mp);
+		MNT_VNODE_FOREACH(vp, mp, nvp) {
+			DEBUG2G("\n");
+			vn_printf(vp, "...");
+		}
+		MNT_IUNLOCK(mp);
+#endif
+
+		/* Flush files -> vflush */
+		/* There is 1 extra root vnode reference (mp->mnt_data). */
+		if ((err = vflush(mp, 1, flags, td))) {
+			DEBUG2G("err %d\n", err);
+			fmnt->mntopts &= ~FUSEFS_BUSY;
+			return (err);
+		}
 	}
-out:
-	sx_xunlock(slock);
 
+	sx_xlock(slock);
+	if (fmnt->mntopts & FUSEFS_SECONDARY) {
+		if (fmnt->master)
+			LIST_REMOVE(fmnt, slaves_link);
+	} else {
+		struct fuse_mnt_data *x_fmnt;
+
+		fmnt->mntopts &= ~FUSEFS_BUSY;
+		LIST_FOREACH(x_fmnt, &fmnt->slaves_head, slaves_link)
+			x_fmnt->master = NULL;
+		if ((data = fusedev_get_data(fmnt->fdev)))
+			fdata_kick_set(data);
-	if (
-#if ROOTLESS_SHARES
-	    ! fmnt->share
-#else
-	    1
-#endif
-	    ) {
-		/* Flush files -> vflush */
-		/* There is 1 extra root vnode reference (mp->mnt_data). */
-		if ((err = vflush(mp, 1, flags, td)))
-			return (err);
+		fusedev_get_gate(fmnt->fdev)->mp = NULL;
 	}
+	sx_xunlock(slock);
 
 	mp->mnt_data = NULL;
-	FREE(fmnt->share, M_FUSEFS);
 	FREE(fmnt, M_FUSEFS);
 
 	/* Other guys do this, I don't know what it is good for... */
 	mp->mnt_flag &= ~MNT_LOCAL;
 
+	/* dev_rel(fmnt->fdev); */
 	fuse_useco--;
 	return (0);
 }
@@ -1947,16 +1967,29 @@
 
 	DEBUG2G("mp %p: %s\n", mp, mp->mnt_stat.f_mntfromname);
 
-#if ROOTLESS_SHARES
-	if (fmnt->share)
-		return fuse_root(fmnt->share->master, flags, vpp, td);
-#endif
+	if (fmnt->mntopts & FUSEFS_SECONDARY) {
+		struct sx *slock;
+		int err;
+
+		slock = fusedev_get_lock(fmnt->fdev);
+		sx_slock(slock);
+		if (fmnt->master)
+			err = fuse_root(fmnt->master->mp, flags, vpp, td);
+		else
+			err = ENXIO;
+		sx_sunlock(slock);
+		return (err);
+	}
 	vp = fmnt->rvp;
 	vref(vp);
 	VOP_UNLOCK(vp, 0, td);
 	vn_lock(vp, flags | LK_RETRY, td);
 	*vpp = vp;
+#if _DEBUG2G
+	DEBUG2G("root node:\n");
+	vn_printf(vp, " * ");
+#endif
 	return (0);
 }
 
@@ -1971,10 +2004,18 @@
 	DEBUG2G("mp %p: %s\n", mp, mp->mnt_stat.f_mntfromname);
 
 	fmnt = mp->mnt_data;
-#if ROOTLESS_SHARES
-	if (fmnt->share)
-		fmnt = (struct fuse_mnt_data *)fmnt->share->master->mnt_data;
-#endif
+	if (fmnt->mntopts & FUSEFS_SECONDARY) {
+		struct sx *slock;
+
+		slock = fusedev_get_lock(fmnt->fdev);
+		sx_slock(slock);
+		if (fmnt->master)
+			err = fuse_statfs(fmnt->master->mp, sbp, td);
+		else
+			err = ENXIO;
+		sx_sunlock(slock);
+		return (err);
+	}
 
 	if ((err = fdisp_simple_putget(&fdi, FUSE_STATFS, fmnt->rvp, td, NULL)))
 		return (err);
@@ -2013,16 +2054,13 @@
 	DEBUG2G("mp %p: %s\n", mp, mp->mnt_stat.f_mntfromname);
 	DEBUG("been asked for vno #%llu\n", nodeid);
 
+	fmnt = mp->mnt_data;
 	if (nodeid == FUSE_ROOT_INODE) {
-		err = fuse_root(mp, myflags, vpp, td);
-		return (err);
+		vpp = &fmnt->rvp;
+		vn_lock(*vpp, myflags | LK_RETRY, td);
+		return (0);
 	}
 
-	fmnt = mp->mnt_data;
-#if ! ROOTLESS_SHARES
-	if (fmnt->share)
-		mp = fmnt->share->master;
-#endif
 	DEBUG2G("mp %p: %s\n", mp, mp->mnt_stat.f_mntfromname);
 
 	/* XXX nodeid: cast from 64 bytes to 32 */
@@ -2048,7 +2086,8 @@
 	err = getnewvnode("fuse", mp, fuse_vnodeop_p, vpp);
 #endif
 #if _DEBUG
-	vn_printf(*vpp, DEBLABEL "fuse_vget_i: allocated new vnode\n");
+	DEBUG2G("allocated new vnode:\n");
+	vn_printf(*vpp, " * ");
 #endif
 
 	if (err) {
@@ -2077,7 +2116,8 @@
 
 	fuse_vnode_init(*vpp, fvdat, vtyp);
 #if _DEBUG
-	vn_printf(*vpp, DEBLABEL "fuse_vget_i: node #%d\n", VTOI(*vpp));
+	DEBUG2G("\n");
+	vn_printf(*vpp, " * ");
 #endif
 	return (err);
 }
@@ -2101,17 +2141,9 @@
 	vp->v_data = fvdat;
 	SETPARENT(vp, (VTOI(vp) == FUSE_ROOT_INODE) ? vp : NULL);
 	vp->v_type = vtyp;
-	if (
-#if ROOTLESS_SHARES
-	    1
-#else
-	    ! ((struct fuse_mnt_data *)vp->v_mount->mnt_data)->share
-#endif
-	    ) {
-		sx_init(&fvdat->fh_lock, "lock for fuse filehandles");
-		LIST_INIT(&fvdat->fh_head);
-	}
+	sx_init(&fvdat->fh_lock, "lock for fuse filehandles");
+	LIST_INIT(&fvdat->fh_head);
 
 	vp->v_bufobj.bo_ops = &fuse_bufops;
 	vp->v_bufobj.bo_private = vp;
 
@@ -2156,11 +2188,7 @@
 	 * Taking down fuse_vnode_data structures is just hooked in here...
 	 * no separate destructor.
 	 */
-	if (
-#if ! ROOTLESS_SHARES
-	    ! ((struct fuse_mnt_data *)vp->v_mount->mnt_data)->share &&
-#endif
-	    fvdat) {
+	if (fvdat) {
 		sx_destroy(&fvdat->fh_lock);
 		FREE(fvdat, M_FUSEFS);
 	}
@@ -2336,15 +2364,6 @@
 	struct fuse_dispatcher fdi;
 	int err = 0;
 
-#if ! ROOTLESS_SHARES
-	if (VTOI(vp) == FUSE_ROOT_INODE) {
-		if (fmnt->share) {
-			fmnt = fmnt->share->master->mnt_data;
-			vp = fmnt->rvp;
-		}
-	}
-#endif
-
 	if ((err = fdisp_simple_putget(&fdi, FUSE_GETATTR, vp, td, cred)))
 		return (err);
 
@@ -2373,14 +2392,14 @@
 #if REALTIME_TRACK_UNPRIVPROCDBG
 	    get_unprivileged_proc_debug(td),
 #else
-	    ((struct fuse_mnt_data *)vp->v_mount->mnt_data)->mntopts & FUSEFS_UNPRIVPROCDBG,
+	    fmnt->mntopts & FUSEFS_UNPRIVPROCDBG,
 #endif
 	    fusedev_get_data(fdi.fdev)->daemoncred,
 	    cred))) {
-		struct fuse_share *fsh;
+		struct fuse_mnt_data *x_fmnt;
 
-		LIST_FOREACH(fsh, &fdi.data->fuse_shares_head, fuse_shares_link) {
-			if (! (denied = (fsh->uid != cred->cr_uid)))
+		LIST_FOREACH(x_fmnt, &fmnt->slaves_head, slaves_link) {
+			if (! (denied = (x_fmnt->mp->mnt_cred->cr_uid != cred->cr_uid)))
 				break;
 		}
 
@@ -2403,7 +2422,8 @@
 
 	DEBUG("node #%d, type %d\n", VTOI(vp), vap->va_type);
 #if _DEBUG
-	vn_printf(vp, DEBLABEL "fuse_getattr: node #%d\n", VTOI(vp));
+	DEBUG2G("\n");
+	vn_printf(vp, " * ");
 #endif
 	return (0);
 }
@@ -2498,6 +2518,11 @@
 
 /* general stuff, based on vfs_cache_lookup */

>>> TRUNCATED FOR MAIL (1000 lines) <<<
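
The "gate" indirection that the fuse.c hunks above introduce (one per-device object reached through fusedev_get_gate(), holding both the daemon data and the primary mount) may be easier to follow in a small userspace model. This is only a sketch under stated assumptions: the real struct fuse_gate is defined in fuse.h, which falls into the truncated part of this mail, so the layout here is inferred from the accessors shown in the diff. The names model_gate, model_open and model_mount_check are hypothetical, locking is left out, and the ENXIO returned when no daemon is attached is an assumption, not something the diff shows.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types used in the diff. */
struct model_data { int daemon_uid; };  /* plays the role of fuse_data */
struct model_mount { int flags; };      /* plays the role of fuse_mnt_data */

#define MODEL_PRIVATE 0x1 /* plays the role of FUSEFS_PRIVATE */
#define MODEL_BUSY    0x2 /* plays the role of FUSEFS_BUSY */

/* Assumed layout: one gate per device, owning both pointers. */
struct model_gate {
	struct model_data *fdata;  /* set while a daemon holds the device */
	struct model_mount *mp;    /* set while a primary mount exists */
};

/* Like fusedev_open in the diff: refuse a new daemon while still mounted. */
int
model_open(struct model_gate *g, struct model_data *d)
{
	assert(g != NULL && d != NULL);
	if (g->mp != NULL)
		return (EBUSY);
	g->fdata = d;
	return (0);
}

/*
 * Like the permission checks in the mount routine of the diff. Stores
 * whether the mount would be secondary and returns 0 or an errno value.
 * The secondary checks run in the same order as in the diff, so a later
 * failing check overrides an earlier error.
 */
int
model_mount_check(const struct model_gate *g, int extra_opts, int uid,
    int *secondary)
{
	int err = 0;

	assert(g != NULL && secondary != NULL);
	*secondary = (g->mp != NULL);
	if (*secondary) {
		if (g->mp->flags & MODEL_BUSY)
			err = EBUSY;   /* unmount of the primary in progress */
		if (g->mp->flags & MODEL_PRIVATE)
			err = EPERM;   /* owner does not share the daemon */
		if (extra_opts != 0)
			err = EINVAL;  /* secondary mounts take no options */
	} else {
		if (g->fdata == NULL)
			return (ENXIO); /* assumption: errno not shown in diff */
		if (uid != 0 && uid != g->fdata->daemon_uid)
			err = EPERM;   /* only root or the daemon's owner */
	}
	return (err);
}
```

In this model, keeping the mount pointer in the gate rather than in the per-daemon data is what lets the device turn away a new daemon while an earlier mount is still attached, which appears to be the point of the EBUSY check added to fusedev_open.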