Date: Tue, 29 Mar 2016 11:58:24 -0700
From: Hiroshi Nishida <nishida@asusa.net>
To: freebsd-fs@freebsd.org
Subject: Re: Problem with FUSE + fts
Message-ID: <56FAD050.2080707@asusa.net>
In-Reply-To: <56F6148D.2030706@asusa.net>
References: <56F42EF4.5000505@asusa.net> <1294209833.31699182.1458950014610.JavaMail.zimbra@uoguelph.ca> <56F6148D.2030706@asusa.net>

Now I have figured out what causes this error.
fts outputs an ENOENT error when:

    FTSENT *p;
    struct stat sb;

    stat(p->fts_accpath, &sb);
    p->fts_ino != sb.st_ino

i.e., the inode of p is different from sb.st_ino.
They are usually the same, but sometimes a new inode is allocated to p while the directory is being scanned, in the following way:

1. FUSE lowlevel's forget() is called for some reason while the directory is being scanned. It removes all entries from the entry tables and clears all inodes.
2. Since p has already been removed from FUSE's entry table, FUSE adds it again and allocates a new inode.

I don't know why forget() is called for a directory that is still open, or why it clears all inodes, but according to fuse_lowlevel.h

    /**
     * Forget about an inode
     *
     * This function is called when the kernel removes an inode
     * from its internal caches.
     *
     * The inode's lookup count increases by one for every call to
     * fuse_reply_entry and fuse_reply_create. The nlookup parameter
     * indicates by how much the lookup count should be decreased.
     *
     * Inodes with a non-zero lookup count may receive request from
     * the kernel even after calls to unlink, rmdir or (when
     * overwriting an existing file) rename. Filesystems must handle
     * such requests properly and it is recommended to defer removal
     * of the inode until the lookup count reaches zero. Calls to
     * unlink, remdir or rename will be followed closely by forget
     * unless the file or directory is open, in which case the
     * kernel issues forget only after the release or releasedir
     * calls.
     *

removing the inode should be deferred until the dir is closed.
I haven't checked the ref count of each node yet, but there seems to be a bug in the above process.

I also have a suggestion about the hash table, but I will post it later.

Any feedback would be appreciated.

Thank you.

On 2016/03/25 21:48, Hiroshi Nishida wrote:
> Thank you for your response.
>
> On 3/25/16 4:53 PM, Rick Macklem wrote:
>> I think I see the same thing when doing an "rm -r" on a fuse/GlusterFS volume.
>
> Unfortunately, it happens also with "find XXX -print", though I have experienced a similar "rm -r" + "XXX: No such file or directory" problem with UFS + SUJ.
> And I also verified with truss that in
>
>     _fstat(fd, &sb);
>     p->fts_ino != sb.st_ino
>
> stat() system call is called with the same path as p's.
>
> Anyway, the following patch for lib/libc/gen/fts.c prevents the error but is far from a good solution.
> https://github.com/scopedog/FUSE-Test/blob/master/fts.c.patch
> It assumes that the filesystem id (f_type in struct statfs) of FUSE is 0xed but I am not sure if it's applicable to all FUSE filesystems.
>
> I'll look into FUSE source code next week.
>> To be honest, I just add a "-f" to the command to shut it up and then it deleted
>> the tree.
>>
>> I think, in general, what readdir() returns after an entry is unlink'd is undefined
>> behaviour. As such, the safe way to delete all of a directory is something like:
>> - in a loop until readdir() returns EOF
>>    - opendir()
>>    - readdir() the first entry
>>    - unlink() that entry
>>    - closedir()
>> --> So that all you ever do is readdir() the first entry after an opendir().
>
> By the way, could you delete all the files with "-f"?
> I am testing with a pretty big directory containing 81,000 files/dirs and have never used "-f", but have to "rm -r" again for undeleted entries.
> However, the offset problem is very interesting as it seems to be applicable to all filesystems.
>
> Thank you.
>

--
Hiroshi Nishida
nishida@asusa.net