Date: Mon, 04 May 2015 13:31:19 +0800 From: Julian Elischer <julian@freebsd.org> To: Jilles Tjoelker <jilles@stack.nl>, Konstantin Belousov <kostikbel@gmail.com> Cc: "current@freebsd.org" <current@FreeBSD.ORG> Subject: Re: seekdir/readdir patch .. Call for Review. Message-ID: <55470427.3010009@freebsd.org> In-Reply-To: <20150503143345.GB70576@stack.nl> References: <55432593.6050709@freebsd.org> <20150501161742.GW2390@kib.kiev.ua> <20150503143345.GB70576@stack.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
On 5/3/15 10:33 PM, Jilles Tjoelker wrote: > On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov wrote: >> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote: >>> if you are interested in readdir(3), seekdir(3) and telldir(3) then >>> you should look at >>> https://reviews.freebsd.org/D2410 >>> this patches around a problem in seekdir() that breaks Samba. >>> Seekdir(3) will not work as expected when files prior to the point of >>> interest in directory have been deleted since the directory was opened. >>> Windows clients using Samba cause both these things to happen, causing >>> the next readdir(3) after the bad seekdir(3) to skip some entries and >>> return the wrong file. >>> Samba only needs to step back a single directory entry in the case >>> where it reads an entry and then discovers it can't fit it into the >>> buffer it is sending to the windows client. It turns out we can >>> reliably cater to Samba's requirement because the "last returned >>> element" is always still in memory, so with a little care, we can >>> set our filepointer back to it safely. (once) >>> seekdir and readdir (and telldir()) need a complete rewrite along with >>> getdirentries() but that is more than a small edit like this. >> Can you explain your expectations from the whole readdir() vs. parallel >> directory modifications interaction ? From what I understood so far, >> there is unlocked modification of the container and parallel iterator >> over the same container. IMO, in such situation, whatever tweaks you >> apply to the iterator, it is still cannot be made reliable. >> Before making single-purpose changes to the libc readdir and seekdir >> code, or to the kernel code, it would be useful to state exact behaviour >> of the dirent machinery we want to see. No, 'make samba works in my >> situation' does not sound good enough. > Consider the subsequence of entries that existed at opendir() time and > were not removed until now. This subsequence is clearly defined and does > not have concurrency problems. The order of this subsequence must remain > unchanged and seekdir() must be correct with respect to this > subsequence. > > Additionally, two other kinds of entries may be returned. New entries > may be inserted anywhere in between the entries of the subsequence, and > removed entries may be returned as if they were still part of the > subsequence (so that not every readdir() needs a system call). > > A simple implementation for UFS-style directories is to store the offset > in the directory (all bits of it, not masking off the lower 9 bits). > This needs d_off or similar in struct dirent. The kernel getdirentries() > then needs a similar loop as the old libc seekdir() to go from the start > of the 512-byte directory block to the desired entry (since an entry may > not exist at the stored offset within the directory block). > > This means that a UFS-style directory cannot be compacted (existing > entries moved from higher to lower offsets to fill holes) while it is > open for reading. An NFS exported directory is always open for reading. > > This also means that duplicate entries can only be returned if that > particular filename was deleted and created again. > > Without kernel support, it is hard to get telldir/seekdir completely > reliable. The current libc implementation is wrong since the "holes" > within the block just disappear and change the offsets of the following > entries; the kernel cannot fix this using entries with d_fileno = 0 > since it cannot know, in the general case, how long the deleted entry > was in the filesystem-independent dirent format. My previous idea of > storing one d_fileno during telldir() is wrong since it will fail if > that entry is deleted. > > If you do not care about memory usage (which probably is already > excessive with the current libc implementation), you could store at > telldir() time the offset of the current block returned by > getdirentries() and the d_fileno of all entries already returned in the > current block. > > The D2410 patch can conceptually work for what Samba needs, stepping > back one directory entry. I will comment on it. thanks your comment is correct, but I don't think it really matters because I'm only claiming to fix a really small set of possible usages.. I might add a sentence in the seekdir man page specifying what does and doesn't work.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?55470427.3010009>