Date: Tue, 5 May 2015 08:24:02 -0400 (EDT) From: Rick Macklem <rmacklem@uoguelph.ca> To: Julian Elischer <julian@freebsd.org> Cc: "current@freebsd.org" <current@FreeBSD.ORG>, Jilles Tjoelker <jilles@stack.nl>, Konstantin Belousov <kostikbel@gmail.com> Subject: Re: seekdir/readdir patch .. Call for Review. Message-ID: <899435410.31616191.1430828642464.JavaMail.root@uoguelph.ca> In-Reply-To: <55470427.3010009@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Julian Elischer wrote: > On 5/3/15 10:33 PM, Jilles Tjoelker wrote: > > On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov > > wrote: > >> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote: > >>> if you are interested in readdir(3), seekdir(3) and telldir(3) > >>> then > >>> you should look at > >>> https://reviews.freebsd.org/D2410 > >>> this patches around a problem in seekdir() that breaks Samba. > >>> Seekdir(3) will not work as expected when files prior to the > >>> point of > >>> interest in directory have been deleted since the directory was > >>> opened. > >>> Windows clients using Samba cause both these things to happen, > >>> causing > >>> the next readdir(3) after the bad seekdir(3) to skip some entries > >>> and > >>> return the wrong file. > >>> Samba only needs to step back a single directory entry in the > >>> case > >>> where it reads an entry and then discovers it can't fit it into > >>> the > >>> buffer it is sending to the windows client. It turns out we can > >>> reliably cater to Samba's requirement because the "last returned > >>> element" is always still in memory, so with a little care, we can > >>> set our filepointer back to it safely. (once) > >>> seekdir and readdir (and telldir()) need a complete rewrite along > >>> with > >>> getdirentries() but that is more than a small edit like this. > >> Can you explain your expectations from the whole readdir() vs. > >> parallel > >> directory modifications interaction ? From what I understood so > >> far, > >> there is unlocked modification of the container and parallel > >> iterator > >> over the same container. IMO, in such situation, whatever tweaks > >> you > >> apply to the iterator, it is still cannot be made reliable. > >> Before making single-purpose changes to the libc readdir and > >> seekdir > >> code, or to the kernel code, it would be useful to state exact > >> behaviour > >> of the dirent machinery we want to see. No, 'make samba works in > >> my > >> situation' does not sound good enough. > > Consider the subsequence of entries that existed at opendir() time > > and > > were not removed until now. This subsequence is clearly defined and > > does > > not have concurrency problems. The order of this subsequence must > > remain > > unchanged and seekdir() must be correct with respect to this > > subsequence. > > > > Additionally, two other kinds of entries may be returned. New > > entries > > may be inserted anywhere in between the entries of the subsequence, > > and > > removed entries may be returned as if they were still part of the > > subsequence (so that not every readdir() needs a system call). > > > > A simple implementation for UFS-style directories is to store the > > offset > > in the directory (all bits of it, not masking off the lower 9 > > bits). > > This needs d_off or similar in struct dirent. The kernel > > getdirentries() > > then needs a similar loop as the old libc seekdir() to go from the > > start > > of the 512-byte directory block to the desired entry (since an > > entry may > > not exist at the stored offset within the directory block). > > > > This means that a UFS-style directory cannot be compacted (existing > > entries moved from higher to lower offsets to fill holes) while it > > is > > open for reading. An NFS exported directory is always open for > > reading. > > > > This also means that duplicate entries can only be returned if that > > particular filename was deleted and created again. > > > > Without kernel support, it is hard to get telldir/seekdir > > completely > > reliable. The current libc implementation is wrong since the > > "holes" > > within the block just disappear and change the offsets of the > > following > > entries; the kernel cannot fix this using entries with d_fileno = 0 > > since it cannot know, in the general case, how long the deleted > > entry > > was in the filesystem-independent dirent format. My previous idea > > of > > storing one d_fileno during telldir() is wrong since it will fail > > if > > that entry is deleted. > > > > If you do not care about memory usage (which probably is already > > excessive with the current libc implementation), you could store at > > telldir() time the offset of the current block returned by > > getdirentries() and the d_fileno of all entries already returned in > > the > > current block. > > > > The D2410 patch can conceptually work for what Samba needs, > > stepping > > back one directory entry. I will comment on it. > thanks > > your comment is correct, but I don't think it really matters because > I'm only claiming > to fix a really small set of possible usages.. I might add a > sentence > in the seekdir > man page specifying what does and doesn't work. > Ok, I think I finally understand the bug that this patch is fixing... If telldir() is called when at the end of a block read by getdirentries(), it sets loc_seek to the seek position for the block and loc_loc to the offset of the end of the block. This is somewhat bogus, but works for the simple read-only case for seekdir() because it does a readdir()->getdirentries() of the next block. This patch fixes the telldir entry to refer to the beginning of the next block, so that seekdir() seeks to the correct block. I could argue that a more correct version of your fixup function would scan the telldir list for a match instead of just testing the first one. However, I now see that once fixed up, it won't match it again, so your code seems safe, just not completely correct. I'll take a closer look at it and put something in pahbricator later to-day, but I think it's ok. I would suggest describing the bug would help simple types like me figure it out. Thanks for your work on this, rick > > > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to > "freebsd-current-unsubscribe@freebsd.org" >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?899435410.31616191.1430828642464.JavaMail.root>