From owner-freebsd-current@FreeBSD.ORG Tue May 5 12:24:04 2015 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C1C87C16; Tue, 5 May 2015 12:24:04 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 4EAD71ACD; Tue, 5 May 2015 12:24:03 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2D/AwBxtUhV/95baINcg19cBYMYwl8JgUwKhTdOAoFuFAEBAQEBAQGBCoQhAQEEAQEBGgYEJyALGxgCAg0ZAikBCRgBDQYIBwQBHASICw2wRJNWAQEBAQEBBAEBAQEBAQEBARmBIYoYhDMBARw0B4JogUUFljCCRoFMg1U9hg6KbYNTI4FlgisiMQeBBDmBAQEBAQ X-IronPort-AV: E=Sophos;i="5.13,372,1427774400"; d="scan'208";a="208708009" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 05 May 2015 08:24:02 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 74735B3F33; Tue, 5 May 2015 08:24:02 -0400 (EDT) Date: Tue, 5 May 2015 08:24:02 -0400 (EDT) From: Rick Macklem To: Julian Elischer Cc: "current@freebsd.org" , Jilles Tjoelker , Konstantin Belousov Message-ID: <899435410.31616191.1430828642464.JavaMail.root@uoguelph.ca> In-Reply-To: <55470427.3010009@freebsd.org> Subject: Re: seekdir/readdir patch .. Call for Review. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.12] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 May 2015 12:24:04 -0000 Julian Elischer wrote: > On 5/3/15 10:33 PM, Jilles Tjoelker wrote: > > On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov > > wrote: > >> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote: > >>> if you are interested in readdir(3), seekdir(3) and telldir(3) > >>> then > >>> you should look at > >>> https://reviews.freebsd.org/D2410 > >>> this patches around a problem in seekdir() that breaks Samba. > >>> Seekdir(3) will not work as expected when files prior to the > >>> point of > >>> interest in directory have been deleted since the directory was > >>> opened. > >>> Windows clients using Samba cause both these things to happen, > >>> causing > >>> the next readdir(3) after the bad seekdir(3) to skip some entries > >>> and > >>> return the wrong file. > >>> Samba only needs to step back a single directory entry in the > >>> case > >>> where it reads an entry and then discovers it can't fit it into > >>> the > >>> buffer it is sending to the windows client. It turns out we can > >>> reliably cater to Samba's requirement because the "last returned > >>> element" is always still in memory, so with a little care, we can > >>> set our filepointer back to it safely. (once) > >>> seekdir and readdir (and telldir()) need a complete rewrite along > >>> with > >>> getdirentries() but that is more than a small edit like this. > >> Can you explain your expectations from the whole readdir() vs. > >> parallel > >> directory modifications interaction ? From what I understood so > >> far, > >> there is unlocked modification of the container and parallel > >> iterator > >> over the same container. IMO, in such situation, whatever tweaks > >> you > >> apply to the iterator, it is still cannot be made reliable. > >> Before making single-purpose changes to the libc readdir and > >> seekdir > >> code, or to the kernel code, it would be useful to state exact > >> behaviour > >> of the dirent machinery we want to see. No, 'make samba works in > >> my > >> situation' does not sound good enough. > > Consider the subsequence of entries that existed at opendir() time > > and > > were not removed until now. This subsequence is clearly defined and > > does > > not have concurrency problems. The order of this subsequence must > > remain > > unchanged and seekdir() must be correct with respect to this > > subsequence. > > > > Additionally, two other kinds of entries may be returned. New > > entries > > may be inserted anywhere in between the entries of the subsequence, > > and > > removed entries may be returned as if they were still part of the > > subsequence (so that not every readdir() needs a system call). > > > > A simple implementation for UFS-style directories is to store the > > offset > > in the directory (all bits of it, not masking off the lower 9 > > bits). > > This needs d_off or similar in struct dirent. The kernel > > getdirentries() > > then needs a similar loop as the old libc seekdir() to go from the > > start > > of the 512-byte directory block to the desired entry (since an > > entry may > > not exist at the stored offset within the directory block). > > > > This means that a UFS-style directory cannot be compacted (existing > > entries moved from higher to lower offsets to fill holes) while it > > is > > open for reading. An NFS exported directory is always open for > > reading. > > > > This also means that duplicate entries can only be returned if that > > particular filename was deleted and created again. > > > > Without kernel support, it is hard to get telldir/seekdir > > completely > > reliable. The current libc implementation is wrong since the > > "holes" > > within the block just disappear and change the offsets of the > > following > > entries; the kernel cannot fix this using entries with d_fileno = 0 > > since it cannot know, in the general case, how long the deleted > > entry > > was in the filesystem-independent dirent format. My previous idea > > of > > storing one d_fileno during telldir() is wrong since it will fail > > if > > that entry is deleted. > > > > If you do not care about memory usage (which probably is already > > excessive with the current libc implementation), you could store at > > telldir() time the offset of the current block returned by > > getdirentries() and the d_fileno of all entries already returned in > > the > > current block. > > > > The D2410 patch can conceptually work for what Samba needs, > > stepping > > back one directory entry. I will comment on it. > thanks > > your comment is correct, but I don't think it really matters because > I'm only claiming > to fix a really small set of possible usages.. I might add a > sentence > in the seekdir > man page specifying what does and doesn't work. > Ok, I think I finally understand the bug that this patch is fixing... If telldir() is called when at the end of a block read by getdirentries(), it sets loc_seek to the seek position for the block and loc_loc to the offset of the end of the block. This is somewhat bogus, but works for the simple read-only case for seekdir() because it does a readdir()->getdirentries() of the next block. This patch fixes the telldir entry to refer to the beginning of the next block, so that seekdir() seeks to the correct block. I could argue that a more correct version of your fixup function would scan the telldir list for a match instead of just testing the first one. However, I now see that once fixed up, it won't match it again, so your code seems safe, just not completely correct. I'll take a closer look at it and put something in pahbricator later to-day, but I think it's ok. I would suggest describing the bug would help simple types like me figure it out. Thanks for your work on this, rick > > > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to > "freebsd-current-unsubscribe@freebsd.org" >