From owner-freebsd-current@FreeBSD.ORG Mon May 4 16:04:45 2015 Return-Path: Delivered-To: current@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 555F6F76; Mon, 4 May 2015 16:04:45 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ED7BB1198; Mon, 4 May 2015 16:04:44 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t44G4ca8095725 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Mon, 4 May 2015 19:04:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t44G4ca8095725 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t44G4bV9095724; Mon, 4 May 2015 19:04:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 4 May 2015 19:04:37 +0300 From: Konstantin Belousov To: Julian Elischer Cc: Jilles Tjoelker , "current@freebsd.org" Subject: Re: seekdir/readdir patch .. Call for Review. Message-ID: <20150504160437.GI2390@kib.kiev.ua> References: <55432593.6050709@freebsd.org> <20150501161742.GW2390@kib.kiev.ua> <20150503143345.GB70576@stack.nl> <554787BA.1020105@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <554787BA.1020105@freebsd.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2015 16:04:45 -0000 On Mon, May 04, 2015 at 10:52:42PM +0800, Julian Elischer wrote: > On 5/3/15 10:33 PM, Jilles Tjoelker wrote: > > On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov wrote: > >> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote: > >>> if you are interested in readdir(3), seekdir(3) and telldir(3) then > >>> you should look at > >>> https://reviews.freebsd.org/D2410 > >>> this patches around a problem in seekdir() that breaks Samba. > >>> Seekdir(3) will not work as expected when files prior to the point of > >>> interest in directory have been deleted since the directory was opened. > >>> Windows clients using Samba cause both these things to happen, causing > >>> the next readdir(3) after the bad seekdir(3) to skip some entries and > >>> return the wrong file. > >>> Samba only needs to step back a single directory entry in the case > >>> where it reads an entry and then discovers it can't fit it into the > >>> buffer it is sending to the windows client. It turns out we can > >>> reliably cater to Samba's requirement because the "last returned > >>> element" is always still in memory, so with a little care, we can > >>> set our filepointer back to it safely. (once) > >>> seekdir and readdir (and telldir()) need a complete rewrite along with > >>> getdirentries() but that is more than a small edit like this. > >> Can you explain your expectations from the whole readdir() vs. parallel > >> directory modifications interaction ? From what I understood so far, > >> there is unlocked modification of the container and parallel iterator > >> over the same container. IMO, in such situation, whatever tweaks you > >> apply to the iterator, it is still cannot be made reliable. > >> Before making single-purpose changes to the libc readdir and seekdir > >> code, or to the kernel code, it would be useful to state exact behaviour > >> of the dirent machinery we want to see. No, 'make samba works in my > >> situation' does not sound good enough. > > Consider the subsequence of entries that existed at opendir() time and > > were not removed until now. This subsequence is clearly defined and does > > not have concurrency problems. The order of this subsequence must remain > > unchanged and seekdir() must be correct with respect to this > > subsequence. > > > > Additionally, two other kinds of entries may be returned. New entries > > may be inserted anywhere in between the entries of the subsequence, and > > removed entries may be returned as if they were still part of the > > subsequence (so that not every readdir() needs a system call). > > > > A simple implementation for UFS-style directories is to store the offset > > in the directory (all bits of it, not masking off the lower 9 bits). > > This needs d_off or similar in struct dirent. The kernel getdirentries() > > then needs a similar loop as the old libc seekdir() to go from the start > > of the 512-byte directory block to the desired entry (since an entry may > > not exist at the stored offset within the directory block). > > > > This means that a UFS-style directory cannot be compacted (existing > > entries moved from higher to lower offsets to fill holes) while it is > > open for reading. An NFS exported directory is always open for reading. > > > > This also means that duplicate entries can only be returned if that > > particular filename was deleted and created again. > > > > Without kernel support, it is hard to get telldir/seekdir completely > > reliable. The current libc implementation is wrong since the "holes" > > within the block just disappear and change the offsets of the following > > entries; the kernel cannot fix this using entries with d_fileno = 0 > > since it cannot know, in the general case, how long the deleted entry > > was in the filesystem-independent dirent format. My previous idea of > > storing one d_fileno during telldir() is wrong since it will fail if > > that entry is deleted. > > > > If you do not care about memory usage (which probably is already > > excessive with the current libc implementation), you could store at > > telldir() time the offset of the current block returned by > > getdirentries() and the d_fileno of all entries already returned in the > > current block. > > > > The D2410 patch can conceptually work for what Samba needs, stepping > > back one directory entry. I will comment on it. > > > how long do I have to wait until I can commit this and was kib's > comment a > "do not commit"? No, I only mean what I asked about. I do not have strong objections about the patch, but whatever is done in this regard, should clearly explain the case it handles and related limitations (IMO).