From: John Baldwin
To: freebsd-current@freebsd.org
Subject: Re: readdir/telldir/seekdir problem (i think)
Date: Thu, 23 Apr 2015 09:54:32 -0400

On Thursday, April 23, 2015 05:02:08 PM Julian Elischer wrote:
> On 4/23/15 11:20 AM, Julian Elischer wrote:
> > I'm debugging a problem being seen with samba 3.6.
> >
> > basically telldir/seekdir/readdir don't seem to work as advertised..
>
> ok so it looks like readdir() (and friends) is totally broken in the face
> of deletes unless you read the entire directory at once or reset to the
> first file before the deletes, or earlier.

I'm not sure that Samba isn't assuming non-portable behavior. For example, from http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir_r.html:

    If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.

While this doesn't speak directly to your case, it does note that you will get inconsistencies if you scan a directory concurrently with adds and removes. UFS might more or less work, since deletes do not compact the backing directory, but I suspect NFS and ZFS would not.

In addition, our current NFS support for seekdir is pretty flaky and can't be fixed without changes to return the seek offset for each directory entry (I believe the projects/ino64 patches include this, since they are already breaking the ABI of the relevant structures). The ABI breakage makes this a very non-trivial task.

However, even if you have that per-entry cookie, it is likely meaningless on filesystems that store directory entries in anything more advanced than an array (such as trees). POSIX specifically mentions this in the rationale for seekdir (http://pubs.opengroup.org/onlinepubs/009695399/functions/seekdir.html):

    One of the perceived problems of implementation is that returning to a given point in a directory is quite difficult to describe formally, in spite of its intuitive appeal, when systems that use B-trees, hashing functions, or other similar mechanisms to order their directories are considered. The definition of seekdir() and telldir() does not specify whether, when using these interfaces, a given directory entry will be seen at all, or more than once.
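For anyone hitting the same symptom, a minimal sketch of the portable alternative Julian mentions (read the entire directory first, then delete) might look like the following. It assumes the directory holds only non-directory entries (no recursion), that full paths fit in 4096 bytes, and that no other process modifies the directory in the meantime. `remove_all_entries()` is a hypothetical name, not a Samba or libc interface.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a Samba or libc interface): delete every
 * non-directory entry in `path`.  Pass 1 snapshots all names and closes
 * the stream; pass 2 unlinks them.  Within this process no readdir()
 * call ever overlaps a delete, so the unspecified readdir()/seekdir()
 * behaviour never comes into play.  Returns 0 on success, -1 on failure.
 * Assumes paths fit in 4096 bytes and entries are not subdirectories.
 */
static int remove_all_entries(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    char **names = NULL;
    size_t count = 0, cap = 0;
    int rc = 0;

    /* Pass 1: read the whole directory before modifying it. */
    for (;;) {
        errno = 0;
        struct dirent *de = readdir(d);
        if (de == NULL) {
            if (errno != 0)
                rc = -1;            /* read error, not end of directory */
            break;
        }
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (count == cap) {         /* grow the name array geometrically */
            size_t ncap = cap ? cap * 2 : 16;
            char **tmp = realloc(names, ncap * sizeof(*names));
            if (tmp == NULL) {
                rc = -1;
                break;
            }
            names = tmp;
            cap = ncap;
        }
        if ((names[count] = strdup(de->d_name)) == NULL) {
            rc = -1;
            break;
        }
        count++;
    }
    closedir(d);                    /* stream is closed before any unlink() */

    /* Pass 2: delete from the private snapshot, only if it is complete. */
    int snapshot_ok = (rc == 0);
    for (size_t i = 0; i < count; i++) {
        if (snapshot_ok) {
            char full[4096];
            int n = snprintf(full, sizeof(full), "%s/%s", path, names[i]);
            if (n < 0 || (size_t)n >= sizeof(full) || unlink(full) != 0)
                rc = -1;            /* keep going, but report failure */
        }
        free(names[i]);
    }
    free(names);
    return rc;
}
```

The cost is memory proportional to the number of entries, which is the price of not depending on telldir()/seekdir() positions surviving deletes on any particular filesystem.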
In fact, given the seekdir rationale quoted above, I would argue that what Samba is doing is non-portable. That rationale would seem to indicate that a conforming seekdir could simply make readdir return EOF immediately until you call rewinddir.

-- 
John Baldwin