Date:      Fri, 24 Apr 2015 13:10:34 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>, John Baldwin <jhb@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: readdir/telldir/seekdir problem (i think)
Message-ID:  <5539D04A.3090309@freebsd.org>
In-Reply-To: <336285737.24821463.1429825825843.JavaMail.root@uoguelph.ca>
References:  <336285737.24821463.1429825825843.JavaMail.root@uoguelph.ca>

On 4/24/15 5:50 AM, Rick Macklem wrote:
> John Baldwin wrote:
>> On Thursday, April 23, 2015 05:02:08 PM Julian Elischer wrote:
>>> On 4/23/15 11:20 AM, Julian Elischer wrote:
>>>> I'm debugging a problem being seen with samba 3.6.
>>>>
>>>> basically telldir/seekdir/readdir don't seem to work as advertised..
>>> ok so it looks like readdir() (and friends) is totally broken in the face
>>> of deletes unless you read the entire directory at once or reset to
>>> the first file before the deletes, or earlier.
>> I'm not sure that Samba isn't assuming non-portable behavior.  For example:
>>
>> From
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir_r.html
>>
>> If a file is removed from or added to the directory after the most recent
>> call to opendir() or rewinddir(), whether a subsequent call to readdir()
>> returns an entry for that file is unspecified.
>>
>> While this doesn't speak directly to your case, it does note that you will
>> get inconsistencies if you scan a directory concurrent with add and remove.
>>
>> UFS might kind of work actually since deletes do not compact the backing
>> directory, but I suspect NFS and ZFS would not work.  In addition, our
>> current NFS support for seekdir is pretty flaky and can't be fixed without
>> changes to return the seek offset for each directory entry (I believe that
>> the projects/ino64 patches include this since they are breaking the ABI of
>> the relevant structures already).  The ABI breakage makes this a very
>> non-trivial task.  However, even if you have that per-item cookie, it is
>> likely meaningless in the face of filesystems that use any sort of more
>> advanced structure than an array (such as trees, etc.) to store directory
>> entries.  POSIX specifically mentions this in the rationale for seekdir:
>>
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/seekdir.html
>>
>> One of the perceived problems of implementation is that returning to
>> a given point in a directory is quite difficult to describe
>> formally, in spite of its intuitive appeal, when systems that use
>> B-trees, hashing functions, or other similar mechanisms to order
>> their directories are considered. The definition of seekdir() and
>> telldir() does not specify whether, when using these interfaces, a
>> given directory entry will be seen at all, or more than once.
>>
>> In fact, given that quote, I would argue that what Samba is doing is
>> non-portable.  This would seem to indicate that a conforming seekdir could
>> just change readdir to immediately return EOF until you call rewinddir.
>>
> Loosely related to this, I have been tempted to modify readdir() to
> read the entire directory on the first readdir() for NFS, to avoid the
> readdir()/unlink() problem.
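
For illustration, the "read the whole directory first" idea can also be
done on the application side. This is only a minimal sketch under
assumptions, not what libc or the NFS client does: snapshot_dir() and
remove_all() are hypothetical helper names, and the 4096-byte path buffer
is an arbitrary limit. Every name is copied out before the first unlink(),
so the readdir() stream is never positioned in a directory that is
changing underneath it.

```c
#define _XOPEN_SOURCE 700   /* for strdup() and mkdtemp() on strict systems */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy every name in `path` (except "." and "..")
 * into a malloc'd array before anything is modified.
 * Returns the number of names, or -1 on error. */
static int snapshot_dir(const char *path, char ***names_out)
{
    char **names = NULL;
    int n = 0, cap = 0;
    struct dirent *de;
    DIR *d;

    assert(path != NULL && names_out != NULL);
    if ((d = opendir(path)) == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {                     /* grow the array geometrically */
            int ncap = cap ? cap * 2 : 16;
            char **tmp = realloc(names, (size_t)ncap * sizeof(*tmp));
            if (tmp == NULL)
                goto fail;
            names = tmp;
            cap = ncap;
        }
        if ((names[n] = strdup(de->d_name)) == NULL)
            goto fail;
        n++;
    }
    closedir(d);
    *names_out = names;
    return n;
fail:
    while (n > 0)
        free(names[--n]);
    free(names);
    closedir(d);
    return -1;
}

/* Hypothetical helper: unlink everything named in the snapshot.
 * Returns the number of entries removed, or -1 if the snapshot failed. */
static int remove_all(const char *path)
{
    char **names;
    char full[4096];                        /* arbitrary limit for the sketch */
    int removed = 0;
    int n = snapshot_dir(path, &names);

    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        snprintf(full, sizeof(full), "%s/%s", path, names[i]);
        if (unlink(full) == 0)
            removed++;
        free(names[i]);
    }
    free(names);
    return removed;
}
```

The cost is memory proportional to the directory size, which is the same
concern as for very large directories.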

I did find a bug in our readdir/seekdir that makes it a lot worse.
We reload the kernel's idea of the directory every time we do
a seekdir() backwards, even if the target is within the block we
already hold, which makes us a lot more susceptible to the problem.
Making it reload only when the new location is in another block made
it work on directories with up to several thousand files (with a 32k
blocksize) before failing.
With the bug it's possible to make every seekdir() produce failures
even in a directory of just 3 files.  The downside of the change is
that the client continues to see the old contents of the block even
though it has done a seekdir() within it.
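
The access pattern in question looks roughly like the sketch below. It is
an illustration under assumptions, not Samba's code: read_twice_at() is a
hypothetical helper. It remembers a position with telldir(), reads one
entry, seeks back with seekdir() and reads again. In a directory that
nobody modifies, both reads should give the same name; once entries are
unlinked between the telldir() and the seekdir(), what the second
readdir() returns is exactly what this thread is arguing about.

```c
#define _XOPEN_SOURCE 700   /* for telldir(), seekdir() and mkdtemp() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: skip `skip` entries, remember the position, read
 * one entry, seek back to the remembered position and read again.
 * The two names are copied into `first` and `second` (each `len` bytes).
 * Returns 0 on success, -1 on error or if the directory is too short. */
static int read_twice_at(const char *path, int skip,
                         char *first, char *second, size_t len)
{
    struct dirent *de;
    long loc;
    int rc = -1;
    DIR *d;

    assert(path != NULL && first != NULL && second != NULL && len > 0);
    if ((d = opendir(path)) == NULL)
        return -1;

    for (int i = 0; i < skip; i++)
        if (readdir(d) == NULL)
            goto out;

    loc = telldir(d);                   /* remember where we are */
    if ((de = readdir(d)) == NULL)
        goto out;
    snprintf(first, len, "%s", de->d_name);

    /* A file server might unlink() entries at this point. */

    seekdir(d, loc);                    /* go back to the remembered spot */
    if ((de = readdir(d)) == NULL)
        goto out;
    snprintf(second, len, "%s", de->d_name);
    rc = 0;
out:
    closedir(d);
    return rc;
}
```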

>
> My concern was doing this for a very large directory. Maybe it could be
> done for directories up to some size?
>
> rick
>
>> --
>> John Baldwin
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to
>> "freebsd-current-unsubscribe@freebsd.org"
>>



