Date: Thu, 30 Apr 2015 14:56:25 +0800 From: Julian Elischer <julian@freebsd.org> To: Jilles Tjoelker <jilles@stack.nl>, John Baldwin <jhb@freebsd.org> Cc: freebsd-current@freebsd.org Subject: Re: readdir/telldir/seekdir problem (i think) Message-ID: <5541D219.6060900@freebsd.org> In-Reply-To: <20150424215249.GA96554@stack.nl> References: <55386505.70708@freebsd.org> <553A7DB0.8080308@freebsd.org> <553A8D28.7090901@freebsd.org> <4718551.Y2ZnMk6NSM@ralph.baldwin.cx> <20150424215249.GA96554@stack.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
On 4/25/15 5:52 AM, Jilles Tjoelker wrote: > On Fri, Apr 24, 2015 at 04:28:12PM -0400, John Baldwin wrote: >> Yes, this isn't at all safe. There's no guarantee whatsoever that >> the offset on the directory fd that isn't something returned by >> getdirentries has any meaning. In particular, the size of the >> directory entry in a random filesystem might be a different size >> than the structure returned by getdirentries (since it converts >> things into a FS-independent format). >> This might work for UFS by accident, but this is probably why ZFS >> doesn't work. Turns out they are all broken in the face of deletes. I managed to make a patch that fixes things in the simple case of "backing up by one (and only one time) record", which is what samba needs to do. Samba reads until the latest entry won't fit (buffer is full) and then effectively pushes the last one back by doing a seekdir back one record. It then sends the full buffer back to the client and continus reading at the next record, which is the one it pushed back. It turs out that because the buffer holding the last dirent presented is always still resident in memory, it is ALWAYS possible to slip the 'current entry' pointer back ONE entry, (but no more). This unbreaks samba, even in the case of deletes. (at least with UFS and ZFS). I'll make the patch available when I've tested it more and cleaned it up. >> However, this might be properly fixed by the thing that ino64 is >> doing where each directory entry returned by getdirentries gives >> you a seek offset that you _can_ directly seek to (as opposed to >> seeking to the start of the block and then walking forward N >> entries until you get an inter-block entry that is the same). > The ino64 branch only reserves space for d_off and does not use it in > any way. This is appropriate since actually using d_off is a major > feature addition. d_cookie please.. :-) > > A proper d_off would still be useful even if UFS's readdir keeps masking > off the offset so a directory read always starts at the beginning of a > 512-byte directory block, since this allows more distinct offset values > than safely using getdirentries()'s *basep. With d_off, one outer loop > must read at least one directory block to avoid spinning indefinitely, > while using getdirentries()'s *basep requires reading the whole > getdirentries() buffer. not sure I follow what you mean by that. > > Some Linux filesystems go further and provide a unique d_off for each > entry. yes ZFS provides that. we currently don't export it beyond teh VFS, but there is code that is supposed to supply a cookie for each entry. It could be put into the dirent structure should we want to do that. Currently it is put into a separate memory array in parallel to the dirents, except that we give it a NULL pointer, which means "we don't want cookies thanks" . We really need to do something because the current system is really broken. And the fact that dirent has *32 bit* inode number in it was a shock.. I'd presumed that had gone the way of the dinosaurs and dodo. I think 11 needs to have a new dirent structure given out by a new syscall. (old one still present for compat reasons). Whether we need a readdir64() and friends I have not yet decided. Maybe it's time to bump libc's number again :-) > > Another idea would be to store the last d_ino instead of dd_loc into the > struct ddloc. On seekdir(), this would seek to loc_seek as before and > skip entries until that d_ino is found, or to the start of the buffer if > not found (and possibly return some entries again that should not be > returned, but Samba copes with that). I didn't see it coping with that.. an earlier version of my patch suffered from duplicate entries in samba. it is also dangerous because multiple files linked to the same inode would get confused. it may be possible however to link the telldir cookies to a HASH of the filename and inode number. (assuming of course we don't have cookies directly from the fiesystem itself.) the current scheme of seeking to a buffer boundary and seeking forward in the buffer looking for a matching entry would become quite robust using this scheme, assuming that the lseek actually worked on that file system. >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5541D219.6060900>