Date: Mon, 27 Apr 2015 16:03:08 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Julian Elischer <julian@freebsd.org>
Cc: freebsd-current@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: readdir/telldir/seekdir problem (i think)
Message-ID: <1101073752.26759547.1430164988301.JavaMail.root@uoguelph.ca>
In-Reply-To: <553E676D.1020902@freebsd.org>

Julian Elischer wrote:
> On 4/25/15 4:28 AM, John Baldwin wrote:
> > On Saturday, April 25, 2015 02:36:24 AM Julian Elischer wrote:
> >> On 4/25/15 1:30 AM, Julian Elischer wrote:
> >>> On 4/24/15 10:59 PM, John Baldwin wrote:
> >>>> Index: head/lib/libc/gen/telldir.c
> >>>> ===================================================================
> >>>> --- head/lib/libc/gen/telldir.c (revision 281929)
> >>>> +++ head/lib/libc/gen/telldir.c (working copy)
> >>>> @@ -101,8 +101,10 @@
> >>>>          return;
> >>>>      if (lp->loc_loc == dirp->dd_loc && lp->loc_seek == dirp->dd_seek)
> >>>>          return;
> >>>> -    (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
> >>>> -    dirp->dd_seek = lp->loc_seek;
> >>>> +    if (lp->loc_seek != dirp->dd_seek) {
> >>>> +        (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
> >>>> +        dirp->dd_seek = lp->loc_seek;
> >>>> +    }
> >>> yes I did that yesterday but it still fails when you transition
> >>> blocks.. (badly).
> >>>
> >>> I also tried bigger blocks.. also fails (eventually)
> >>>
> >>> I did find a way to make it work... you had to seek back
> >>> to the first block you deleted on each set..
> >>> then work forward from there again.. unfortunately since
> >>> I'm trying to make a microsoft program not fail (via samba)
> >>> I have no control over how it does things and seekdir doesn't
> >>> know what was deleted anyway... (so the fix is fine for the
> >>> test program but not for real life)
> >>>
> >>> I think I can make the BSD one act like the linux one by changing
> >>> the lseek being done to use the offset (loc) plus the buffer seek
> >>> address of the target, instead of just going for the buffer base
> >>> and stepping forward through the entries..
> >>>
> >>> maybe tomorrow.
> >>>
> >> The following conditional code makes ours behave the same as the
> >> linux one.
> >> it breaks several 'rules' but works where ours is clean but fails..
> >> as Rick said.. "maybe that's what we should do too."
> >>
> >>
> >> this is at the end of seekdir()
> >>
> >>
> >> The new code does what linux does.. and shouldn't work.. but does
> >> // at least in the limited conditions I need it to.
> >> // We'll probably need to do this at work...:
> >>
> >>
> >> The original code is what we have now, but gets mightily confused
> >> sometimes.
> >> // This is clean(er) but fails in specific situations (when
> >> // doing commands from Microsoft windows, via samba).
> >>
> >>
> >> root@vps1:/tmp # diff -u dir.c.orig dir.c
> >> --- dir.c.orig 2015-04-24 11:29:36.855317000 -0700
> >> +++ dir.c 2015-04-24 11:15:49.058500000 -0700
> >> @@ -1105,6 +1105,13 @@
> >>          dirp->dd_loc = lp->loc_loc;
> >>          return;
> >>      }
> >> +#ifdef GLIBC_SEEK
> >> +    (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek + lp->loc_loc, SEEK_SET);
> >> +    dirp->dd_seek = lp->loc_seek + lp->loc_loc;
> >> +    dirp->dd_loc = 0;
> >> +    lp->loc_seek = dirp->dd_seek;
> >> +    lp->loc_loc = 0;
> >> +#else
> >>      (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
> >>      dirp->dd_seek = lp->loc_seek;
> >>      dirp->dd_loc = 0;
> >> @@ -1114,6 +1121,7 @@
> >>          if (dp == NULL)
> >>              break;
> >>      }
> >> +#endif
> >>  }
> > Yes, this isn't at all safe. There's no guarantee whatsoever that
> > the offset on the directory fd that isn't something returned by
> > getdirentries has any meaning. In particular, the size of the
> > directory entry in a random filesystem might be a different size
> > than the structure returned by getdirentries (since it converts
> > things into a FS-independent format).
> >
> > This might work for UFS by accident, but this is probably why ZFS
> > doesn't work.
> >
> > However, this might be properly fixed by the thing that ino64 is
> > doing where each directory entry returned by getdirentries gives
> > you a seek offset that you _can_ directly seek to (as opposed to
> > seeking to the start of the block and then walking forward N
> > entries until you get an inter-block entry that is the same).
> I just made the stunning discovery that our seekdir/readdir/telldir
> code in libc works with FreeBSD 8.0.
> so maybe the problem is that the kernel changed its behaviour, and
> no-one thought to fix libc..
>
> (at least it works on one of our 8.0 base appliances.. I'll do more
> testing tomorrow.. it's past midnight.)
>
I suspect that pre-r252438 systems work better for UFS than r252438 or
later. That patch changed ufs_readdir() so that it no longer returned
the on-disk directory structure. (Among other things, it added code
that skipped over d_ino == 0 entries.)

As such, r252438 and later systems have UFS where the "logical" offset
of a directory entry returned by getdirentries() isn't the same as the
"physical" offset for it in the on-disk directory.

Having said the above, I have two somewhat inconsistent thoughts:
1 - As jhb has explained, the libc functions aren't safe for
    telldir()/seekdir() when entries are added/deleted. It just
    happens that UFS might work ok (and is more likely to work ok
    when "logical offset" == "physical offset").
2 - I'm not sure r252438 was a good idea (at least the part that
    skips invalid d_ino == 0 entries), because I don't think making
    "logical offset" != "physical offset" is a good idea if there
    isn't a good reason to do so.

I think it is hard to argue that r252438 broke the libc functions. It
just happens that cases that aren't guaranteed to work happen to work
without r252438.

I also think that the use of d_off (or d_cookie, if you prefer that
name), which would be the "physical offset" of the next directory
entry, is the best bet for fixing this in general. (By "in general",
I mean for all file systems.) But this will require a new
getdirentries(2) syscall and libc functions that know how to use it.

rick
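
For readers following the thread, the difference between "seek to the start of the block and walk forward" and a per-entry cookie such as d_off can be sketched with a toy model. This is not the libc or UFS code: the names toy_dir, toy_slot, walk_pos, toy_resolve_walk, toy_resolve_cookie, toy_delete, toy_name_at and toy_demo are all hypothetical, the "directory" is an in-memory array, and the walk counts live entries rather than bytes within a buffer. It only assumes that deleted slots are skipped when entries are returned, loosely mirroring the d_ino == 0 skipping described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: a "directory" is a fixed array of slots.
 * ino == 0 marks a deleted slot, loosely mirroring d_ino == 0. */
#define NSLOTS 8

struct toy_slot { unsigned ino; const char *name; };
struct toy_dir  { struct toy_slot slot[NSLOTS]; };

/* Saved position in the "block base + walk forward" style:
 * start at slot `base` and skip `nskip` live entries. */
struct walk_pos { size_t base; size_t nskip; };

/* Resolve a walk-style position to a slot index (NSLOTS if none). */
static size_t toy_resolve_walk(const struct toy_dir *d, struct walk_pos p)
{
    size_t i, seen = 0;

    for (i = p.base; i < NSLOTS; i++) {
        if (d->slot[i].ino == 0)
            continue;            /* deleted slots are not counted */
        if (seen == p.nskip)
            return i;
        seen++;
    }
    return NSLOTS;
}

/* Resolve a cookie-style position: the cookie is the physical slot
 * index of the next entry, so deletions before it do not move it. */
static size_t toy_resolve_cookie(const struct toy_dir *d, size_t cookie)
{
    size_t i;

    for (i = cookie; i < NSLOTS; i++)
        if (d->slot[i].ino != 0)
            return i;
    return NSLOTS;
}

static void toy_delete(struct toy_dir *d, size_t i)
{
    assert(i < NSLOTS);
    d->slot[i].ino = 0;
}

static const char *toy_name_at(const struct toy_dir *d, size_t i)
{
    return i < NSLOTS ? d->slot[i].name : NULL;
}

/* Read "a", "b", "c", save the position of the next entry ("d") both
 * ways, delete "a" and "b", then resolve both saved positions.
 * Returns 1 if the two styles disagree about where to resume. */
static int toy_demo(const char **walk_name, const char **cookie_name)
{
    struct toy_dir d = {{
        {1, "a"}, {2, "b"}, {3, "c"}, {4, "d"},
        {5, "e"}, {6, "f"}, {7, "g"}, {8, "h"}
    }};
    struct walk_pos wp = { 0, 3 };  /* base 0, skip three live entries */
    size_t cookie = 3;              /* physical slot of "d" */

    toy_delete(&d, 0);
    toy_delete(&d, 1);

    *walk_name = toy_name_at(&d, toy_resolve_walk(&d, wp));
    *cookie_name = toy_name_at(&d, toy_resolve_cookie(&d, cookie));
    return strcmp(*walk_name, *cookie_name) != 0;
}
```

In this model, calling toy_demo() resumes at "d" with the cookie but at "f" with the walk-style position, so "d" and "e" are never seen by the reader. That is the same shape of failure as the delete-while-listing case discussed in the thread, under the simplifying assumptions stated above.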