From: Rick Macklem
To: Julian Elischer
Cc: freebsd-current@freebsd.org, John Baldwin, Jilles Tjoelker
Date: Sat, 25 Apr 2015 17:15:24 -0400 (EDT)
Subject: Re: readdir/telldir/seekdir problem (i think)

Julian Elischer wrote:
> On 4/25/15 9:39 AM, Rick Macklem wrote:
> > Jilles Tjoelker wrote:
> >> On Fri, Apr 24, 2015 at 04:28:12PM -0400, John Baldwin wrote:
> >>> Yes, this isn't at all safe. There's no guarantee whatsoever that
> >>> the offset on the directory fd that isn't something returned by
> >>> getdirentries has any meaning. In particular, the size of the
> >>> directory entry in a random filesystem might be a different size
> >>> than the structure returned by getdirentries (since it converts
> >>> things into a FS-independent format).
> >>> This might work for UFS by accident, but this is probably why ZFS
> >>> doesn't work.
> >>> However, this might be properly fixed by the thing that ino64 is
> >>> doing where each directory entry returned by getdirentries gives
> >>> you a seek offset that you _can_ directly seek to (as opposed to
> >>> seeking to the start of the block and then walking forward N
> >>> entries until you get an inter-block entry that is the same).
> >> The ino64 branch only reserves space for d_off and does not use it
> >> in any way. This is appropriate since actually using d_off is a
> >> major feature addition.
> >>
> > Well, at some point ino64 will need to define a new getdirentries(2)
> > syscall and I believe this new syscall can have different/additional
> > arguments.
> yes, posix only specifies 2 mandatory fields (d_ino and d_name) and
> everything else is implementation dependent.
> > I'd suggest that the new getdirentries(2) syscall should return a
> > flag to indicate that the underlying file system is filling in
> > d_off.
> > Then the libc functions can use d_off if it is available.
> > (They will still need to "work" at least as well as they do now if
> > the file system doesn't support d_off. The old getdirentries(2)
> > syscall will be returning the old/current "struct dirent" which
> > doesn't have the field anyhow.)
> >
> > Another bit of fun is that the argument for seekdir()/telldir() is
> > a long and ends up 32bits for some arches. d_off is 64bits, since
> > that is what some file systems require.
> what does linux use?

Btw, I found this:
https://bugs.centos.org/view.php?id=5496

It appears that Linux has been having fun with this too, at least for
NFS. I still think that reading the whole directory is the only way to
fix NFS. (Unfortunately, they don't say how the Linux distros fixed
it. ;-)

Have fun with it, rick

> ------
> In glibc up to version 2.1.1, the return type of telldir() was
> off_t. POSIX.1-2001 specifies long, and this is the type used since
> glibc 2.1.2.
>
> also from the linux man page: this is interesting..
>
> --------
> In early filesystems, the value returned by telldir() was a simple
> file offset within a directory. Modern filesystems use tree or hash
> structures, rather than flat tables, to represent directories. On
> such filesystems, the value returned by telldir() (and used
> internally by readdir(3)) is a "cookie" that is used by the
> implementation to derive a position within a directory. Application
> programs should treat this strictly as an opaque value, making no
> assumptions about its contents.
> ------
> but glibc uses the contents in a nonopaque (and possibly wrong) way
> itself in seekdir.
> (not following their own advice.)
>
> > Maybe the library code can only use d_off if it is a 64bit arch and
> > the file system is filling it in. (Or maybe the library can keep
> > track of 32<->64bit mappings for the offsets. I haven't looked at
> > the libc functions for a while, so I can't remember what they keep
> > track of.)
>
> one supposes a 32 bit system would not have such large file systems
> on it..
> (maybe?)
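The "32<->64bit mappings" idea quoted above could look roughly like the sketch below. It is not what libc does today and not a proposed patch. The names cookie_map, cookie_tell, cookie_seek and COOKIE_MAX are made up for illustration, and the fixed-size table is an assumption to keep the example short. The point is only that telldir() can hand back a small index that fits in a 32-bit long while the library remembers the real 64-bit offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COOKIE_MAX 64		/* hypothetical table size for this sketch */

/* Hypothetical per-stream table; the index is the token given to the caller. */
struct cookie_map {
	int64_t	off[COOKIE_MAX];	/* 64-bit offsets (d_off-style cookies) */
	size_t	n;			/* number of slots in use */
};

/*
 * Return a small token for a 64-bit offset, reusing an existing slot
 * when the offset was already recorded. Returns -1 if the table is full.
 */
static long
cookie_tell(struct cookie_map *m, int64_t d_off)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < m->n; i++)
		if (m->off[i] == d_off)
			return ((long)i);
	if (m->n == COOKIE_MAX)
		return (-1);
	m->off[m->n] = d_off;
	return ((long)m->n++);
}

/*
 * Translate a token back into the 64-bit offset.
 * Returns 0 on success, -1 if the token was never handed out.
 */
static int
cookie_seek(const struct cookie_map *m, long token, int64_t *d_off)
{
	assert(m != NULL && d_off != NULL);
	if (token < 0 || (size_t)token >= m->n)
		return (-1);
	*d_off = m->off[token];
	return (0);
}
```

A real version would have to decide when entries may be dropped (e.g. on rewinddir() or closedir()) and what to do when the table fills. The sketch leaves both open.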
> >
> > rick
> >
> >> A proper d_off would still be useful even if UFS's readdir keeps
> >> masking off the offset so a directory read always starts at the
> >> beginning of a 512-byte directory block, since this allows more
> >> distinct offset values than safely using getdirentries()'s *basep.
> >> With d_off, one outer loop must read at least one directory block
> >> to avoid spinning indefinitely, while using getdirentries()'s
> >> *basep requires reading the whole getdirentries() buffer.
> >>
> >> Some Linux filesystems go further and provide a unique d_off for
> >> each entry.
> >>
> >> Another idea would be to store the last d_ino instead of dd_loc
> >> into the struct ddloc. On seekdir(), this would seek to loc_seek as
> >> before and skip entries until that d_ino is found, or to the start
> >> of the buffer if not found (and possibly return some entries again
> >> that should not be returned, but Samba copes with that).
> >>
> >> --
> >> Jilles Tjoelker
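For the "store the last d_ino" idea quoted above, the skip step on seekdir() might be sketched as below. This is an illustration only. struct ent and resume_after_ino are hypothetical names, the buffer is modelled as a plain array rather than the variable-length records getdirentries() really returns, and it assumes the saved d_ino belongs to the last entry already handed out, so reading resumes at the entry after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified entry; a real struct dirent has more fields. */
struct ent {
	uint64_t d_ino;
};

/*
 * After seeking to the saved block offset and refilling the buffer,
 * find where to resume: the index just past the entry whose inode
 * number is last_ino. If that entry is gone (e.g. it was unlinked),
 * fall back to index 0, the start of the buffer, which may return
 * some entries a second time.
 */
static size_t
resume_after_ino(const struct ent *buf, size_t n, uint64_t last_ino)
{
	size_t i;

	assert(buf != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (buf[i].d_ino == last_ino)
			return (i + 1);
	return (0);
}
```

The fallback to index 0 matches the "start of the buffer if not found" behaviour described in the quoted text. Whether hard links (several entries sharing one d_ino in the same block) need special handling is not addressed here.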