Message-ID: <553B0326.1090306@freebsd.org>
Date: Sat, 25 Apr 2015 10:59:50 +0800
From: Julian Elischer
To: Rick Macklem, Jilles Tjoelker
CC: freebsd-current@freebsd.org, John Baldwin
Subject: Re: readdir/telldir/seekdir problem (i think)
In-Reply-To: <326462676.25571625.1429925971889.JavaMail.root@uoguelph.ca>
List-Id: Discussions about the use of FreeBSD-current

On 4/25/15 9:39 AM, Rick Macklem wrote:
> Jilles Tjoelker wrote:
>> On Fri, Apr 24, 2015 at 04:28:12PM -0400, John Baldwin wrote:
>>> Yes, this isn't at all safe. There's no guarantee whatsoever that
>>> the offset on the directory fd that isn't something returned by
>>> getdirentries has any meaning. In particular, the size of the
>>> directory entry in a random filesystem might be a different size
>>> than the structure returned by getdirentries (since it converts
>>> things into a FS-independent format).
>>>
>>> This might work for UFS by accident, but this is probably why ZFS
>>> doesn't work.
>>>
>>> However, this might be properly fixed by the thing that ino64 is
>>> doing where each directory entry returned by getdirentries gives
>>> you a seek offset that you _can_ directly seek to (as opposed to
>>> seeking to the start of the block and then walking forward N
>>> entries until you get an inter-block entry that is the same).
>> The ino64 branch only reserves space for d_off and does not use it in
>> any way. This is appropriate since actually using d_off is a major
>> feature addition.
>>
> Well, at some point ino64 will need to define a new getdirentries(2)
> syscall and I believe this new syscall can have different/additional
> arguments.

Yes, POSIX only specifies 2 mandatory fields (d_ino and d_name);
everything else is implementation dependent.

> I'd suggest that the new getdirentries(2) syscall should return a
> flag to indicate that the underlying file system is filling in d_off.
> Then the libc functions can use d_off if it is available.
> (They will still need to "work" at least as well as they do now if
> the file system doesn't support d_off. The old getdirentries(2) syscall
> will be returning the old/current "struct dirent" which doesn't have
> the field anyhow.)
>
> Another bit of fun is that the argument for seekdir()/telldir() is a
> long and ends up 32bits for some arches. d_off is 64bits, since that
> is what some file systems require.

What does Linux use?
------
In glibc up to version 2.1.1, the return type of telldir() was off_t.
POSIX.1-2001 specifies long, and this is the type used since glibc 2.1.2.

Also from the Linux man page (this is interesting):
--------
In early filesystems, the value returned by telldir() was a simple
file offset within a directory. Modern filesystems use tree or hash
structures, rather than flat tables, to represent directories. On
such filesystems, the value returned by telldir() (and used
internally by readdir(3)) is a "cookie" that is used by the
implementation to derive a position within a directory. Application
programs should treat this strictly as an opaque value, making no
assumptions about its contents.
------
But glibc itself uses the contents in a non-opaque (and possibly wrong)
way in seekdir (not following their own advice).

> Maybe the library code can only use d_off if it is a 64bit arch and
> the file system is filling it in. (Or maybe the library can keep track
> of 32<->64bit mappings for the offsets. I haven't looked at the libc
> functions for a while, so I can't remember what they keep track of.)

One supposes a 32 bit system would not have such large file systems on
it (maybe?).

>
> rick
>
>> A proper d_off would still be useful even if UFS's readdir keeps masking
>> off the offset so a directory read always starts at the beginning of a
>> 512-byte directory block, since this allows more distinct offset values
>> than safely using getdirentries()'s *basep. With d_off, one outer loop
>> must read at least one directory block to avoid spinning indefinitely,
>> while using getdirentries()'s *basep requires reading the whole
>> getdirentries() buffer.
>>
>> Some Linux filesystems go further and provide a unique d_off for each
>> entry.
>>
>> Another idea would be to store the last d_ino instead of dd_loc into the
>> struct ddloc.
>> On seekdir(), this would seek to loc_seek as before and
>> skip entries until that d_ino is found, or to the start of the buffer
>> if not found (and possibly return some entries again that should not
>> be returned, but Samba copes with that).
>>
>> --
>> Jilles Tjoelker
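
For illustration only, Jilles's last suggestion (remember the last d_ino
rather than dd_loc, then skip forward to it on seekdir()) could be
sketched roughly as below. This is a userland simulation over an
in-memory array, not libc code: struct sim_dirent, struct sim_ddloc,
loc_last_ino and sim_seekdir() are hypothetical names, an array index
stands in for the loc_seek file offset, and treating a d_ino of 0 as
"nothing returned yet" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry in a getdirentries() buffer. */
struct sim_dirent {
	uint64_t	 d_ino;
	const char	*d_name;
};

/*
 * Hypothetical variant of struct ddloc: instead of a byte offset into
 * the buffer (dd_loc), remember the inode number of the last entry
 * that was handed out.
 */
struct sim_ddloc {
	size_t		loc_seek;	/* where the buffer starts (an index here) */
	uint64_t	loc_last_ino;	/* last d_ino returned; 0 = none (assumed) */
};

/*
 * Return the index of the next entry to hand out after seeking to loc.
 * Scan forward from loc_seek until the remembered d_ino is found and
 * resume just past it.  If that entry has disappeared, fall back to
 * the start of the buffer, which may return some entries twice.
 */
size_t
sim_seekdir(const struct sim_dirent *ents, size_t nents,
    const struct sim_ddloc *loc)
{
	size_t i;

	assert(ents != NULL && loc != NULL);

	if (loc->loc_last_ino == 0)
		return (loc->loc_seek);

	for (i = loc->loc_seek; i < nents; i++) {
		if (ents[i].d_ino == loc->loc_last_ino)
			return (i + 1);
	}
	return (loc->loc_seek);
}
```

The point of the sketch is that the saved position survives the removal
of earlier entries in the same buffer, since it is found by identity
rather than by byte offset. The cost is the fallback case: if the
remembered entry itself was removed, the caller sees repeats from the
start of the buffer.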