Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 21 Nov 2006 13:58:42 +0300
From:      Yar Tikhiy <yar@comp.chem.msu.su>
To:        freebsd-arch@freebsd.org
Subject:   Limitations in fts(3)
Message-ID:  <20061121105842.GD13289@comp.chem.msu.su>

next in thread | raw e-mail | index | archive | help

--SUOF0GtieIMvvwua
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi folks,

Our FTSENT appears to have unjustified limits as it uses u_short
for fts_pathlen and short for fts_level.  Consequently, our fts(3)
fails miserably on deep trees although there are plenty of resources
(memory) available to it--see PR bin/104458.  (Traversing unlimited
trees using limited resources is a different topic.)  Note that the
fts_pathlen range is not limited by PATH_MAX because fts(3) constructs
long paths by itself.

To begin with, paths longer than PATH_MAX are quite useless because
they can't be passed to syscalls.  Bruce Evans proposed that we
could have a new flag FTS_NOPATH to disable generating full pathnames
at all.  It can be good if used without FTS_NOCHDIR, as fts(3) will
take the process to the proper directory so that it can access its
entries by their relative pathnames.  This way can relieve us of
the fts_pathlen limitation without touching the structures from
fts.h.

However, there still is the fts_level limit.  This field begs to
be extended to int64_t, as we face all those yottabyte file systems.
(But at least it could be int to match FTW.level from ftw(3).)

NetBSD folks have already extended fts_pathlen and fts_namelen to
u_int, but they forgot about fts_level.  They also extended fts_number
to int64_t.  It looks like time for us to change our fts.h, too.
Symbol versioning now even allows us to have both new and old fts(3)
in the same libc if we want to.

Here's a prototype patch extending fts fields.  I tested it on i386,
and it just worked.  Of course, I'll test it on other arches if we
decide this is the way to go.

The string/array length fields become size_t because it's the only
proper type for them.  Note that the patch also extends flag fields
to u_int in order to get rid of short types in fts.h entirely, since
on modern arches they buy us neither memory economy nor computing
speed.  It's also a chance to extend fts_number to int64_t.  According
to Google Code, fts_bignum isn't used outside our src tree, so we
can eventually drop it in favour of the extended fts_number.

What do you think about all this?
Thanks!

-- 
Yar

--SUOF0GtieIMvvwua
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=fts-patch

--- include/fts.h.orig	Fri Jan  7 03:06:20 2005
+++ include/fts.h	Mon Nov 20 17:04:48 2006
@@ -44,8 +44,8 @@
 	dev_t fts_dev;			/* starting device # */
 	char *fts_path;			/* path for this descent */
 	int fts_rfd;			/* fd for root */
-	int fts_pathlen;		/* sizeof(path) */
-	int fts_nitems;			/* elements in the sort array */
+	size_t fts_pathlen;		/* sizeof(path) */
+	size_t fts_nitems;		/* elements in the sort array */
 	int (*fts_compar)		/* compare function */
 	    (const struct _ftsent * const *, const struct _ftsent * const *);
 
@@ -69,22 +69,15 @@
 	struct _ftsent *fts_cycle;	/* cycle node */
 	struct _ftsent *fts_parent;	/* parent directory */
 	struct _ftsent *fts_link;	/* next file in directory */
-	union {
-		struct {
-			long __fts_number;	/* local numeric value */
-			void *__fts_pointer;	/* local address value */
-		} __struct_ftsent;
-		int64_t __fts_bignum;
-	} __union_ftsent;
-#define	fts_number	__union_ftsent.__struct_ftsent.__fts_number
-#define	fts_pointer	__union_ftsent.__struct_ftsent.__fts_pointer
-#define	fts_bignum	__union_ftsent.__fts_bignum
+	int64_t fts_number;		/* local numeric value */
+#define	fts_bignum	fts_number	/* XXX non-std, should go away */
+	void *fts_pointer;		/* local address value */
 	char *fts_accpath;		/* access path */
 	char *fts_path;			/* root path */
 	int fts_errno;			/* errno for this node */
 	int fts_symfd;			/* fd for symlink */
-	u_short fts_pathlen;		/* strlen(fts_path) */
-	u_short fts_namelen;		/* strlen(fts_name) */
+	size_t fts_pathlen;		/* strlen(fts_path) */
+	size_t fts_namelen;		/* strlen(fts_name) */
 
 	ino_t fts_ino;			/* inode */
 	dev_t fts_dev;			/* device */
@@ -92,7 +85,7 @@
 
 #define	FTS_ROOTPARENTLEVEL	-1
 #define	FTS_ROOTLEVEL		 0
-	short fts_level;		/* depth (-1 to N) */
+	int64_t fts_level;		/* depth (-1 to N) */
 
 #define	FTS_D		 1		/* preorder directory */
 #define	FTS_DC		 2		/* directory that causes cycles */
@@ -108,18 +101,18 @@
 #define	FTS_SL		12		/* symbolic link */
 #define	FTS_SLNONE	13		/* symbolic link without target */
 #define	FTS_W		14		/* whiteout object */
-	u_short fts_info;		/* user flags for FTSENT structure */
+	u_int fts_info;			/* user flags for FTSENT structure */
 
 #define	FTS_DONTCHDIR	 0x01		/* don't chdir .. to the parent */
 #define	FTS_SYMFOLLOW	 0x02		/* followed a symlink to get here */
 #define	FTS_ISW		 0x04		/* this is a whiteout object */
-	u_short fts_flags;		/* private flags for FTSENT structure */
+	u_int fts_flags;		/* private flags for FTSENT structure */
 
 #define	FTS_AGAIN	 1		/* read node again */
 #define	FTS_FOLLOW	 2		/* follow symbolic link */
 #define	FTS_NOINSTR	 3		/* no instructions */
 #define	FTS_SKIP	 4		/* discard node */
-	u_short fts_instr;		/* fts_set() instructions */
+	u_int fts_instr;		/* fts_set() instructions */
 
 	struct stat *fts_statp;		/* stat(2) information */
 	char *fts_name;			/* file name */
--- lib/libc/gen/fts.c.orig	Tue Jun  8 10:23:23 2004
+++ lib/libc/gen/fts.c	Mon Nov 20 17:17:39 2006
@@ -56,15 +56,15 @@
 #include <unistd.h>
 #include "un-namespace.h"
 
-static FTSENT	*fts_alloc(FTS *, char *, int);
+static FTSENT	*fts_alloc(FTS *, char *, size_t);
 static FTSENT	*fts_build(FTS *, int);
 static void	 fts_lfree(FTSENT *);
 static void	 fts_load(FTS *, FTSENT *);
 static size_t	 fts_maxarglen(char * const *);
 static void	 fts_padjust(FTS *, FTSENT *);
 static int	 fts_palloc(FTS *, size_t);
-static FTSENT	*fts_sort(FTS *, FTSENT *, int);
-static u_short	 fts_stat(FTS *, FTSENT *, int);
+static FTSENT	*fts_sort(FTS *, FTSENT *, size_t);
+static u_int	 fts_stat(FTS *, FTSENT *, int);
 static int	 fts_safe_changedir(FTS *, FTSENT *, int, char *);
 static int	 fts_ufslinks(FTS *, const FTSENT *);
 
@@ -119,9 +119,8 @@
 	struct _fts_private *priv;
 	FTS *sp;
 	FTSENT *p, *root;
-	int nitems;
 	FTSENT *parent, *tmp;
-	int len;
+	size_t len, nitems;
 
 	/* Options check. */
 	if (options & ~FTS_OPTIONMASK) {
@@ -228,7 +227,7 @@
 	FTS *sp;
 	FTSENT *p;
 {
-	int len;
+	size_t len;
 	char *cp;
 
 	/*
@@ -630,12 +629,12 @@
 {
 	struct dirent *dp;
 	FTSENT *p, *head;
-	int nitems;
 	FTSENT *cur, *tail;
 	DIR *dirp;
 	void *oldaddr;
-	size_t dnamlen;
-	int cderrno, descend, len, level, maxlen, nlinks, oflag, saved_errno,
+	size_t dnamlen, len, maxlen, nitems;
+	int64_t level;
+	int cderrno, descend, nlinks, oflag, saved_errno,
 	    nostat, doadjust;
 	char *cp;
 
@@ -745,7 +744,7 @@
 		if (!ISSET(FTS_SEEDOT) && ISDOT(dp->d_name))
 			continue;
 
-		if ((p = fts_alloc(sp, dp->d_name, (int)dnamlen)) == NULL)
+		if ((p = fts_alloc(sp, dp->d_name, dnamlen)) == NULL)
 			goto mem1;
 		if (dnamlen >= maxlen) {	/* include space for NUL */
 			oldaddr = sp->fts_path;
@@ -774,21 +773,6 @@
 			maxlen = sp->fts_pathlen - len;
 		}
 
-		if (len + dnamlen >= USHRT_MAX) {
-			/*
-			 * In an FTSENT, fts_pathlen is a u_short so it is
-			 * possible to wraparound here.  If we do, free up
-			 * the current structure and the structures already
-			 * allocated, then error out with ENAMETOOLONG.
-			 */
-			free(p);
-			fts_lfree(head);
-			(void)closedir(dirp);
-			cur->fts_info = FTS_ERR;
-			SET(FTS_STOP);
-			errno = ENAMETOOLONG;
-			return (NULL);
-		}
 		p->fts_level = level;
 		p->fts_parent = sp->fts_cur;
 		p->fts_pathlen = len + dnamlen;
@@ -889,7 +873,7 @@
 	return (head);
 }
 
-static u_short
+static u_int
 fts_stat(sp, p, follow)
 	FTS *sp;
 	FTSENT *p;
@@ -991,7 +975,7 @@
 fts_sort(sp, head, nitems)
 	FTS *sp;
 	FTSENT *head;
-	int nitems;
+	size_t nitems;
 {
 	FTSENT **ap, *p;
 
@@ -1023,7 +1007,7 @@
 fts_alloc(sp, name, namelen)
 	FTS *sp;
 	char *name;
-	int namelen;
+	size_t namelen;
 {
 	FTSENT *p;
 	size_t len;
@@ -1095,18 +1079,6 @@
 {
 
 	sp->fts_pathlen += more + 256;
-	/*
-	 * Check for possible wraparound.  In an FTS, fts_pathlen is
-	 * a signed int but in an FTSENT it is an unsigned short.
-	 * We limit fts_pathlen to USHRT_MAX to be safe in both cases.
-	 */
-	if (sp->fts_pathlen < 0 || sp->fts_pathlen >= USHRT_MAX) {
-		if (sp->fts_path)
-			free(sp->fts_path);
-		sp->fts_path = NULL;
-		errno = ENAMETOOLONG;
-		return (1);
-	}
 	sp->fts_path = reallocf(sp->fts_path, sp->fts_pathlen);
 	return (sp->fts_path == NULL);
 }

--SUOF0GtieIMvvwua--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061121105842.GD13289>