From owner-svn-src-head@FreeBSD.ORG Wed Jul 9 14:34:19 2014 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D7A15D2F; Wed, 9 Jul 2014 14:34:19 +0000 (UTC) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 7FF7724DD; Wed, 9 Jul 2014 14:34:19 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 98C79D660A4; Thu, 10 Jul 2014 00:34:17 +1000 (EST) Date: Thu, 10 Jul 2014 00:34:17 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Warner Losh Subject: Re: svn commit: r268376 - head/bin/rm In-Reply-To: <201407072321.s67NLL7p070713@svn.freebsd.org> Message-ID: <20140709233323.E2055@besplex.bde.org> References: <201407072321.s67NLL7p070713@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=eojmkOZX c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=faLKK8EkqBwA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=krcSLVLWv4mBT88GSpIA:9 a=CjuIK1q_8ugA:10 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Jul 2014 14:34:19 -0000 On Mon, 7 Jul 2014, Warner Losh wrote: > Log: > rm -rf can fail sometimes with an error from fts_read. Make it honor > fflag to ignore fts_read errors, but stop deleting from that directory > because no further progress can be made. > > When building a kernel with a high -j value on a high core count > machine, during the cleanobj phase we can wind up doing multiple rm > -rf at the same time for modules that have subdirectories. This > exposed this race (sometimes) as fts_read can return an error if the > directory is removed by another rm -rf. Since the intent of the -f > flag was to ignore errors, even if this was a bug in fts_read, we > should ignore the error like we've been instructed to do. This seems to be more broken than before. Although the intent of using -f is often to ignore errors, -f is supposed to only ignore ENOENT errors except in some cases for complicated interactions with -i. > Modified: head/bin/rm/rm.c > ============================================================================== > --- head/bin/rm/rm.c Mon Jul 7 23:21:15 2014 (r268375) > +++ head/bin/rm/rm.c Mon Jul 7 23:21:20 2014 (r268376) > @@ -335,7 +335,7 @@ err: > warn("%s", p->fts_path); > eval = 1; > } > - if (errno) > + if (!fflag && errno) > err(1, "fts_read"); > fts_close(fts); > } The old code was broken too. Apart from its style bug (boolean test for non-boolean), it tests errno in all cases after leaving the loop, but most of the loop exits are not for errors so the test is of the garbage value last written to errno. So the old and new code both defeat the more careful tests of fflag combined with errno the loop. Most are of the form (!fflag || p->fts_errno != ENOENT) (then at least a warning if this fails). It also seems wrong to use errno outside the loop and p->fts_errno in the loop. Of course, outside the loop, p is usually invalid, but that means that errno is very stale. % while ((p = fts_read(fts)) != NULL) { Normal loop exit is when this is NULL. Then there is no error, and the errno that is tested is pure garbage. Fixing this alone might fix the problem. Otherwise, hopefully any error from things going away causes an ENOENT error, so checking for that alone would fix the error handling. That's a bit much to ask for -- if a whole subtree is removed then there should be multiple ENOENT errors and probably other errors, and fts_read() would find it hard to reduce to a single ENOENT without missing a more serious error. % switch (p->fts_info) { % case FTS_DNR: % if (!fflag || p->fts_errno != ENOENT) { % warnx("%s: %s", % p->fts_path, strerror(p->fts_errno)); % eval = 1; % } % continue; Normal error handling. We check for ENOENT and handle the error completely and don't break out of the loop. % case FTS_ERR: % errx(1, "%s: %s", p->fts_path, strerror(p->fts_errno)); Exit for fatal errors. I think POSIX doesn't allow this, but requires contining to the next arg. Exiting the loop should give that, unless fts_close() fails. % case FTS_NS: % /* % * Assume that since fts_read() couldn't stat the % * file, it can't be unlinked. % */ % if (!needstat) % break; Here we exit the loop for an assumed stat() error. If this error is anything other than ENOENT, then POSIX doesn't allow ignoring it but it might allow quitting for the current arg. If this error is ENOENT, then the error handling seems to be even more broken. We shouldn't quit for the current arg on ENOENT, irrespective of fflag. Permissions 000 on a file don't prevent stat'ing it, and I couldn't find a simple example where this code is reached. % ... % case FTS_DP: % /* Post-order: see if user skipped. */ % if (p->fts_number == SKIPPED) % continue; % break; Most other cases don't exit the loop. % ... % } % if (!fflag && errno) More broken than before. % err(1, "fts_read"); fts without FTS_NOCHDIR is fundamentally broken if the directory tree can change underneath it, since (without more work by the caller) it is impossible to get back to the original directory in fts_close() or just up a level in the tree walk for small changes in the tree (the caller cannot handle this with any reasonable anount of work). FTS_NOCHDIR is only used by the following utilities in /usr/src/*bin: cp, pax, gxip, find, grep, asf. It is not used in chmod, rm, ls, chflags, du, chown, ckdist, ctm, mtree, kldxref, setfmac. Mixing this flag gives the worse possible combination -- e.g., rm can handle deeper trees by not using FTS_NOCHDIR, but cp cannot duplicate deep trees. POSIX requires handling unboundedly deep trees in bounded memory for at last critical utilities like rm. This is impossible, and not using FTS_NOCHDIR in FreeBSD rm doesn't give it. Bruce