Date:      Fri, 12 Dec 2008 06:36:43 -0800
From:      David Wolfskill <david@catwhisker.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        hackers@freebsd.org, current@freebsd.org
Subject:   Re: NFS (& amd?) dysfunction descending a hierarchy
Message-ID:  <20081212143643.GE5597@albert.catwhisker.org>
In-Reply-To: <20081212134129.GD2038@deviant.kiev.zoral.com.ua>
References:  <20081203001538.GC96383@bunrab.catwhisker.org> <20081209190110.GW60731@albert.catwhisker.org> <Pine.GSO.4.63.0812101124430.24743@muncher.cs.uoguelph.ca> <20081210165022.GJ60731@albert.catwhisker.org> <20081210170620.GS2038@deviant.kiev.zoral.com.ua> <20081211225349.GB5597@albert.catwhisker.org> <20081212134129.GD2038@deviant.kiev.zoral.com.ua>

On Fri, Dec 12, 2008 at 03:41:29PM +0200, Kostik Belousov wrote:
> ...
> > * At 1229033597.287187 it issues an fstatfs() against FD 4; the
> >   unsuccessful return is at 1229033597.287195, claiming ENOENT.
> >
> > Say WHAT??!?
> ...
>
> But is this error transient or permanent ? I.e., would restart of rm
> successful or failing ?

In a test yesterday, it took 3 attempts (each attempt being an
invocation of "rm -fr ~bspace/ports") to actually complete removal of
the hierarchy.

Please note that:

* Done on a locally-mounted file system (vs. NFS), a single invocation
  is sufficient and terminates normally.  By contrast, each of the
  above-cited NFS attempts but the last terminated with a status code
  of 1, along with a whine that one or more subdirectories were not
  empty (this, as a result of "rm" getting inconsistent information
  about the status of the file system).

* Done on either a locally- or NFS-mounted file system in FreeBSD 6.x, a
  single invocation is sufficient and terminates normally.

In other words, this is a regression.

> Anyway, this error looks different too.

?  From the earlier-posted results in 7.x?  Not that I can tell.  In
each case, the amd(8) child process is forked to attempt an unmount(),
tries it, gets EBUSY, and exits.  Meanwhile, rm(1) is descending a
directory tree.  It had performed a readdir(), and had been unlinking
files and performing rmdir() against empty subdirectories.  It
encounters an entry, issues stat(), finds that it's a subdirectory,
open()s it, gets an FD, issues fstat(), gets results that match those of
the earlier stat(), issues fcntl() against the FD (which returns 0),
tries to issue fstatfs() against the FD *that is still open*, and gets
told ENOENT.
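
For anyone who wants to poke at this outside of rm(1), the sequence
described above can be sketched in C roughly as follows.  This is only
an illustration, not code taken from rm(1) or fts(3): probe_dir() is a
hypothetical helper, the particular fcntl() command (F_SETFD with
FD_CLOEXEC) is an assumption, since the trace summary above only says
that fcntl() returned 0, and ESTALE is merely the value this sketch
picks to report a stat()/fstat() mismatch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/vfs.h>		/* fstatfs() on Linux */
#else
#include <sys/param.h>
#include <sys/mount.h>		/* fstatfs() on FreeBSD and other BSDs */
#endif

/*
 * Hypothetical helper: repeat the stat/open/fstat/fcntl/fstatfs sequence
 * on one directory.  Returns 0 if every call succeeds, otherwise the
 * errno of the first call that failed.
 */
int
probe_dir(const char *path)
{
	struct stat by_name, by_fd;
	struct statfs fs;
	int fd, error = 0;

	if (stat(path, &by_name) == -1)
		return (errno);
	if (!S_ISDIR(by_name.st_mode))
		return (ENOTDIR);

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (errno);

	if (fstat(fd, &by_fd) == -1) {
		error = errno;
	} else if (by_fd.st_dev != by_name.st_dev ||
	    by_fd.st_ino != by_name.st_ino) {
		/* The name and the descriptor refer to different objects. */
		error = ESTALE;
	} else if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
		/* Assumed command; the trace only shows fcntl() returning 0. */
		error = errno;
	} else if (fstatfs(fd, &fs) == -1) {
		/* The step reported above as failing with ENOENT. */
		error = errno;
		fprintf(stderr, "fstatfs(%d) on open directory %s: %s\n",
		    fd, path, strerror(error));
	}

	close(fd);
	return (error);
}

#ifdef PROBE_STANDALONE
/* Build with -DPROBE_STANDALONE to get a small command-line wrapper. */
int
main(int argc, char *argv[])
{
	int error = probe_dir(argc > 1 ? argv[1] : ".");

	printf("probe_dir: %s\n", error ? strerror(error) : "ok");
	return (error != 0);
}
#endif
```

On a healthy file system every step should succeed for an existing
directory.  The case of interest is a nonzero result from the fstatfs()
step while the descriptor is still open, which would need something
else (such as the amd(8) unmount attempt) going on at the same time.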

It does differ from the behavior in 8-CURRENT, in that the amd(8) child
process in 8-CURRENT does not appear to get EBUSY.  The behavior from
rm(1)'s perspective is very similar, though.

If it would help, I could try getting a ktrace from a 6.x system, but I
expect it will be very boring: the amd(8) child process should get EBUSY
(as it does in 7.x), and nothing else should happen, since the unmount()
attempt failed.  And since it failed, rm(1) doesn't get told
inconsistent information, so things Just Work.

I admit that I'm no expert on VFS or much of the rest of the kernel,
for that matter.  But what I have observed happening in recent 7.x
is both wrong and a regression.

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



