Date:      Mon, 2 Jul 2012 00:43:01 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Andreas Tobler <andreast@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r237367 - head/sys/fs/nfsclient
Message-ID:  <20120701214301.GQ2337@deviant.kiev.zoral.com.ua>
In-Reply-To: <4FF097E5.8030909@FreeBSD.org>
References:  <201206210926.q5L9Q6nR002030@svn.freebsd.org> <4FF03316.5050609@FreeBSD.org> <20120701120408.GM2337@deviant.kiev.zoral.com.ua> <4FF0528E.50002@FreeBSD.org> <20120701134132.GO2337@deviant.kiev.zoral.com.ua> <4FF05724.3050904@FreeBSD.org> <20120701170543.GP2337@deviant.kiev.zoral.com.ua> <4FF097E5.8030909@FreeBSD.org>


On Sun, Jul 01, 2012 at 08:33:09PM +0200, Andreas Tobler wrote:
> On 01.07.12 19:05, Konstantin Belousov wrote:
> >On Sun, Jul 01, 2012 at 03:56:52PM +0200, Andreas Tobler wrote:
> >>On 01.07.12 15:41, Konstantin Belousov wrote:
> >>>On Sun, Jul 01, 2012 at 03:37:18PM +0200, Andreas Tobler wrote:
> >>>>On 01.07.12 14:04, Konstantin Belousov wrote:
> >>>>>On Sun, Jul 01, 2012 at 01:23:02PM +0200, Andreas Tobler wrote:
> >>>>>>On 21.06.12 11:26, Konstantin Belousov wrote:
> >>>>>>>Author: kib
> >>>>>>>Date: Thu Jun 21 09:26:06 2012
> >>>>>>>New Revision: 237367
> >>>>>>>URL: http://svn.freebsd.org/changeset/base/237367
> >>>>>>>
> >>>>>>>Log:
> >>>>>>>   Enable deadlock avoidance code for NFS client.
> >>>>>>
> >>>>>>
> >>>>>>Hm, since this commit I fail with my nfs installworld/kernel.
> >>>>>>
> >>>>>>I have a builder which installs world/kernel to a nfs mounted
> >>>>>>directory.
> >>>>>>Namely used for cross builds.
> >>>>>>
> >>>>>>Now since this commit I get the following when I install kernel to the
> >>>>>>nfs directory:
> >>>>>>
> >>>>>>..
> >>>>>>install -o root -g wheel -m 555   zfs.ko.symbols
> >>>>>>/netboot/sparc64/boot/kernel
> >>>>>>install: /netboot/sparc64/boot/kernel/zfs.ko.symbols: No such file or
> >>>>>>directory
> >>>>>>*** [_kmodinstall] Error code 71
> >>>>>>..
> >>>>>>
> >>>>>>The file is there, a local install of the tree works without problems.
> >>>>>>Reverting to r237366 also makes it work again.
> >>>>>>
> >>>>>>The server is a -CURRENT, r237880, The client, -CURRENT too.
> >>>>>>
> >>>>>>How can I help to track down the real issue?
> >>>>>
> >>>>>Is it always the same file in the install procedure which causes the
> >>>>>failure ? Even more, is the failure pattern always the same ?
> >>>>
> >>>>I'd say so yes. When installing a kernel onto a nfs mounted fs then
> >>>>always (in my cases) the zfs.ko.symbols was the failing pattern.
> >>>>I tried ppc64 and sparc64 as target. With both it was the above file.
> >>>>
> >>>>When doing a installworld, it was, also in both cases, ppc64/sparc64,
> >>>>the cc1 in libexec which failed.
> >>>>
> >>>>>Might be, start with ktrace-ing the whole make invocation, including
> >>>>>the children processes.
> >>>>
> >>>>Some recipes how to start?
> >>>ktrace -o <file on local fs> -i make installkernel
> >>>Then kdump and cut the lines around relevant failure.
> >>
> >>ktrace -f, right?
> >Right, but without -i it is useless.
>
> Ah, yes, seems clear now after reading the man page.
>=20
> >>I placed the whole kdump here:
> >>
> >>http://people.freebsd.org/~andreast/dumped_installkernel.log
> >>
> >>It is not clear to me where the failure starts :)
> >Because logs do not contain tracepoints from the children.
> >See above about -i.
> >
> >I asked about excerpt because I expect the proper log to have an order
> >of magnitude bigger size.
>
> Ok. The dump is around 100MB, I hope I extracted as much as needed:
>
> http://people.freebsd.org/~andreast/dumped_installkernel-7.log
>
> >>>>>I used buildworld on the NFS-mounted obj/ as the test for the changes.
> >>>>
> >>>>Here the obj is local, only the src and the destination is on the
> >>>>nfs/netboot server.
> >>>
> >>>I just finished build on NFS obj/ and did several rounds of installs
> >>>for world and kernel into nfs-mounted destdir. It seems I cannot
> >>>reproduce
> >>>this locally.
> >>
> >>Ok. I try with an nfs obj too.
>
> So, I was not able to reproduce the failure with an nfs mounted obj dir.
>
> But I was able to reproduce the failure with three different machines
> which all have the obj local and the destination mounted via nfs.
>
> Are you able to try with a local obj too?
Below are two patches. Please follow my instructions literally to get
the most out of your bug report.

First, please apply the usr.bin/xinstall patch only, and retry installkernel
(no need to use ktrace). It should report the real error, a short write with
a zero-sized result, instead of a stale ENOENT left over in errno.

Next, please apply the sys/fs/nfsclient patch, which should fix the root
cause.

diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
index 71286e3..f7af6fb 100644
--- a/sys/fs/nfsclient/nfs_clbio.c
+++ b/sys/fs/nfsclient/nfs_clbio.c
@@ -897,7 +897,7 @@ ncl_write(struct vop_write_args *ap)
 	struct nfsmount *nmp = VFSTONFS(vp->v_mount);
 	daddr_t lbn;
 	int bcount;
-	int bp_cached, n, on, error = 0;
+	int bp_cached, n, on, error = 0, error1;
 	size_t orig_resid, local_resid;
 	off_t orig_size, tmp_off;
 
@@ -1259,9 +1259,12 @@ again:
 		if ((ioflag & IO_SYNC)) {
 			if (ioflag & IO_INVAL)
 				bp->b_flags |= B_NOCACHE;
-			error = bwrite(bp);
-			if (error)
+			error1 = bwrite(bp);
+			if (error1 != 0) {
+				if (error == 0)
+					error = error1;
 				break;
+			}
 		} else if ((n + on) == biosize) {
 			bp->b_flags |=3D B_ASYNC;
 			(void) ncl_writebp(bp, 0, NULL);
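
As a reading aid for the hunk above: the change stores the bwrite() result
in error1 and copies it into error only when error is still zero, presumably
so that an error recorded earlier in the loop is not lost when the flush
itself succeeds. A minimal C sketch of that pattern follows. The names
flush_block(), step_old() and step_new() are hypothetical, used only for
illustration, and are not part of nfs_clbio.c.

```c
#include <errno.h>

/* Hypothetical stand-in for bwrite(): returns 0 or an errno value. */
static int
flush_block(int result)
{
	return (result);
}

/*
 * Old shape: the flush result overwrites whatever was in 'error',
 * so an earlier failure disappears if the flush succeeds.
 */
static int
step_old(int earlier_error, int flush_result)
{
	int error;

	error = earlier_error;
	error = flush_block(flush_result);
	return (error);
}

/*
 * New shape: the flush result goes into a separate variable and is
 * propagated only if no error has been recorded yet.
 */
static int
step_new(int earlier_error, int flush_result)
{
	int error, error1;

	error = earlier_error;
	error1 = flush_block(flush_result);
	if (error1 != 0) {
		if (error == 0)
			error = error1;
	}
	return (error);
}
```

With an earlier error of EFAULT and a successful flush, step_old() returns 0
while step_new() still returns EFAULT. If both steps fail, step_new() keeps
the first error.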


diff --git a/usr.bin/xinstall/xinstall.c b/usr.bin/xinstall/xinstall.c
index a920f85..3eba4f7 100644
--- a/usr.bin/xinstall/xinstall.c
+++ b/usr.bin/xinstall/xinstall.c
@@ -53,6 +53,7 @@ __FBSDID("$FreeBSD$");
 #include <errno.h>
 #include <fcntl.h>
 #include <grp.h>
+#include <inttypes.h>
 #include <paths.h>
 #include <pwd.h>
 #include <stdio.h>
@@ -671,11 +672,18 @@ copy(int from_fd, const char *from_name, int to_fd, const char *to_name,
 	if (size <= 8 * 1048576 && trymmap(from_fd) &&
 	    (p = mmap(NULL, (size_t)size, PROT_READ, MAP_SHARED,
 		    from_fd, (off_t)0)) != (char *)MAP_FAILED) {
-		if ((nw = write(to_fd, p, size)) != size) {
+		nw = write(to_fd, p, size);
+		if (nw != size) {
 			serrno = errno;
 			(void)unlink(to_name);
-			errno = nw > 0 ? EIO : serrno;
-			err(EX_OSERR, "%s", to_name);
+			if (nw >= 0) {
+				errx(EX_OSERR,
+     "short write to %s: %jd bytes written, %jd bytes asked to write",
+				    to_name, (uintmax_t)nw, (uintmax_t)size);
+			} else {
+				errno = serrno;
+				err(EX_OSERR, "%s", to_name);
+			}
 		}
 		done_copy =3D 1;
 	}
@@ -684,8 +692,15 @@ copy(int from_fd, const char *from_name, int to_fd, const char *to_name,
 			if ((nw = write(to_fd, buf, nr)) != nr) {
 				serrno = errno;
 				(void)unlink(to_name);
-				errno = nw > 0 ? EIO : serrno;
-				err(EX_OSERR, "%s", to_name);
+				if (nw >= 0) {
+					errx(EX_OSERR,
+     "short write to %s: %jd bytes written, %jd bytes asked to write",
+					    to_name, (uintmax_t)nw,
+					    (uintmax_t)size);
+				} else {
+					errno = serrno;
+					err(EX_OSERR, "%s", to_name);
+				}
 			}
 		if (nr != 0) {
 			serrno = errno;
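
For reference, the xinstall change above separates two outcomes of write(2).
A return of -1 means errno describes the failure. A non-negative return
smaller than the request is a short write, and errno is not meaningful for
it. The old code reported the saved errno when write() returned 0, which is
how a stale ENOENT ended up in the message. The following is a minimal C
sketch of the same classification. describe_write() is a hypothetical helper,
not code from xinstall.c, and it casts the signed count to intmax_t to match
%jd.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: classify the result of a write(2) call.
 *   nw          - value returned by write(2)
 *   want        - number of bytes that were requested
 *   saved_errno - errno captured immediately after the call
 * Returns 0 for a complete write, -1 otherwise with a message in buf.
 */
static int
describe_write(ssize_t nw, size_t want, int saved_errno, char *buf,
    size_t buflen)
{
	if (nw >= 0 && (size_t)nw == want)
		return (0);
	if (nw >= 0) {
		/* Short write: errno was not set by this call, ignore it. */
		snprintf(buf, buflen,
		    "short write: %jd bytes written, %ju bytes asked to write",
		    (intmax_t)nw, (uintmax_t)want);
	} else {
		/* Real failure: errno from the call is the cause. */
		snprintf(buf, buflen, "write failed: %s",
		    strerror(saved_errno));
	}
	return (-1);
}
```

Called with nw = 0, want = 4096 and a leftover ENOENT in saved_errno, the
helper produces a "short write" message and never consults the errno value.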



