Date:      Wed, 15 Mar 2017 14:50:46 +0000
From:      "Osipov, Michael" <michael.osipov@siemens.com>
To:        "freebsd-java@freebsd.org" <freebsd-java@freebsd.org>
Subject:   Needle in the haystack: same Java code on two identical machines, one passes one fails
Message-ID:  <68644224DA0DE64CA5A49838ED219A0425C0F0EA@DEFTHW99EJ5MSX.ww902.siemens.net>

Hi folks,

I am currently experiencing a stdio issue on a FreeBSD 10.3-STABLE box at
work, whereas another, identical box works flawlessly, as do other test
boxes (in a VM or on real hardware) running 9.3-STABLE through 11.0-STABLE.

Let's stick to the two boxes at work for now. Both are identical HPE
servers (Xeon CPUs, 4 GiB RAM) running 10.3-STABLE, and both base systems
are configured the same way. The software installed from ports differs
slightly.

faulty box:
FreeBSD blnn719x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310805: Fri Dec 30 11:29:53 CET 2016     root@blnn719x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN719X  i386

working box:
FreeBSD blnn714x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310632: Tue Dec 27 18:58:32 CET 2016     root@blnn714x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN714X  i386

The code I run is publicly available: Maven Surefire, the testing framework
used throughout the entire Maven ecosystem (branch 2.19.2-experimental
contains extended log output):
https://git-wip-us.apache.org/repos/asf/maven-surefire.git

Run with:
mvn -B -V clean install -Drat.skip -Dcheckstyle.skip | tee ~/maven-surefire.log

Both machines have Maven 3.5.0-alpha-1 and OpenJDK 8 Update 121 from ports.
Tests run off local disks on a gvinum backend which runs on top of an HP
RAID5 system.

A few specific tests fail, namely those where a parent Java process forks
another Java process (not a thread) and communicates with it bidirectionally
through stdio. The failures manifest as the parent process assuming the
forked process is gone, although the forked process has already notified it
via stdio that it has completed all tasks and is performing a clean exit.
This does not happen on blnn714x, which is really weird!
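
To make the pattern concrete, here is a minimal sketch of the kind of
parent/child stdio handshake I mean. It is not Surefire's actual code: the
class name StdioForkSketch and the RUN/ACK/BYE lines are made up for
illustration, and it assumes the compiled class is reachable through
java.class.path so that the second JVM can load it.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;

/** Hypothetical reproducer; the RUN/ACK/BYE protocol is invented, not Surefire's. */
public class StdioForkSketch {

    /** Returns true if a "BYE" line arrives before the stream ends. */
    public static boolean sawGoodbye(Stream<String> linesFromChild) {
        return linesFromChild.anyMatch("BYE"::equals);
    }

    /** Child side: read one command from stdin, acknowledge it, announce a clean exit. */
    static void runChild() throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String command = in.readLine();
        System.out.println("ACK " + command);
        System.out.println("BYE");
        System.out.flush();
    }

    /** Parent side: start a second JVM, send one command, wait for "BYE" and the exit code. */
    static void runParent() throws IOException, InterruptedException {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        ProcessBuilder pb = new ProcessBuilder(
                java, "-cp", System.getProperty("java.class.path"),
                StdioForkSketch.class.getName(), "child");
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // keep child stderr visible
        Process child = pb.start();
        try (BufferedWriter toChild = new BufferedWriter(new OutputStreamWriter(
                     child.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader fromChild = new BufferedReader(new InputStreamReader(
                     child.getInputStream(), StandardCharsets.UTF_8))) {
            toChild.write("RUN");
            toChild.newLine();
            toChild.flush();
            boolean goodbye = sawGoodbye(fromChild.lines());
            int exitCode = child.waitFor();
            // On a healthy box one would expect goodbye=true and exit=0.
            System.out.println("goodbye=" + goodbye + " exit=" + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && "child".equals(args[0])) {
            runChild();
        } else {
            runParent();
        }
    }
}
```

Running something this small on both boxes (for example in a loop) might
show whether the lost notification is tied to Surefire itself or to the
plain fork-plus-pipe mechanism.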

I don't expect anyone to solve my problem here, but merely to provide
pointers on where I can start looking for what is really wrong with machine
blnn719x compared to the other one, because I am searching for the needle in
the haystack.
I am also quite certain that this is not a bug in the client code: if it
were, it should fail on both machines, as well as on the VMs I have at home
and especially on my old Pentium 4 box running FreeBSD 11-STABLE with even
less memory.
I am convinced that this is some shared memory, buffers, or caches issue.
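
As a starting point for comparing the two machines, one could dump the
kernel settings on each box with "sysctl -a > <host>.txt" and list only the
entries that differ. The following is a rough sketch under that assumption.
The class name SysctlDiffSketch and the two file arguments are hypothetical,
and it handles only single-line "name: value" entries. Counters and
timestamps will naturally differ, so the output still needs to be read by
hand.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: compares two saved "sysctl -a" dumps. */
public class SysctlDiffSketch {

    /** Parses "name: value" lines; lines without ": " are skipped. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> settings = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(": ");
            if (sep > 0) {
                settings.put(line.substring(0, sep), line.substring(sep + 2).trim());
            }
        }
        return settings;
    }

    /** Maps each name whose value differs (or is missing on one side) to "left | right". */
    public static Map<String, String> diff(Map<String, String> left,
                                           Map<String, String> right) {
        Set<String> names = new TreeSet<>(left.keySet());
        names.addAll(right.keySet());
        Map<String, String> differences = new TreeMap<>();
        for (String name : names) {
            String l = left.get(name);
            String r = right.get(name);
            if (!Objects.equals(l, r)) {
                differences.put(name, l + " | " + r);
            }
        }
        return differences;
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1]: dump files from the working and the faulty box.
        // ISO-8859-1 is used so that unexpected bytes never abort the read.
        Map<String, String> left = parse(
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1));
        Map<String, String> right = parse(
                Files.readAllLines(Paths.get(args[1]), StandardCharsets.ISO_8859_1));
        diff(left, right).forEach((name, values) ->
                System.out.println(name + ": " + values));
    }
}
```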

I have uploaded two tarballs for both machines:
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn714x.tar.gz
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn719x.tar.gz


Each tarball contains:

* log output of Maven with the failed tests, see the very end, e.g.,
Tests in error:
  testForkCountTwoNoReuse(org.apache.maven.surefire.its.ForkModeIT): Exit code was non-zero: 1; command line and log = (..)
* surefire-integration-tests/target/<testName>/log.txt
  Contains verbose traces of the communication between parent and children

I'd appreciate any type of help!

Best regards,

Michael

PS: I have also tested the code successfully on Windows, Ubuntu, RHEL6, and
Fedora 25.


