Date: Wed, 15 Mar 2017 14:50:46 +0000
From: "Osipov, Michael" <michael.osipov@siemens.com>
To: "freebsd-java@freebsd.org" <freebsd-java@freebsd.org>
Subject: Needle in the haystack: same Java code on two identical machines, one passes one fails
Message-ID: <68644224DA0DE64CA5A49838ED219A0425C0F0EA@DEFTHW99EJ5MSX.ww902.siemens.net>
Hi folks,

I am currently experiencing a stdio issue on a FreeBSD 10.3-STABLE box at work, while another, identical box works flawlessly, as do other test boxes in a VM or on real hardware running 9.3-STABLE through 11.0-STABLE.

Let's stick to the two identical boxes at work for now. Both are identical HPE servers (Xeon CPUs, 4 GiB RAM) running 10.3-STABLE, and both base systems are configured the same way. The software installed from ports differs slightly.

faulty box:
FreeBSD blnn719x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310805: Fri Dec 30 11:29:53 CET 2016 root@blnn719x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN719X i386

working box:
FreeBSD blnn714x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310632: Tue Dec 27 18:58:32 CET 2016 root@blnn714x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN714X i386

The code I run is publicly available: Maven Surefire, the testing framework used throughout the entire Maven ecosystem (branch 2.19.2-experimental contains extended log output):
https://git-wip-us.apache.org/repos/asf/maven-surefire.git

Run with:
mvn -B -V clean install -Drat.skip -Dcheckstyle.skip | tee ~/maven-surefire.log

Both machines have Maven 3.5.0-alpha-1 and OpenJDK 8 Update 121 from ports. Tests run off local disks on a gvinum backend, which runs on top of an HP RAID5 system.

A few specific tests fail, namely those where a parent Java process forks another Java process (not a thread) and communicates with it bidirectionally through stdio. The failures manifest as the parent process assuming the forked process is gone, although the forked process has already notified it via stdio that it has completed all tasks and is performing a clean exit. This does not happen on blnn714x. Really weird!

I don't expect anyone to solve my problem here, only to give me pointers on where to start looking for what is really wrong with blnn719x compared to the other machine, because I am searching for the needle in the haystack.
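For readers unfamiliar with the pattern, here is a minimal sketch of a parent talking to a forked child over stdin/stdout and then deciding whether the child has exited. It is NOT Surefire's actual fork protocol: the NOOP/ACK/SHUTDOWN/BYE commands are hypothetical, and a POSIX `sh` loop stands in for the forked JVM so the sketch stays self-contained. What it shows is one ordering a parent can use to avoid guessing: flush commands to the child, drain the child's stdout until EOF, and only then collect the exit status with `waitFor`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class StdioForkSketch {

    // Hypothetical child: a shell loop standing in for the forked JVM.
    // It answers "NOOP" with "ACK", answers "SHUTDOWN" with "BYE" and exits 0.
    static final String CHILD_SCRIPT =
          "while read cmd; do "
        + "case \"$cmd\" in "
        + "NOOP) echo ACK ;; "
        + "SHUTDOWN) echo BYE; exit 0 ;; "
        + "esac; "
        + "done; exit 1";

    /** Runs one command round trip; returns the child's output lines plus its exit code. */
    static List<String> runHandshake() throws IOException, InterruptedException {
        Process child = new ProcessBuilder("sh", "-c", CHILD_SCRIPT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        List<String> received = new ArrayList<>();
        try (Writer toChild = new OutputStreamWriter(child.getOutputStream(), StandardCharsets.UTF_8);
             BufferedReader fromChild = new BufferedReader(
                     new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
            // Parent -> child: newline-terminated commands, flushed explicitly;
            // otherwise they can sit in the JVM-side buffers and never reach the child.
            toChild.write("NOOP\nSHUTDOWN\n");
            toChild.flush();
            // Child -> parent: read until EOF instead of stopping at "BYE",
            // so the pipe is fully drained before judging whether the child is alive.
            String line;
            while ((line = fromChild.readLine()) != null) {
                received.add(line);
            }
        }
        // Only after EOF: wait for the real exit status rather than inferring it.
        if (!child.waitFor(30, TimeUnit.SECONDS)) {
            child.destroyForcibly();
            throw new IOException("child did not exit within 30 s");
        }
        received.add("exit=" + child.exitValue());
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runHandshake()); // prints [ACK, BYE, exit=0]
    }
}
```

Reading to EOF before calling `waitFor` is a deliberate choice in this sketch: it separates "the child said it is done" from "the operating system reports the child has exited", which are exactly the two events the failing tests appear to disagree about.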
I am also quite certain that this is not a bug in the client code, because if it were, it would fail on both machines as well as on the VMs I have at home, and especially on my old Pentium 4 box running FreeBSD 11-STABLE with even less memory. I am convinced that this is some shared memory, buffer or cache issue.

I have uploaded two tarballs, one per machine:
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn714x.tar.gz
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn719x.tar.gz

Each tarball contains:
* the Maven log output with the failed tests (see the very end), e.g.,
  Tests in error:
    testForkCountTwoNoReuse(org.apache.maven.surefire.its.ForkModeIT): Exit code was non-zero: 1; command line and log = (..)
* surefire-integration-tests/target/<testName>/log.txt
  Contains verbose traces of the communication between parent and children.

I'd appreciate any kind of help!

Best regards,

Michael

PS: I have also tested the code successfully on Windows, Ubuntu, RHEL6 and Fedora 25.