Date: Wed, 15 Mar 2017 14:50:46 +0000
From: "Osipov, Michael" <michael.osipov@siemens.com>
To: "freebsd-java@freebsd.org" <freebsd-java@freebsd.org>
Subject: Needle in the haystack: same Java code on two identical machines, one passes one fails
Message-ID: <68644224DA0DE64CA5A49838ED219A0425C0F0EA@DEFTHW99EJ5MSX.ww902.siemens.net>

Hi folks,

I am currently experiencing a stdio issue on a FreeBSD 10.3-STABLE box at work, while another, identical box works flawlessly, as do other test boxes in a VM or on real hardware from 9.3-STABLE to 11.0-STABLE.

Let's stick to the two identical boxes at work for now. Both are identical HPE servers (Xeon CPUs, 4 GiB RAM) running 10.3-STABLE, and both base systems are configured the same way. The software installed from ports is slightly different.

faulty box:
    FreeBSD blnn719x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310805: Fri Dec 30 11:29:53 CET 2016 root@blnn719x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN719X i386

working box:
    FreeBSD blnn714x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310632: Tue Dec 27 18:58:32 CET 2016 root@blnn714x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN714X i386

The code I run is publicly available: Maven Surefire (branch 2.19.2-experimental contains extended log output), which is the testing framework used throughout the entire Maven ecosystem:

    https://git-wip-us.apache.org/repos/asf/maven-surefire.git

Run with:

    mvn -B -V clean install -Drat.skip -Dcheckstyle.skip | tee ~/maven-surefire.log

Both machines have Maven 3.5.0-alpha-1 and OpenJDK 8 Update 121 from ports. Tests run off local disks on a gvinum backend which runs on top of an HP RAID5 system.

A few specific tests fail, namely those where a parent Java process forks another Java process (not a thread) and communicates with it bidirectionally through stdio. The failures manifest in the parent process assuming the forked process to be gone, although the forked process has already notified it via stdio that it has completed all tasks and is performing a clean exit. This does not happen on blnn714x, really weird!

I don't expect anyone to solve my problem here, but merely to provide pointers on where I can start looking for what is really wrong with blnn719x compared to the other machine, because I am searching for a needle in a haystack.

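For readers who want to try the pattern in isolation, here is a minimal sketch of a parent/child stdio handshake of the kind described above. It is not Surefire's code: the class name StdioHandshakeSketch and the RUN and BYE markers are hypothetical, chosen only for illustration, and it assumes the child can be started with the same JVM and classpath as the parent. The parent sends a command on the child's stdin, waits for the completion marker on the child's stdout, and only then checks whether the process has exited.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

public class StdioHandshakeSketch {
    // Hypothetical markers for this sketch; NOT Surefire's actual wire format.
    public static final String RUN = "RUN";
    public static final String BYE = "BYE";

    /** Reads lines until BYE is seen; returns false if the stream ends first. */
    public static boolean sawBye(BufferedReader fromChild) throws IOException {
        String line;
        while ((line = fromChild.readLine()) != null) {
            if (BYE.equals(line.trim())) {
                return true;
            }
        }
        return false;
    }

    /** Child side: wait for RUN on stdin, announce completion on stdout, return. */
    public static void child() throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            if (RUN.equals(line.trim())) {
                System.out.println(BYE);
                System.out.flush(); // make sure the marker leaves the child before exit
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && "child".equals(args[0])) {
            child();
            return;
        }
        // Parent side: start a second JVM running this same class in child mode.
        String java = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        Process p = new ProcessBuilder(java, "-cp", System.getProperty("java.class.path"),
                StdioHandshakeSketch.class.getName(), "child")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (PrintWriter toChild = new PrintWriter(
                     new OutputStreamWriter(p.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader fromChild = new BufferedReader(
                     new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            toChild.println(RUN);                 // command to the child via its stdin
            boolean bye = sawBye(fromChild);      // completion marker from its stdout
            boolean exited = p.waitFor(10, TimeUnit.SECONDS);
            System.out.println("bye seen: " + bye + ", exited: " + exited
                    + (exited ? ", exit code: " + p.exitValue() : ""));
        }
    }
}
```

Running this small program on both boxes might help to tell a general pipe or process problem apart from something specific to the Surefire test setup, although a passing run here would not rule out either.
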
I am also quite certain that this is not a bug in the client code: if it were, it should fail on both machines, on the VMs I have at home, and especially on my old Pentium 4 box running FreeBSD 11-STABLE with even less memory. I am convinced that this is an issue with shared memory, buffers, or caches.

I have uploaded one tarball for each machine:

    http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn714x.tar.gz
    http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn719x.tar.gz

Each tarball contains:

* log output of Maven with the failed tests, see the very end, e.g.,

    Tests in error:
      testForkCountTwoNoReuse(org.apache.maven.surefire.its.ForkModeIT): Exit code was non-zero: 1; command line and log = (..)

* surefire-integration-tests/target/<testName>/log.txt, which contains verbose traces of the communication between parent and children

I'd appreciate any type of help!

Best regards,

Michael

PS: I have also tested the code successfully on Windows, Ubuntu, RHEL6, and Fedora 25.
