From: "Osipov, Michael"
To: "freebsd-java@freebsd.org"
Date: Wed, 15 Mar 2017 14:50:46 +0000
Subject: Needle in the haystack: same Java code on two identical machines, one passes one fails

Hi folks,

I am currently experiencing a stdio issue on a FreeBSD 10.3-STABLE box at work, while another, identical box works flawlessly, as do other test boxes in VMs or on real hardware running 9.3-STABLE through 11.0-STABLE.

Let's stick to the two boxes at work for now. Both are identical HPE servers (Xeon CPUs, 4 GiB RAM) running 10.3-STABLE, and both base systems are configured the same way. The software installed from ports differs slightly.

faulty box:
FreeBSD blnn719x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310805: Fri Dec 30 11:29:53 CET 2016 root@blnn719x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN719X i386

working box:
FreeBSD blnn714x.ww004.siemens.net 10.3-STABLE FreeBSD 10.3-STABLE #0 r310632: Tue Dec 27 18:58:32 CET 2016 root@blnn714x.ww004.siemens.net:/usr/obj/usr/src/sys/BLNN714X i386

The code I run is publicly available: Maven Surefire, the testing framework used throughout the entire Maven ecosystem (branch 2.19.2-experimental contains extended log output):
https://git-wip-us.apache.org/repos/asf/maven-surefire.git

Run with:
mvn -B -V clean install -Drat.skip -Dcheckstyle.skip | tee ~/maven-surefire.log

Both machines have Maven 3.5.0-alpha-1 and OpenJDK 8 Update 121 from ports. Tests run off local disks on a gvinum backend, which runs on top of an HP RAID5 system.

A few specific tests fail, namely those where a parent Java process forks another Java process (not a thread) and communicates with it bidirectionally through stdio. The failures manifest as the parent process assuming that the forked process is gone, although the forked process has already announced via stdio that it has completed all tasks and is performing a clean exit. This does not happen on blnn714x, which is really weird!

I don't expect anyone to solve my problem here, merely to provide pointers on where I can start looking for what is really wrong with blnn719x compared to the other machine, because I am searching for a needle in a haystack.

I am also quite certain that this is not a bug in the client code: if it were, it should fail on both machines, as well as on the VMs I have at home and especially on my old Pentium 4 box running FreeBSD 11-STABLE with even less memory. I am convinced that this is an issue with shared memory, buffers, or caches.

I have uploaded a tarball for each machine:
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn714x.tar.gz
http://home.apache.org/~michaelo/maven/surefire/maven-surefire-2.19.2-experimental-blnn719x.tar.gz

Each tarball contains:
* Maven's log output with the failed tests (see the very end), e.g.:
  Tests in error:
    testForkCountTwoNoReuse(org.apache.maven.surefire.its.ForkModeIT): Exit code was non-zero: 1; command line and log = (..)
* surefire-integration-tests/target//log.txt, which contains verbose traces of the communication between parent and children

I'd appreciate any kind of help!

Best regards,

Michael

PS: I have also tested the code successfully on Windows, Ubuntu, RHEL6 and Fedora 25.
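
For readers who want to see the parent/child pattern from the failing tests in isolation, here is a minimal Java sketch of a parent JVM that forks a child JVM and talks to it over stdin/stdout. It is a hypothetical reproducer, not Surefire's code or wire protocol: the class name StdioForkDemo, the line-based ACK/BYE/DONE exchange and the 30-second exit timeout are all assumptions made for illustration. The point it tries to show is that the child announcing "DONE" on stdout and the child process actually exiting are two separate events, and the parent has to wait for the second one explicitly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stdio fork reproducer; NOT Surefire's actual fork protocol.
public class StdioForkDemo {

    // Child side: answer every command line with "ACK <cmd>"; on "BYE",
    // announce completion with "DONE" and return so the JVM can exit cleanly.
    static void serve(Reader input, Writer output) {
        BufferedReader in = new BufferedReader(input);
        PrintWriter out = new PrintWriter(output);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("BYE")) {
                    out.println("DONE");
                    out.flush();
                    return;
                }
                out.println("ACK " + line);
                out.flush(); // flush per line so the parent never waits on a buffer
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parent side: fork a second JVM running this class in child mode,
    // exchange a few lines over its stdin/stdout, then wait for its exit code.
    static int runParent(int tasks) throws IOException, InterruptedException {
        String java = System.getProperty("java.home") + "/bin/java";
        ProcessBuilder pb = new ProcessBuilder(java, "-cp",
                System.getProperty("java.class.path"), "StdioForkDemo", "child");
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // keep stderr from filling an unread pipe
        Process child = pb.start();

        List<String> replies = new ArrayList<>();
        try (PrintWriter toChild = new PrintWriter(new OutputStreamWriter(
                     child.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader fromChild = new BufferedReader(new InputStreamReader(
                     child.getInputStream(), StandardCharsets.UTF_8))) {
            for (int i = 1; i <= tasks; i++) {
                toChild.println("task-" + i);
                replies.add(fromChild.readLine());
            }
            toChild.println("BYE");
            replies.add(fromChild.readLine()); // expected to be "DONE"
        } // closing the streams also closes the child's stdin
        System.out.println("child said: " + replies);

        // "DONE" on stdout and the real process exit are separate events;
        // wait for the exit explicitly instead of inferring it from the stream.
        if (!child.waitFor(30, TimeUnit.SECONDS)) {
            child.destroyForcibly();
            throw new IllegalStateException("child announced DONE but did not exit");
        }
        return child.exitValue();
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("child")) {
            serve(new InputStreamReader(System.in, StandardCharsets.UTF_8),
                  new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        } else {
            System.out.println("child exit code: " + runParent(3));
        }
    }
}
```

Compiled with `javac StdioForkDemo.java` and started as `java StdioForkDemo` from the same directory, the parent forks itself in child mode. Running something like this in a loop on both boxes might help tell whether the plain JDK fork/pipe path behaves differently on blnn719x, independently of Surefire.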