From: bugzilla-noreply@freebsd.org
To: ports-bugs@FreeBSD.org
Subject: [Bug 231697] net/openmpi2: MPI_Send to self fails (or receive from self fails?)
Date: Mon, 24 Sep 2018 21:41:27 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231697

Bug ID: 231697
Summary: net/openmpi2: MPI_Send to self fails (or receive from self fails?)
Product: Ports & Packages
Version: Latest
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: Individual Port(s)
Assignee: danilo@FreeBSD.org
Reporter: russo@bogodyn.org
Flags: maintainer-feedback?(danilo@FreeBSD.org)
Attachment #197466 mime type: text/plain

Created attachment 197466
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=197466&action=edit
Simple test case that just does send/receives, fails with OpenMPI 2 or 3.

We have observed our code failing when built with OpenMPI 2.1 or 3.x on
FreeBSD and also on one Linux platform, and have tracked it down, at least on
BSD, to a simple test case. In that test case, data sent via MPI_Send to the
same process (rank) that is sending it is never received by the corresponding
MPI_Irecv. This use case is supposed to be standard-compliant *AND* DOES work
with OpenMPI 1.10 on the same machine.

My uname -a:

FreeBSD yyy.zzz 10.4-STABLE FreeBSD 10.4-STABLE #0 r327510: Tue Jan 2 21:52:13 MST 2018 xxx@yyy.zzz:/usr/obj/usr/src/sys/GENERIC amd64

The attached test program will print BAD on each line where it is supposed to
report that proc#N received something from proc#M when M==N, if compiled with
OpenMPI 2.1.x or 3.x. It will pass just fine with OpenMPI 1.10.

We have run this on a few operating systems other than BSD, including RHEL6,
RHEL7, and OS X, and none have the same issue. It does appear, however, that
Ubuntu 18.04's OpenMPI 2.1.x has the same problem.

It is not at all clear where this problem lies, except that the symptom is
that the receive requests do not in fact receive any data if the sender is
the same process.
To reproduce:

/usr/local/mpi/openmpi2/bin/mpicc -o testBUG967 testBUG967.c
/usr/local/mpi/openmpi2/bin/mpirun -np 2 ./testBUG967

On my machine, this gives the output:

0 posting receive 0 0x803fc78b0
0 posting receive 1 0x803fc78b4
0 sending to 0 value 1000
1 posting receive 0 0x803fc78b0
1 posting receive 1 0x803fc78b4
1 sending to 0 value 2000
1 sending to 1 value 2001
0 sending to 1 value 1001
0 wait source 0 count 0
0 wait source 1 count 4
0 procs_from 0 vals_from -1000 BAD BAD BAD
0 procs_from 1 vals_from 2000
1 wait source 1 count 0
1 wait source 0 count 4
1 procs_from 1 vals_from -1000 BAD BAD BAD
1 procs_from 0 vals_from 1001

When run instead with OpenMPI 1, it gives the expected output:

> /usr/local/mpi/openmpi/bin/mpicc -o testBUG967 testBUG967.c
> /usr/local/mpi/openmpi/bin/mpirun -np 2 ./testBUG967
1 posting receive 0 0x803e23ad8
1 posting receive 1 0x803e23adc
1 sending to 0 value 2000
0 posting receive 0 0x803e23ad8
0 posting receive 1 0x803e23adc
0 sending to 0 value 1000
1 sending to 1 value 2001
0 sending to 1 value 1001
1 wait source 1 count 4
1 wait source 0 count 4
1 procs_from 1 vals_from 2001
0 wait source 0 count 4
0 wait source 1 count 4
0 procs_from 0 vals_from 1000
1 procs_from 0 vals_from 1001
0 procs_from 1 vals_from 2000

I have also tried it with various --mca btl options (tcp,self; sm,self;
vader,self), and the receive from self fails with all of them unless I use
OpenMPI 1.x.
Additional information:

> pkg info openmpi2
openmpi2-2.1.5
Name           : openmpi2
Version        : 2.1.5
Installed on   : Mon Sep 24 15:31:19 2018 MDT
Origin         : net/openmpi2
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo@FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        DEBUG          : on
        IPV6           : on
        SLURM          : off
        TORQUE         : off
Shared Libs required:
        libhwloc.so.5
        libevent-2.1.so.6
        libevent_pthreads-2.1.so.6
        libquadmath.so.0
        libgcc_s.so.1
        libgfortran.so.4
        libmunge.so.2

> pkg info openmpi
openmpi-1.10.7_3
Name           : openmpi
Version        : 1.10.7_3
Installed on   : Wed Aug 22 23:44:37 2018 MDT
Origin         : net/openmpi
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo@FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        IPV6           : on
        SLURM          : off
        TORQUE         : off
        VT             : off
Shared Libs required:
        libquadmath.so.0
        libevent_pthreads-2.1.so.6
        libevent-2.1.so.6
        libhwloc.so.5
        libgfortran.so.4
        libgcc_s.so.1

> pkg info hwloc
hwloc-1.11.11
Name           : hwloc
Version        : 1.11.11
Installed on   : Wed Sep 19 08:08:13 2018 MDT
Origin         : devel/hwloc
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : devel
Licenses       : BSD3CLAUSE
Maintainer     : phd_kimberlite@yahoo.co.jp
WWW            : http://www.open-mpi.org/projects/hwloc/
Comment        : Portable Hardware Locality software package
Options        :
        CAIRO          : off
        DOCS           : on
Shared Libs required:
        libxml2.so.2
        libpciaccess.so.0

-- 
You are receiving this mail because:
You are the assignee for the bug.