Date: Mon, 24 Sep 2018 21:41:27 +0000
From: bugzilla-noreply@freebsd.org
To: ports-bugs@FreeBSD.org
Subject: [Bug 231697] net/openmpi2: MPI_Send to self fails (or receive from self fails?)
Message-ID: <bug-231697-7788@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231697

    Bug ID:     231697
    Summary:    net/openmpi2: MPI_Send to self fails (or receive from self fails?)
    Product:    Ports & Packages
    Version:    Latest
    Hardware:   amd64
    OS:         Any
    Status:     New
    Severity:   Affects Only Me
    Priority:   ---
    Component:  Individual Port(s)
    Assignee:   danilo@FreeBSD.org
    Reporter:   russo@bogodyn.org
    Flags:      maintainer-feedback?(danilo@FreeBSD.org)
    Attachment #197466 mime type: text/plain

Created attachment 197466
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=197466&action=edit
Simple test case that just does send/receives, fails with OpenMPI 2 or 3.

We have observed our code failing when built with OpenMPI 2.1 or 3.x on
FreeBSD and also on one Linux platform, and have tracked it down, at least on
BSD, to a simple test case: data sent via MPI_Send to the same process that
issues the send are not received by the corresponding MPI_Irecv. This use case
is supposed to be standard-compliant *AND* it DOES work with OpenMPI 1.10 on
the same machine.

My uname -a:
FreeBSD yyy.zzz 10.4-STABLE FreeBSD 10.4-STABLE #0 r327510: Tue Jan 2 21:52:13 MST 2018 xxx@yyy.zzz:/usr/obj/usr/src/sys/GENERIC amd64

If compiled with OpenMPI 2.1.x or 3.x, the attached test program prints BAD on
each line where it is supposed to report that proc#N received something from
proc#M when M==N. It passes just fine with OpenMPI 1.10.

We have run this on a few OSes other than BSD, including RHEL6, RHEL7, and
OS X, and none have the same issue. It does appear, however, that Ubuntu
18.04's OpenMPI 2.1.x has the same problem.

It is not at all clear where this problem lies, except that the symptom is
that the receive requests do not in fact receive any data if the sender is the
same process.
To reproduce:

/usr/local/mpi/openmpi2/bin/mpicc -o testBUG967 testBUG967.c
/usr/local/mpi/openmpi2/bin/mpirun -np 2 ./testBUG967

On my machine, this gives the output:

    0 posting receive 0 0x803fc78b0
    0 posting receive 1 0x803fc78b4
    0 sending to 0 value 1000
    1 posting receive 0 0x803fc78b0
    1 posting receive 1 0x803fc78b4
    1 sending to 0 value 2000
    1 sending to 1 value 2001
    0 sending to 1 value 1001
    0 wait source 0 count 0
    0 wait source 1 count 4
    0 procs_from 0 vals_from -1000 BAD BAD BAD
    0 procs_from 1 vals_from 2000
    1 wait source 1 count 0
    1 wait source 0 count 4
    1 procs_from 1 vals_from -1000 BAD BAD BAD
    1 procs_from 0 vals_from 1001

When run instead with OpenMPI 1.x, it gives the output actually expected:

> /usr/local/mpi/openmpi/bin/mpicc -o testBUG967 testBUG967.c
> /usr/local/mpi/openmpi/bin/mpirun -np 2 ./testBUG967

    1 posting receive 0 0x803e23ad8
    1 posting receive 1 0x803e23adc
    1 sending to 0 value 2000
    0 posting receive 0 0x803e23ad8
    0 posting receive 1 0x803e23adc
    0 sending to 0 value 1000
    1 sending to 1 value 2001
    0 sending to 1 value 1001
    1 wait source 1 count 4
    1 wait source 0 count 4
    1 procs_from 1 vals_from 2001
    0 wait source 0 count 4
    0 wait source 1 count 4
    0 procs_from 0 vals_from 1000
    1 procs_from 0 vals_from 1001
    0 procs_from 1 vals_from 2000

I have also tried it with varying --mca btl options (tcp,self; sm,self;
vader,self), and the receive from self fails with every one of them unless I
use OpenMPI 1.x.
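For anyone comparing runs across OpenMPI versions or btl settings, the failing case can be picked out of the test output mechanically. The following is only a sketch, not part of the attached test case: it assumes the "R wait source S count C" line format shown in the output above, and the function name failed_self_receives is made up for illustration.

```python
import re

# Matches lines such as "0 wait source 1 count 4" from the test output above.
WAIT_LINE = re.compile(r"^(\d+) wait source (\d+) count (\d+)$")


def failed_self_receives(output):
    """Return the ranks whose receive from themselves completed with count 0."""
    failed = []
    for line in output.splitlines():
        match = WAIT_LINE.match(line.strip())
        if match is None:
            continue  # not a "wait" line, e.g. "posting receive" or "sending to"
        rank, source, count = (int(group) for group in match.groups())
        if rank == source and count == 0:
            failed.append(rank)
    return failed


# The four "wait" lines from the failing OpenMPI 2 run reported above.
sample = """\
0 wait source 0 count 0
0 wait source 1 count 4
1 wait source 1 count 0
1 wait source 0 count 4
"""

print(failed_self_receives(sample))  # prints [0, 1]
```

Fed the "wait" lines from the OpenMPI 1.x run above, where every count is 4, the same function returns an empty list.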
Additional information:

> pkg info openmpi2
openmpi2-2.1.5
Name           : openmpi2
Version        : 2.1.5
Installed on   : Mon Sep 24 15:31:19 2018 MDT
Origin         : net/openmpi2
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo@FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        DEBUG          : on
        IPV6           : on
        SLURM          : off
        TORQUE         : off
Shared Libs required:
        libhwloc.so.5
        libevent-2.1.so.6
        libevent_pthreads-2.1.so.6
        libquadmath.so.0
        libgcc_s.so.1
        libgfortran.so.4
        libmunge.so.2

> pkg info openmpi
openmpi-1.10.7_3
Name           : openmpi
Version        : 1.10.7_3
Installed on   : Wed Aug 22 23:44:37 2018 MDT
Origin         : net/openmpi
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo@FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        IPV6           : on
        SLURM          : off
        TORQUE         : off
        VT             : off
Shared Libs required:
        libquadmath.so.0
        libevent_pthreads-2.1.so.6
        libevent-2.1.so.6
        libhwloc.so.5
        libgfortran.so.4
        libgcc_s.so.1

> pkg info hwloc
hwloc-1.11.11
Name           : hwloc
Version        : 1.11.11
Installed on   : Wed Sep 19 08:08:13 2018 MDT
Origin         : devel/hwloc
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : devel
Licenses       : BSD3CLAUSE
Maintainer     : phd_kimberlite@yahoo.co.jp
WWW            : http://www.open-mpi.org/projects/hwloc/
Comment        : Portable Hardware Locality software package
Options        :
        CAIRO          : off
        DOCS           : on
Shared Libs required:
        libxml2.so.2
        libpciaccess.so.0

--
You are receiving this mail because:
You are the assignee for the bug.