From owner-dev-commits-ports-branches@freebsd.org Mon Sep 6 22:26:16 2021
Delivered-To: dev-commits-ports-branches@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
    by mailman.nyi.freebsd.org (Postfix) with ESMTP id 1445B67C0C8;
    Mon, 6 Sep 2021 22:26:16 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
    client-signature RSA-PSS (4096 bits) client-digest SHA256)
    (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK))
    by mx1.freebsd.org (Postfix) with ESMTPS id 4H3NKN06Qcz3p9b;
    Mon, 6 Sep 2021 22:26:16 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
    (Client did not present a certificate)
    by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id D943226D47;
    Mon, 6 Sep 2021 22:26:15 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org ([127.0.1.44])
    by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 186MQF66089523;
    Mon, 6 Sep 2021 22:26:15 GMT
    (envelope-from git@gitrepo.freebsd.org)
Received: (from git@localhost)
    by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 186MQFkn089522;
    Mon, 6 Sep 2021 22:26:15 GMT
    (envelope-from git)
Date: Mon, 6 Sep 2021 22:26:15 GMT
Message-Id: <202109062226.186MQFkn089522@gitrepo.freebsd.org>
To: ports-committers@FreeBSD.org, dev-commits-ports-all@FreeBSD.org,
    dev-commits-ports-branches@FreeBSD.org
From: Jan Beich
Subject: git: fd490a171c3d - 2021Q3 - net/mpich: unbreak optimized runtime after 88e134883dd2
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: jbeich
X-Git-Repository: ports
X-Git-Refname: refs/heads/2021Q3
X-Git-Reftype: branch
X-Git-Commit: fd490a171c3da0d7bcb9a5f3ee3b4b46075dfa9e
Auto-Submitted: auto-generated
X-BeenThere: dev-commits-ports-branches@freebsd.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Commits to the quarterly branches of the FreeBSD ports repository
X-List-Received-Date: Mon, 06 Sep 2021 22:26:16 -0000

The branch 2021Q3 has been updated by jbeich:

URL: https://cgit.FreeBSD.org/ports/commit/?id=fd490a171c3da0d7bcb9a5f3ee3b4b46075dfa9e

commit fd490a171c3da0d7bcb9a5f3ee3b4b46075dfa9e
Author:     Henrik Gulbrandsen
AuthorDate: 2021-08-12 14:35:20 +0000
Commit:     Jan Beich
CommitDate: 2021-09-06 22:25:57 +0000

    net/mpich: unbreak optimized runtime after 88e134883dd2

    Runtime may fail without a L0 driver like intel-compute-runtime e.g.,

    $ mpivars
    Abort(268484367) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(153): gpu_init failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=268484367
    :
    system msg for write_line failure : Bad file descriptor
    Attempting to use an MPI routine before initializing MPICH

    $ MPIR_CVAR_ENABLE_GPU=0 mpivars
    Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(159)......:
    MPID_Init(591).............:
    MPIDI_SHM_mpi_init_hook(22):
    MPIDI_IPC_mpi_init_hook(36):
    MPIDI_GPU_mpi_init_hook(79): gpu_get_dev_count failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=2139535
    :
    system msg for write_line failure : Bad file descriptor
    Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(159)......:
    MPID_Init(591).............:
    MPIDI_SHM_mpi_init_hook(22):
    MPIDI_IPC_mpi_init_hook(36):
    MPIDI_GPU_mpi_init_hook(79): gpu_get_dev_count failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=2139535
    :
    system msg for write_line failure : Bad file descriptor
    Segmentation fault

    PR:             256244 (for tracking)

    (cherry picked from commit b5815e7648a8e5307a20a234befa00e34306319d)
---
 net/mpich/Makefile                |  2 +-
 net/mpich/files/patch-l0-fallback | 44 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/net/mpich/Makefile b/net/mpich/Makefile
index 9741b1b75d7f..295897406b27 100644
--- a/net/mpich/Makefile
+++ b/net/mpich/Makefile
@@ -1,6 +1,6 @@
 PORTNAME= mpich
 PORTVERSION= 3.4.2
-PORTREVISION= 2
+PORTREVISION= 3
 CATEGORIES= net parallel
 MASTER_SITES= https://www.mpich.org/static/downloads/${DISTVERSION}/
 
diff --git a/net/mpich/files/patch-l0-fallback b/net/mpich/files/patch-l0-fallback
new file mode 100644
index 000000000000..35f18dc272a5
--- /dev/null
+++ b/net/mpich/files/patch-l0-fallback
@@ -0,0 +1,44 @@
+$ pkg delete intel-compute-runtime
+$ mpivars
+PCI: Failed to initialize libpciaccess with pci_system_init(): 6 (Permission denied)
+Abort(268484367) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
+MPIR_Init_thread(153): gpu_init failed
+[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=268484367
+:
+system msg for write_line failure : Bad file descriptor
+Attempting to use an MPI routine before initializing MPICH
+
+--- src/mpi/init/initthread.c.orig	2021-05-25 17:37:05 UTC
++++ src/mpi/init/initthread.c
+@@ -150,7 +150,9 @@ int MPIR_Init_thread(int *argc, char ***argv, int user
+      * inside MPID_Init */
+     if (MPIR_CVAR_ENABLE_GPU) {
+         int mpl_errno = MPL_gpu_init();
+-        MPIR_ERR_CHKANDJUMP(mpl_errno != MPL_SUCCESS, mpi_errno, MPI_ERR_OTHER, "**gpu_init");
++        MPIR_ERR_CHKANDJUMP(
++            mpl_errno != MPL_SUCCESS && mpl_errno != MPL_ERR_GPU_INTERNAL,
++            mpi_errno, MPI_ERR_OTHER, "**gpu_init");
+     }
+ 
+     MPL_atomic_store_int(&MPIR_Process.mpich_state, MPICH_MPI_STATE__IN_INIT);
+--- src/mpid/ch4/netmod/ofi/ofi_init.c.orig	2021-05-25 17:37:05 UTC
++++ src/mpid/ch4/netmod/ofi/ofi_init.c
+@@ -731,7 +731,6 @@ int MPIDI_OFI_mpi_init_hook(int rank, int size, int ap
+             MPL_gpu_malloc_host(&(MPIDI_OFI_global.am_bufs[i]), MPIDI_OFI_AM_BUFF_SZ);
+             MPIDI_OFI_global.am_reqs[i].event_id = MPIDI_OFI_EVENT_AM_RECV;
+             MPIDI_OFI_global.am_reqs[i].index = i;
+-            MPIR_Assert(MPIDI_OFI_global.am_bufs[i]);
+             MPIDI_OFI_global.am_iov[i].iov_base = MPIDI_OFI_global.am_bufs[i];
+             MPIDI_OFI_global.am_iov[i].iov_len = MPIDI_OFI_AM_BUFF_SZ;
+             MPIDI_OFI_global.am_msg[i].msg_iov = &MPIDI_OFI_global.am_iov[i];
+--- src/mpl/src/gpu/mpl_gpu_ze.c.orig	2021-05-25 17:37:05 UTC
++++ src/mpl/src/gpu/mpl_gpu_ze.c
+@@ -33,7 +33,7 @@ int MPL_gpu_get_dev_count(int *dev_cnt, int *dev_id)
+ {
+     int ret = MPL_SUCCESS;
+     if (!gpu_initialized) {
+-        ret = MPL_gpu_init();
++        MPL_gpu_init();
+     }
+ 
+     *dev_cnt = device_count;
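
For readers who want the idea behind the initthread.c hunk without the MPICH macros, here is a minimal standalone sketch of the pattern it appears to apply: one specific GPU initialization error is tolerated at startup, while any other error still aborts. Everything in it is hypothetical and written for illustration only. The enum, its values, `gpu_init_is_fatal` and `runtime_init` are not MPICH or MPL APIs, and the `gpu_enabled` flag is an assumption of the sketch rather than something the patch sets.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical status codes, loosely modeled on MPL_SUCCESS and
 * MPL_ERR_GPU_INTERNAL from the patch. The numeric values are invented. */
enum gpu_status { GPU_OK = 0, GPU_ERR_INTERNAL = 1, GPU_ERR_NOMEM = 2 };

/* Hypothetical predicate mirroring the patched condition:
 * an init result is fatal unless it is success or the internal
 * (for example, "no usable driver") error. */
static bool gpu_init_is_fatal(enum gpu_status st)
{
    return st != GPU_OK && st != GPU_ERR_INTERNAL;
}

/* Hypothetical stand-in for runtime startup. Takes the result of a GPU
 * probe, returns 0 if startup may continue and -1 if it must abort.
 * On success, *gpu_enabled records whether the GPU path is usable. */
static int runtime_init(enum gpu_status probe_result, bool *gpu_enabled)
{
    if (gpu_init_is_fatal(probe_result)) {
        fprintf(stderr, "gpu_init failed\n");
        return -1;
    }
    /* Fall back to the non-GPU path when the probe reported the
     * tolerated internal error. */
    *gpu_enabled = (probe_result == GPU_OK);
    return 0;
}
```

Calling `runtime_init(GPU_ERR_INTERNAL, &flag)` in this sketch returns 0 with `flag` cleared, whereas `runtime_init(GPU_ERR_NOMEM, &flag)` returns -1. That is the distinction the patched `MPIR_ERR_CHKANDJUMP` condition draws between `MPL_ERR_GPU_INTERNAL` and every other non-success code.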