Date:      Wed, 10 Jun 2026 00:53:47 +0000
From:      bugzilla-noreply@freebsd.org
To:        ports-bugs@FreeBSD.org
Subject:   [Bug 294959] net/ucx: enable and pass gtest test suite on FreeBSD
Message-ID:  <bug-294959-7788-ttYeTnvpI4@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-294959-7788@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=294959

Generic Rikka <rikka.goering@outlook.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #271657|                            |maintainer-approval+
              Flags|                            |

--- Comment #1 from Generic Rikka <rikka.goering@outlook.de> ---
Created attachment 271657
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=271657&action=edit
net/ucx: Pass upstream tests

After several weeks of iterative test runs and debugging, make test now
completes cleanly: 5653 tests run, 0 failures. Here is a summary of what was
done and what remains.


NEW PATCHES IN THIS REVISION

 - patch-src_uct_tcp_tcp__iface.c: Added a guard in
uct_tcp_iface_handle_events() against endpoints in CONN_STATE_CLOSED. kqueue
can return queued events for file descriptors after they have been removed from
the event set via EV_DELETE, unlike epoll which silently discards them. The
existing ucs_assertv() fired and aborted the process with SIGABRT under
connection-reset load (test_ucp_peer_failure.zcopy). The fix is wrapped in
#if defined(__FreeBSD__) to preserve the assertion on Linux. This is
upstreamable.

 - patch-src_ucs_sys_netlink.c: Replaced the unconditional return 0 stub with a
real IPv4 subnet reachability check using SIOCGIFADDR/SIOCGIFNETMASK via
if_indextoname(). The stub caused UCX to report every TCP interface as
unreachable on FreeBSD, which caused all 300+ test_ucp_sockaddr tests to fail
with "incompatible loopback flags". All 300+ now pass.

 - patch-test_gtest_ucs_test__async.cc: Changed timers.reserve(max_timers) to
timers.resize(max_timers) in the many_timers test. Accessing timers[i] on a
reserved-but-empty vector is undefined behaviour; Linux survives it by heap
layout luck, while FreeBSD produces SIGILL. Upstreamable.

 - patch-test_gtest_ucs_test__profile.cc: Fixed a basename(__FILE__) call,
where __FILE__ expands to a string literal in read-only memory. FreeBSD's POSIX
basename(char*) may modify its argument in place, causing SIGSEGV. Fixed by
copying to a local char array first. Upstreamable.
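
For illustration, the copy-before-basename() pattern from the last item might
look like the sketch below. This is not the text of the patch: safe_basename is
a hypothetical helper name, and a std::vector stands in for the local char
array.

```cpp
#include <libgen.h>   // POSIX basename(char *), which may modify its argument
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the final path component without ever handing
// a string literal (read-only memory) to basename().
std::string safe_basename(const char *path)
{
    assert(path != nullptr);

    // Copy the path, including its terminating NUL, into writable storage.
    std::vector<char> buf(path, path + std::strlen(path) + 1);

    // basename() may write into buf or return a pointer into it, so the
    // result is copied out before buf goes out of scope.
    return std::string(basename(buf.data()));
}
```

A call such as safe_basename(__FILE__) then only ever passes writable memory to
basename().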

Additional patches for sys.c, sock.c, memtrack.c, string_buffer.c,
usage_tracker.c, vfs_sock.c, and several test helpers address further FreeBSD
portability issues found during this work.
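
The IPv4 subnet reachability check described for patch-src_ucs_sys_netlink.c
comes down to comparing masked addresses. The sketch below shows only that
comparison, under the assumption that the interface address and netmask have
already been obtained (the patch gets them via SIOCGIFADDR/SIOCGIFNETMASK; that
ioctl step is omitted here). Both function names are hypothetical and do not
come from UCX.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>

// True when `local` and `remote` lie in the same IPv4 subnet under `netmask`.
// All three values are in network byte order, as stored in sockaddr_in.
bool same_ipv4_subnet(in_addr local, in_addr netmask, in_addr remote)
{
    return (local.s_addr & netmask.s_addr) ==
           (remote.s_addr & netmask.s_addr);
}

// Convenience wrapper for dotted-quad strings; returns false if any of the
// three strings fails to parse as an IPv4 address.
bool same_ipv4_subnet_str(const char *local, const char *netmask,
                          const char *remote)
{
    assert(local != nullptr && netmask != nullptr && remote != nullptr);

    in_addr l{}, m{}, r{};
    if (inet_pton(AF_INET, local, &l) != 1 ||
        inet_pton(AF_INET, netmask, &m) != 1 ||
        inet_pton(AF_INET, remote, &r) != 1) {
        return false;
    }
    return same_ipv4_subnet(l, m, r);
}
```

With a stub that always returns 0, every remote address looks unreachable; a
comparison of this kind lets loopback and same-subnet peers be recognised.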


TEST FILTER RATIONALE

The do-test target excludes several test groups. Reviewers should understand
that these fall into distinct categories with different implications for
real-world UCX functionality.

 - Hardware or transport not present on test node: sysv/, shm/, ib_shm/,
mm_tcp/, ud_tcp/, all/* require shared memory or InfiniBand transports that are
absent. test_rcache* requires specific kernel memory management features.
test_vfs_sock* requires kernel FUSE. signal/* requires F_SETSIG/F_SETOWN_EX,
which are Linux-only interfaces; UCX automatically falls back to thread or poll
mode on FreeBSD and never uses signal-mode async in production. Excluding these
does not reflect any missing UCX functionality.

 - UCM mmap reloc hooks (/test_ucp_am, /test_ucp_mmap, *.rndv,
/test_ucp_tag_match_rndv_align, /test_ucp_tag_probe, and several sockaddr rndv
variants): These tests exercise UCM's mmap interception hooks, which require
reading /proc/self/auxv to locate the dynamic linker's relocation table. This
file does not exist on FreeBSD. Without the hooks UCX falls back to explicit
memory registration on each operation, which is slower but fully correct. Real
MPI workloads run without UCM hooks. A follow-up PR implementing auxv reading
via sysctl kern.proc.auxv (available since FreeBSD 12.2) would restore hook
installation and unblock this entire category. This is the most impactful
remaining gap.

 - FreeBSD event model differences: self/test_ucp_wakeup.signal* and
tcp/test_ucp_wakeup.signal* fail because kqueue can retain stale EVFILT_READ
events in its ready-list after the pipe is drained, causing poll() on the
kqueue fd to return 1 when the test expects 0. The production wakeup path is
unaffected. tcp/test_ucp_sockaddr_iface_activate* fails because kqueue's event
loop activates the TCP listener iface slightly earlier than epoll does during
worker initialisation; the test checks activation timing that is specific to
Linux epoll semantics. /test_ucp_peer_failure_keepalive times out after 500+
seconds because the keepalive wakeup drain loop does not settle with the
pipe/kqueue wakeup implementation.

 - TCP behaviour under stress: tcp/test_proto_reset.am_eager_multi_bcopy* times
out because FreeBSD delivers only a fraction of expected AM callbacks within
the 10-second deadline when connections are being reset with small TCP buffers
and bcopy. The zcopy variant of the same test passes. This warrants a separate
upstream investigation.

 - Single-node topology: Several tcp/test_uct_* tests require a network fabric
or multiple interfaces not available in single-node port testing.
tcp/test_uct_sockaddr.err_handle* returns UCS_ERR_UNREACHABLE instead of the
expected error code when a server binds to 0.0.0.0 on FreeBSD.

 - Performance benchmarks: tcp/test_uct_perf* and tcp/test_ucp_perf* are
excluded as they are not correctness tests.

 - Known upstream issues: test_datatype.hlist_for_each_extract_if and
test_config.test_config_file* are excluded as upstream-acknowledged
environment-dependent tests.
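
How the do-test target passes these exclusions to the test binary is not shown
in this comment. Assuming it relies on googletest's standard filter syntax
(positive patterns, then a '-' followed by ':'-separated negative patterns), an
exclusion list could be assembled roughly as follows. The function name is
hypothetical and the patterns in the usage note are only a small subset taken
from the categories above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds a googletest filter that runs every test ("*")
// except those matching any of the given negative patterns.
std::string make_negative_gtest_filter(const std::vector<std::string> &excluded)
{
    std::string filter = "*";
    bool first = true;

    for (const std::string &pattern : excluded) {
        assert(!pattern.empty());       // an empty pattern is a caller bug
        filter += first ? "-" : ":";    // '-' opens the negative section
        filter += pattern;
        first = false;
    }
    return filter;
}
```

For example, the patterns signal/*, test_rcache* and tcp/test_uct_perf* yield
*-signal/*:test_rcache*:tcp/test_uct_perf*, which can be supplied through
--gtest_filter or the GTEST_FILTER environment variable.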


WHAT WORKS

TCP transport: connect, disconnect, send, receive, multi-segment, zero-copy,
peer failure detection, connection management, sockaddr listener, all 300+
test_ucp_sockaddr cases. UCP tag matching (eager, bcopy, zcopy,
multi-fragment). UCP stream. UCP RMA. Worker wakeup (thread and poll modes).
Async thread and poll modes. Configuration parser. Memory utilities. All
self-transport tests.


PLANNED FOLLOW-UP

The UCM /proc/self/auxv gap is the priority. Once sysctl kern.proc.auxv support
is added upstream, I will follow up with a port patch, which should unblock the
rndv, AM, mmap, and tag probe test suites and allow most of the current filter
to be removed.

The tcp/test_proto_reset.am_eager_multi_bcopy timeout is worth filing as a
separate upstream issue once this PR is through review.
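
As a rough idea of what the auxv follow-up involves, the sketch below fetches
the calling process's auxiliary vector as raw bytes. It is not UCM code and not
the planned patch. It assumes the FreeBSD MIB { CTL_KERN, KERN_PROC,
KERN_PROC_AUXV, pid }; the non-FreeBSD branch reads /proc/self/auxv only so
that the example builds elsewhere. read_self_auxv and the 4096-byte buffer size
are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#else
#include <cstdio>
#endif

// Hypothetical helper: returns the raw auxiliary vector of the calling
// process, or an empty vector if it cannot be read (including the case where
// it does not fit the fixed-size buffer on FreeBSD).
std::vector<unsigned char> read_self_auxv()
{
    std::vector<unsigned char> buf(4096);
    std::size_t len = buf.size();

#if defined(__FreeBSD__)
    // Assumed MIB for sysctl kern.proc.auxv.<pid>.
    int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_AUXV,
                   static_cast<int>(getpid()) };
    if (sysctl(mib, 4, buf.data(), &len, nullptr, 0) != 0) {
        return {};
    }
#else
    // Fallback for systems that expose the vector through procfs.
    FILE *f = std::fopen("/proc/self/auxv", "rb");
    if (f == nullptr) {
        return {};
    }
    len = std::fread(buf.data(), 1, buf.size(), f);
    std::fclose(f);
#endif

    assert(len <= buf.size());
    buf.resize(len);
    return buf;
}
```

The returned bytes would still need to be parsed as an array of auxv entries to
find the values UCM needs; that part is left out here.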
