Date: Wed, 10 Jun 2026 00:53:47 +0000
From: bugzilla-noreply@freebsd.org
To: ports-bugs@FreeBSD.org
Subject: [Bug 294959] net/ucx: enable and pass gtest test suite on FreeBSD
Message-ID: <bug-294959-7788-ttYeTnvpI4@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-294959-7788@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=294959

Generic Rikka <rikka.goering@outlook.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #271657|                            |maintainer-approval+
              Flags|                            |

--- Comment #1 from Generic Rikka <rikka.goering@outlook.de> ---
Created attachment 271657
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=271657&action=edit
net/ucx: Pass upstream tests

After several weeks of iterative test runs and debugging, make test now
completes cleanly: 5653 tests run, 0 failures. Here is a summary of
everything done and what remains.

NEW PATCHES IN THIS REVISION

- patch-src_uct_tcp_tcp__iface.c: Added a guard in
  uct_tcp_iface_handle_events() against endpoints in CONN_STATE_CLOSED.
  kqueue can return queued events for file descriptors after they have
  been removed from the event set via EV_DELETE, unlike epoll which
  silently discards them. The existing ucs_assertv() fired as SIGABRT
  under connection-reset load (test_ucp_peer_failure.zcopy). The fix is
  wrapped in #if defined(__FreeBSD__) to preserve the assertion on Linux.
  This is upstreamable.

- patch-src_ucs_sys_netlink.c: Replaced the unconditional return 0 stub
  with a real IPv4 subnet reachability check using
  SIOCGIFADDR/SIOCGIFNETMASK via if_indextoname(). The stub caused UCX to
  report every TCP interface as unreachable on FreeBSD, which caused all
  300+ test_ucp_sockaddr tests to fail with "incompatible loopback
  flags". All 300+ now pass.

- patch-test_gtest_ucs_test__async.cc: Fixed timers.reserve(max_timers)
  to timers.resize(max_timers) in the many_timers test. Accessing
  timers[i] on a reserved-but-empty vector is undefined behaviour; Linux
  survives it by heap layout luck, FreeBSD produces SIGILL. Upstreamable.

- patch-test_gtest_ucs_test__profile.cc: Fixed basename(__FILE__) where
  __FILE__ expands to a string literal in read-only memory. FreeBSD's
  POSIX basename(char*) may modify its argument in place, causing SIGSEGV.
  Fixed by copying to a local char array first. Upstreamable.

Additional patches for sys.c, sock.c, memtrack.c, string_buffer.c,
usage_tracker.c, vfs_sock.c, and several test helpers address further
FreeBSD portability issues found during this work.

TEST FILTER RATIONALE

The do-test target excludes several test groups. Reviewers should
understand these fall into distinct categories with different
implications for real-world UCX functionality.

Hardware or transport not present on test node:
sysv/, shm/, ib_shm/, mm_tcp/, ud_tcp/, all/* require shared memory or
InfiniBand transports that are absent. test_rcache* requires specific
kernel memory management features. test_vfs_sock* requires kernel FUSE.
signal/* requires F_SETSIG/F_SETOWN_EX, which are Linux-only interfaces;
UCX automatically falls back to thread or poll mode on FreeBSD and never
uses signal-mode async in production. Excluding these does not reflect
any missing UCX functionality.

UCM mmap reloc hooks (/test_ucp_am, /test_ucp_mmap, *.rndv,
/test_ucp_tag_match_rndv_align, /test_ucp_tag_probe, and several sockaddr
rndv variants):
These tests exercise UCM's mmap interception hooks, which require reading
/proc/self/auxv to locate the dynamic linker's relocation table. This
file does not exist on FreeBSD. Without the hooks UCX falls back to
explicit memory registration on each operation, which is slower but fully
correct. Real MPI workloads run without UCM hooks. A follow-up PR
implementing auxv reading via sysctl kern.proc.auxv (available since
FreeBSD 12.2) would restore hook installation and unblock this entire
category. This is the most impactful remaining gap.

FreeBSD event model differences:
self/test_ucp_wakeup.signal* and tcp/test_ucp_wakeup.signal* fail because
kqueue can retain stale EVFILT_READ events in its ready-list after the
pipe is drained, causing poll() on the kqueue fd to return 1 when the
test expects 0. The production wakeup path is unaffected.
tcp/test_ucp_sockaddr_iface_activate* fails because kqueue's event loop
activates the TCP listener iface slightly earlier than epoll does during
worker initialisation; the test checks activation timing that is specific
to Linux epoll semantics. /test_ucp_peer_failure_keepalive times out
after 500+ seconds because the keepalive wakeup drain loop does not
settle with the pipe/kqueue wakeup implementation.

TCP behaviour under stress:
tcp/test_proto_reset.am_eager_multi_bcopy* times out because FreeBSD
delivers only a fraction of expected AM callbacks within the 10-second
deadline when connections are being reset with small TCP buffers and
bcopy. The zcopy variant of the same test passes. This warrants a
separate upstream investigation.

Single-node topology:
Several tcp/test_uct_* tests require a network fabric or multiple
interfaces not available in single-node port testing.
tcp/test_uct_sockaddr.err_handle* returns UCS_ERR_UNREACHABLE instead of
the expected error code when a server binds to 0.0.0.0 on FreeBSD.

Performance benchmarks:
tcp/test_uct_perf* and tcp/test_ucp_perf* are excluded as they are not
correctness tests.

Known upstream issues:
test_datatype.hlist_for_each_extract_if and test_config.test_config_file*
are excluded as upstream-acknowledged environment-dependent tests.

WHAT WORKS

- TCP transport: connect, disconnect, send, receive, multi-segment,
  zero-copy, peer failure detection, connection management, sockaddr
  listener, all 300+ test_ucp_sockaddr cases.
- UCP tag matching (eager, bcopy, zcopy, multi-fragment).
- UCP stream.
- UCP RMA.
- Worker wakeup (thread and poll modes).
- Async thread and poll modes.
- Configuration parser.
- Memory utilities.
- All self-transport tests.

PLANNED FOLLOW-UP

The UCM /proc/self/auxv gap is the priority. Once sysctl kern.proc.auxv
support is added upstream I will follow up with a port patch, which
should unblock the rndv, AM, mmap, and tag probe test suites and allow
most of the current filter to be removed.
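
For reference, here is a minimal sketch of what reading the auxiliary
vector through sysctl could look like. It is not the UCX implementation
and not the planned patch. The names aux_entry_t, aux_fetch() and
aux_lookup() are hypothetical, and the sketch assumes that the
kern.proc.auxv node is reachable through the KERN_PROC_AUXV MIB in
<sys/sysctl.h> and that each entry is a (tag, value) pair of native
words. On other platforms aux_fetch() simply reports failure.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__FreeBSD__)
#include <sys/sysctl.h>
#endif

/* ELF auxiliary-vector tags used below (standard ELF values). */
enum {
    AUX_AT_NULL  = 0,  /* end-of-vector marker */
    AUX_AT_PHDR  = 3,  /* address of the program headers */
    AUX_AT_PHENT = 4,  /* size of one program header entry */
    AUX_AT_PHNUM = 5   /* number of program headers */
};

/* Hypothetical entry type. Assumption: every entry is a (tag, value)
 * pair of native words, as in the usual Elf_Auxinfo layout. */
typedef struct {
    unsigned long type;
    unsigned long value;
} aux_entry_t;

/* Copy the calling process's auxiliary vector into buf. Returns the
 * number of entries read, or -1 on failure or when the sysctl is not
 * available on this platform. */
long aux_fetch(aux_entry_t *buf, size_t max_entries)
{
#if defined(__FreeBSD__)
    int    mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_AUXV, (int)getpid() };
    size_t len    = max_entries * sizeof(*buf);

    if (sysctl(mib, 4, buf, &len, NULL, 0) != 0) {
        return -1;
    }
    return (long)(len / sizeof(*buf));
#else
    (void)buf;
    (void)max_entries;
    return -1;
#endif
}

/* Find the first entry with the given tag, stopping at AT_NULL or after
 * count entries. Returns 1 and stores the value on success, 0 otherwise. */
int aux_lookup(const aux_entry_t *auxv, size_t count,
               unsigned long type, unsigned long *value_out)
{
    size_t i;

    for (i = 0; i < count && auxv[i].type != AUX_AT_NULL; i++) {
        if (auxv[i].type == type) {
            *value_out = auxv[i].value;
            return 1;
        }
    }
    return 0;
}
```

A caller would fetch into a fixed-size array and then look up
AUX_AT_PHDR, AUX_AT_PHENT and AUX_AT_PHNUM, which is presumably the
information the reloc code currently takes from /proc/self/auxv.
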
The tcp/test_proto_reset.am_eager_multi_bcopy timeout is worth filing as
a separate upstream issue once this PR is through review.

-- 
You are receiving this mail because:
You are the assignee for the bug.
