From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 279021] Random phantom files by g_new_bio() failure
Date: Thu, 16 May 2024 06:59:03 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279021

            Bug ID: 279021
           Summary: Random phantom files by g_new_bio() failure
           Product: Base System
           Version: 14.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: seigo.tanimura@gmail.com

A bug in g_new_bio() is suspected of causing random phantom files, often
silently.  It was exposed during the poudriere-bulk(8) test on bug #275594,
comment #147.

* Test Environment: Hypervisor
- CPU: Intel Core i7-13700KF 3.4GHz (24 threads)
- RAM: 128 GB
- OS: Windows 10
- Storage: NVMe and SATA HDDs
- Hypervisor: VMware Workstation 17.5

* Test Environment: VM & OS
- vCPUs: 16
- RAM: 16 GB
- Swap: 128 GB on NVMe
- OS: FreeBSD 14.1-BETA2
  - All of the releng/14.1 fixes in bug #275594, comment #147 applied.
- Storage & Filesystems: ZFS mainly
  - Main pool: 1.5G on SATA HDD
  - ZIL: 16 GB on NVMe
  - L2ARC: 64 GB on NVMe

* Application
- poudriere
  - Number of ports to build: 2325 (including dependencies)
  - Major configurations for port building
    - poudriere.conf
      - #NO_ZFS=yes (ZFS enabled)
      - USE_PORTLINT=no
      - USE_TMPFS="wrkdir data localbase"
      - TMPFS_LIMIT=32
      - DISTFILES_CACHE=(configured in ZFS)
      - CCACHE_DIR=(configured in ZFS)
        - The cache is cleared in advance.
      - CCACHE_STATIC_PREFIX=/usr/local
      - PARALLEL_JOBS=16 (actually given via "poudriere bulk -J")
    - make.conf
      - MAKE_JOBS_NUMBER=4

* Steps
1. Remove the package output directory, so that all packages are built.
2. Clear the ccache contents by "ccache -C".
3. Run 'poudriere bulk' to start the parallel build.
4. Observe the system and build progress by top(1), poudriere web UI,
   cmdwatch(1) + sysctl(8), etc.

* Expected results
- All of the ports are built successfully.

* Observed behaviors during building
- In about 2 hours, RAM was exhausted and the kernel started swapping pages
  out.
- The bulk port build failed at random.
  + A header file or a library provided by a dependency was often missing.
- The kernel occasionally logged "swap_pager: cannot allocate bio".
- vm.uma.g_bio.stats.fails increased up to ~5000.

* Analysis

g_new_bio(), the kernel function that allocates a new bio in a non-blocking
manner, returns NULL if the g_bio uma(9) zone has no free items.  While this
case is regarded as a rare error with an ordinary HDD, nvme(4) storage is
likely to trigger it because of its high capacity for parallel I/O
operations.

Although not confirmed precisely, the effect of this issue seems to include
phantom files, i.e., newly created files that do not become visible
immediately.  Under poudriere-bulk(8), it is suspected that the files
installed during build-depends and lib-depends are not detected as expected.
The problem happens at random; it depends on the state of the g_bio zone.

g_new_bio() emits no log when an allocation fails.  An exception is the swap
pager, which logs "swap_pager: cannot allocate bio".  Otherwise, the increase
of vm.uma.g_bio.stats.fails is the sole record of the errors.

* Proposed Fix and Test Results

Reserve some bios for non-blocking allocation.  The uma(9) allocator supports
item reservation, which can be used to implement the fix.  NB: in practice,
the item reservation of uma(9) can be configured at boot time only.

The proposed fix has been committed to the submitter's GitHub repository and
made public.

New Loader Tunable:
- kern.geom.reserved_new_bios
  The number of bios reserved for non-blocking allocation.  (Default: 65536)
  Zero means no bios are reserved.  Due to the limitation of the uma(9) zone,
  this configuration cannot be altered on a running host.

All of the sources are under
https://github.com/altimeter-130ft/freebsd-freebsd-src.

            |                                   | Git Commit Hash
Base Branch | Fix Branch                        | Base       | Fix
============+===================================+============+============
main        | topic-bio-reservation             | c1ebd76c3f | c784b64b8a
------------+-----------------------------------+------------+------------
stable/14   | stable/14-topic-bio-reservation   | 3c414a8c2f | aeaac96a7a
------------+-----------------------------------+------------+------------
releng/14.1 | releng/14.1-topic-bio-reservation | e3e57ae30c | 8f0281d20d
------------+-----------------------------------+------------+------------
releng/14.0 | releng/14.0-topic-bio-reservation | d338712beb | 6f8fed52ee
------------+-----------------------------------+------------+------------
stable/13   | stable/13-topic-bio-reservation   | 85e63d952d | 64b9962cec
------------+-----------------------------------+------------+------------
releng/13.3 | releng/13.3-topic-bio-reservation | be4f1894ef | 4d233d7419
------------+-----------------------------------+------------+------------
releng/13.2 | releng/13.2-topic-bio-reservation | f5ac4e174f | 7b156cbac8

poudriere-bulk(8) has been tested with the releng/14.1-topic-bio-reservation
branch (and the ZFS fix on bug #275594, comment #147), with the following
results demonstrating the fix:

- vm.uma.g_bio.stats.fails did not increase at all.
- "swap_pager: cannot allocate bio" did not appear in the log at all.
- The random build errors disappeared completely.
  + Only one port (graphics/gimp-app) failed, but due to a separate problem
    (an internal error of clang).