Date: Thu, 16 May 2024 06:59:03 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 279021] Random phantom files by g_new_bio() failure
Message-ID: <bug-279021-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279021

            Bug ID: 279021
           Summary: Random phantom files by g_new_bio() failure
           Product: Base System
           Version: 14.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: seigo.tanimura@gmail.com

A bug in g_new_bio() is suspected to cause random phantom files, often
silently. It was exposed during the poudriere-bulk(8) test on bug #275594,
comment #147.

* Test Environment: Hypervisor
- CPU: Intel Core i7-13700KF 3.4GHz (24 threads)
- RAM: 128 GB
- OS: Windows 10
- Storage: NVMe and SATA HDDs
- Hypervisor: VMware Workstation 17.5

* Test Environment: VM & OS
- vCPUs: 16
- RAM: 16 GB
- Swap: 128 GB on NVMe
- OS: FreeBSD 14.1-BETA2
  - All of the releng/14.1 fixes in bug #275594, comment #147 applied.
- Storage & Filesystems: ZFS mainly
  - Main pool: 1.5G on SATA HDD
  - ZIL: 16 GB on NVMe
  - L2ARC: 64 GB on NVMe

* Application
- poudriere
  - Number of ports to build: 2325 (including dependencies)
- Major configurations for port building
  - poudriere.conf
    - #NO_ZFS=yes (ZFS enabled)
    - USE_PORTLINT=no
    - USE_TMPFS="wrkdir data localbase"
    - TMPFS_LIMIT=32
    - DISTFILES_CACHE=(configured in ZFS)
    - CCACHE_DIR=(configured in ZFS)
      - The cache is cleared in advance.
    - CCACHE_STATIC_PREFIX=/usr/local
    - PARALLEL_JOBS=16 (actually given via "poudriere bulk -J")
  - make.conf
    - MAKE_JOBS_NUMBER=4

* Steps
1. Remove the package output directory, so that all packages are built.
2. Clear the ccache contents by "ccache -C".
3. Run 'poudriere bulk' to start the parallel build.
4. Observe the system and build progress by top(1), poudriere web UI,
   cmdwatch(1) + sysctl(8), etc.

* Expected results
- All of the ports are built successfully.

* Observed behaviors during building
- After about 2 hours, the RAM ran out and the kernel started swapping out
  pages.
- The bulk port build failed at random.
  + A header file or a library provided via a dependency was often missing.
- The kernel occasionally logged "swap_pager: cannot allocate bio".
- vm.uma.g_bio.stats.fails increased up to ~5000.

* Analysis

g_new_bio(), the kernel function that allocates a new bio in a non-blocking
manner, returns NULL if the g_bio uma(9) zone has no free items. While such a
case is regarded as a rare error with an ordinary HDD, nvme(4) storage is
likely to trigger the issue because of its high capacity for parallel I/O
operations.

Although not confirmed precisely, the effect of this issue seems to include
phantom files, i.e. newly created files that do not become visible
immediately. Under poudriere-bulk(8), it is suspected that the files installed
during build-depends and lib-depends are not detected as expected.

The problem happens at random; it depends on the state of the g_bio zone.

No logs are emitted by g_new_bio() in case of an allocation failure. An
exception is the swap pager, which logs "swap_pager: cannot allocate bio". The
increase of vm.uma.g_bio.stats.fails is the sole record of the errors.

* Proposed Fix and Test Results

Reserve some bios for the non-blocking allocation. uma(9) supports item
reservation, which can be used to implement the fix. NB: in practice, the item
reservation of uma(9) can be configured at boot time only.

The proposed fix has been committed to the submitter's GitHub repository and
made public.

New Loader Tunable:
- kern.geom.reserved_new_bios
  The number of bios reserved for the non-blocking allocation. (Default:
  65536)
  Zero means no bios are reserved. Due to the limitation of the uma(9) zone,
  this configuration cannot be altered on a running host.

All of the sources are under
https://github.com/altimeter-130ft/freebsd-freebsd-src.
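
The reservation idea above can be illustrated with a small userspace model.
This is only a sketch under stated assumptions, not the code from the fix
branches: the names bio_pool, pool_alloc_waitable(), pool_alloc_nowait() and
simulate() are hypothetical, and the fixed-size pool is a simplification of a
uma(9) zone. It assumes that blocking-style callers must leave the reserve
untouched (where they would sleep in the kernel), while non-blocking callers
may consume it and fail only when the pool is completely empty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a bio zone; not FreeBSD kernel code. */
struct bio_pool {
	size_t free_items;	/* items currently available */
	size_t reserve;		/* items held back for non-blocking callers */
	unsigned long fails;	/* models vm.uma.g_bio.stats.fails */
};

/*
 * Models a blocking-style request. It may not dip into the reserve;
 * returning false stands for "the caller would sleep here".
 */
static bool
pool_alloc_waitable(struct bio_pool *p)
{
	if (p->free_items <= p->reserve)
		return (false);
	p->free_items--;
	return (true);
}

/*
 * Models a non-blocking request. It may use the reserve and fails only
 * when no item is left at all; each failure is counted.
 */
static bool
pool_alloc_nowait(struct bio_pool *p)
{
	if (p->free_items == 0) {
		p->fails++;
		return (false);
	}
	p->free_items--;
	return (true);
}

/*
 * Blocking-style callers first take everything they are allowed to, then
 * a burst of non-blocking requests arrives. Returns the number of failed
 * non-blocking requests.
 */
unsigned long
simulate(size_t total, size_t reserve, size_t burst)
{
	struct bio_pool p = { .free_items = total, .reserve = reserve,
	    .fails = 0 };

	while (pool_alloc_waitable(&p))
		;
	for (size_t i = 0; i < burst; i++)
		(void)pool_alloc_nowait(&p);
	return (p.fails);
}
```

In this model, simulate(64, 0, 16) returns 16 because every non-blocking
request finds the pool empty, while simulate(64, 16, 16) returns 0 because
the burst is absorbed by the reserve. A burst larger than the reserve still
fails for the excess requests, which is why the size of the reserve matters.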

            |                                   |         Git Commit Hash
Base Branch | Fix Branch                        | Base            | Fix
============+===================================+=================+============
main        | topic-bio-reservation             | c1ebd76c3f      | c784b64b8a
------------+-----------------------------------+-----------------+------------
stable/14   | stable/14-topic-bio-reservation   | 3c414a8c2f      | aeaac96a7a
------------+-----------------------------------+-----------------+------------
releng/14.1 | releng/14.1-topic-bio-reservation | e3e57ae30c      | 8f0281d20d
------------+-----------------------------------+-----------------+------------
releng/14.0 | releng/14.0-topic-bio-reservation | d338712beb      | 6f8fed52ee
------------+-----------------------------------+-----------------+------------
stable/13   | stable/13-topic-bio-reservation   | 85e63d952d      | 64b9962cec
------------+-----------------------------------+-----------------+------------
releng/13.3 | releng/13.3-topic-bio-reservation | be4f1894ef      | 4d233d7419
------------+-----------------------------------+-----------------+------------
releng/13.2 | releng/13.2-topic-bio-reservation | f5ac4e174f      | 7b156cbac8

Poudriere-bulk(8) has been tested with the releng/14.1-topic-bio-reservation
branch (and the ZFS fix on bug #275594, comment #147), with the following
results proving the fix:

- vm.uma.g_bio.stats.fails did not increase at all.
- "swap_pager: cannot allocate bio" did not appear in the log at all.
- The build error disappeared completely.
  + Only one port (graphics/gimp-app) failed, but due to a separate problem.
    (An internal error of clang.)

--
You are receiving this mail because:
You are the assignee for the bug.