Date: Wed, 03 Mar 2021 18:18:46 +0000 From: bugzilla-noreply@freebsd.org To: bugs@FreeBSD.org Subject: [Bug 253992] ufsdirhash_build() sleeps with directory vnode locked due to M_WAITOK Message-ID: <bug-253992-227@https.bugs.freebsd.org/bugzilla/>
index | next in thread | raw e-mail
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253992 Bug ID: 253992 Summary: ufsdirhash_build() sleeps with directory vnode locked due to M_WAITOK Product: Base System Version: 12.1-RELEASE Hardware: Any OS: Any Status: New Severity: Affects Only Me Priority: --- Component: kern Assignee: bugs@FreeBSD.org Reporter: dgmorris@earthlink.net On a system where a timeout panic has been added for attempts to acquire a UFS root vnode lock, such a panic was observed. The holder of the lock was attempting to do: ched_switch() at sched_switch+0xbcc/frame 0xfffffe885e4e5e10 mi_switch() at mi_switch+0x29b/frame 0xfffffe885e4e5e60 sleepq_wait() at sleepq_wait+0x2c/frame 0xfffffe885e4e5e90 _sleep() at _sleep+0x265/frame 0xfffffe885e4e5f10 vm_wait_doms() at vm_wait_doms+0xfd/frame 0xfffffe885e4e5f40 vm_wait_domain() at vm_wait_domain+0x50/frame 0xfffffe885e4e5f70 vm_domain_alloc_fail() at vm_domain_alloc_fail+0x91/frame 0xfffffe885e4e5fb0 vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x243/frame 0xfffffe885e4e6030 uma_small_alloc() at uma_small_alloc+0x9b/frame 0xfffffe885e4e6080 keg_alloc_slab() at keg_alloc_slab+0x103/frame 0xfffffe885e4e60d0 zone_import() at zone_import+0x106/frame 0xfffffe885e4e6170 zone_alloc_item() at zone_alloc_item+0x6e/frame 0xfffffe885e4e61b0 cache_alloc_retry() at cache_alloc_retry+0x13c/frame 0xfffffe885e4e6210 uma_zalloc_arg() at uma_zalloc_arg+0x114/frame 0xfffffe885e4e6270 ufsdirhash_build() at ufsdirhash_build+0x850/frame 0xfffffe885e4e6310 ufs_lookup_ino() at ufs_lookup_ino+0x1a2/frame 0xfffffe885e4e6420 VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x8a/frame 0xfffffe885e4e6450 vfs_cache_lookup() at vfs_cache_lookup+0xd6/frame 0xfffffe885e4e64b0 VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x8a/frame 0xfffffe885e4e64e0 lookup() at lookup+0x756/frame 0xfffffe885e4e6590 namei() at namei+0x46c/frame 0xfffffe885e4e6660 _vn_open_cred() at _vn_open_cred+0x1e1/frame 0xfffffe885e4e67f0 _vn_open() at _vn_open+0x1f/frame 0xfffffe885e4e6820 isi_kern_openat() at isi_kern_openat+0x3b4/frame 0xfffffe885e4e6a10 sys_openat() at sys_openat+0x5a/frame 0xfffffe885e4e6a70 amd64_syscall() at amd64_syscall+0x381/frame 0xfffffe885e4e6bf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe885e4e6bf0 — syscall (499, FreeBSD ELF64, sys_openat), rip = 0x8005b0d3a, rsp = 0x7fffffffe368, rbp = 0x7fffffffe3e0 — The system is ccNUMA, with 2 domains. One domain is OOM, the other is not: db> show pageq pq_free 1887153 dom 0 page_cnt 8102741 free 2 pq_act 57 pq_inact 1 pq_laund 0 pq_unsw 311370 dom 1 page_cnt 8133281 free 1887151 pq_act 17391 pq_inact 4868531 pq_laund 0 pq_unsw 186545 The DIRHASH uma zone is created with flags argument of 0, so acquires the default NUMA policy (FIRSTTOUCH). The problem allocation is on line 437: dh->dh_hash = malloc(narrays * sizeof(dh->dh_hash[0]), M_DIRHASH, M_NOWAIT | M_ZERO); if (dh->dh_hash == NULL) goto fail; dh->dh_blkfree = malloc(nblocks * sizeof(dh->dh_blkfree[0]), M_DIRHASH, M_NOWAIT); if (dh->dh_blkfree == NULL) goto fail; for (i = 0; i < narrays; i++) { if ((dh->dh_hash[i] = DIRHASH_BLKALLOC_WAITOK()) == NULL) goto fail; for (j = 0; j < DH_NBLKOFF; j++) dh->dh_hash[i][j] = DIRHASH_EMPTY; } Where: #define DIRHASH_BLKALLOC_WAITOK() uma_zalloc(ufsdirhash_zone, M_WAITOK) from line 73. As can be seen, the code is even assuming the allocation can fail as if it were NOWAIT, but is instead doing M_WAITOK while holding the directory vnode lock per sys/ufs/ufs/ufs_lookup.c line 254: #ifdef DEBUG_VFS_LOCKS /* * Assert that the directory vnode is locked, and locked * exclusively for the last component lookup for modifying * operations. * * The directory-modifying operations need to save * intermediate state in the inode between namei() call and * actual directory manipulations. See fields in the struct * inode marked as 'used during directory lookup'. We must * ensure that upgrade in namei() does not happen, since * upgrade might need to unlock vdp. If quotas are enabled, * getinoquota() also requires exclusive lock to modify inode. */ ASSERT_VOP_LOCKED(vdp, "ufs_lookup1"); if ((nameiop == CREATE || nameiop == DELETE || nameiop == RENAME) && (flags & (LOCKPARENT | ISLASTCN)) == (LOCKPARENT | ISLASTCN)) ASSERT_VOP_ELOCKED(vdp, "ufs_lookup2"); #endif restart: bp = NULL; slotoffset = -1; /* * We now have a segment name to search for, and a directory to search. * * Suppress search for slots unless creating * file and at end of pathname, in which case * we watch for a place to put the new file in * case it doesn't already exist. */ ino = 0; i_diroff = dp->i_diroff; slotstatus = FOUND; slotfreespace = slotsize = slotneeded = 0; if ((nameiop == CREATE || nameiop == RENAME) && (flags & ISLASTCN)) { slotstatus = NONE; slotneeded = DIRECTSIZ(cnp->cn_namelen); } #ifdef UFS_DIRHASH /* * Use dirhash for fast operations on large directories. The logic * to determine whether to hash the directory is contained within * ufsdirhash_build(); a zero return means that it decided to hash * this directory and it successfully built up the hash table. */ if (ufsdirhash_build(dp) == 0) { The NUMA system may be more likely to hit this in the case of imbalanced load (as the system as a whole may not be OOM, just the domain), but the risk is there on any system. I propose the following diff as the WAITOK allocation is only used in one place: dmorris@wkstn-dmorris2:freebsd_ufs $ git diff -p origin/main diff --git a/sys/ufs/ufs/ufs_dirhash.c b/sys/ufs/ufs/ufs_dirhash.c index 13094b5a1c97..72f38be5eea4 100644 --- a/sys/ufs/ufs/ufs_dirhash.c +++ b/sys/ufs/ufs/ufs_dirhash.c @@ -108,7 +108,7 @@ static uma_zone_t ufsdirhash_zone; #define DIRHASHLIST_LOCK() mtx_lock(&ufsdirhash_mtx) #define DIRHASHLIST_UNLOCK() mtx_unlock(&ufsdirhash_mtx) -#define DIRHASH_BLKALLOC_WAITOK() uma_zalloc(ufsdirhash_zone, M_WAITOK) +#define DIRHASH_BLKALLOC_NOWAIT() uma_zalloc(ufsdirhash_zone, M_NOWAIT) #define DIRHASH_BLKFREE(ptr) uma_zfree(ufsdirhash_zone, (ptr)) #define DIRHASH_ASSERT_LOCKED(dh) \ sx_assert(&(dh)->dh_lock, SA_LOCKED) @@ -425,7 +425,7 @@ ufsdirhash_build(struct inode *ip) if (dh->dh_blkfree == NULL) goto fail; for (i = 0; i < narrays; i++) { - if ((dh->dh_hash[i] = DIRHASH_BLKALLOC_WAITOK()) == NULL) + if ((dh->dh_hash[i] = DIRHASH_BLKALLOC_NOWAIT()) == NULL) goto fail; for (j = 0; j < DH_NBLKOFF; j++) dh->dh_hash[i][j] = DIRHASH_EMPTY; -- You are receiving this mail because: You are the assignee for the bug.home | help
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-253992-227>
