Date:      Wed, 26 Mar 2025 09:46:21 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        emulation@FreeBSD.org, stable@FreeBSD.org
Subject:   Re: panic: vrefact: wrong use count 0, linux emulation related
Message-ID:  <d3b0a784-dc4b-4b02-a158-ca70d7b3ce96@FreeBSD.org>
In-Reply-To: <41288c50-3213-4d81-913c-d8897214a9e7@FreeBSD.org>
References:  <41288c50-3213-4d81-913c-d8897214a9e7@FreeBSD.org>


Turns out that it's a known and already fixed issue.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274538

Not sure why I didn't see it before and only started seeing it now.
I'll update my stable/14 to get the fix.


On 24/03/2025 5:35 pm, Andriy Gapon wrote:
> 
> Introduction.
> 
> The affected system is stable/14, amd64.
> The kernel is a custom one, configured with INVARIANTS.
> 
> The problem started happening fairly reliably after a recent package upgrade.
> I suspect that the trigger is in linux-nvidia-libs-570.124.04, but that the bug
> itself is in the FreeBSD Linux emulation.
> 
> The reason for my suspicion is that the crash happens when starting a graphical
> Linux application in a Linux jail, and the crash involves a graphics-related
> character device.
> 
> For the record, the jail itself, including the application, hasn't changed.
> I also haven't touched the base system recently.
> 
> Details.
> 
> VNASSERT failed: old > 0 not true at sys/kern/vfs_subr.c:3361 (vrefact)
> 0xfffff802945df380: type VCHR state VSTATE_CONSTRUCTED op 0xffffffff8127b648
>      usecount 1, writecount 0, refcount 39 seqc users 0 rdev 0xfffff8004565f400
>      hold count flags ()
>      flags ()
>      lock type devfs: UNLOCKED
>          dev drm/128
> panic: vrefact: wrong use count 0
> cpuid = 1
> time = 1742796535
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0xffffffff8061eadb = db_trace_self_wrapper+0x2b/frame 0xfffffe02476a0780
> kdb_backtrace() at 0xffffffff80956a57 = kdb_backtrace+0x37/frame 0xfffffe02476a0830
> vpanic() at 0xffffffff80907629 = vpanic+0x169/frame 0xfffffe02476a0970
> panic() at 0xffffffff80907403 = panic+0x43/frame 0xfffffe02476a09d0
> vrefact() at 0xffffffff809f08e4 = vrefact+0xb4/frame 0xfffffe02476a09f0
> fgetvp_lookup() at 0xffffffff808ac718 = fgetvp_lookup+0x88/frame 0xfffffe02476a0a30
> namei_setup() at 0xffffffff809e07ba = namei_setup+0x15a/frame 0xfffffe02476a0a80
> namei_emptypath() at 0xffffffff809e0499 = namei_emptypath+0x49/frame 0xfffffe02476a0ae0
> namei() at 0xffffffff809e029f = namei+0x66f/frame 0xfffffe02476a0b40
> linux_kern_statat() at 0xffffffff8a09d24c = linux_kern_statat+0xfc/frame 0xfffffe02476a0c70
> linux_newfstatat() at 0xffffffff8a09cfed = linux_newfstatat+0x6d/frame 0xfffffe02476a0e00
> amd64_syscall() at 0xffffffff80c79f79 = amd64_syscall+0x189/frame 0xfffffe02476a0f30
> fast_syscall_common() at 0xffffffff80c4fb9b = fast_syscall_common+0xf8/frame 0xfffffe02476a0f30
> --- syscall (262, Linux ELF64, linux_newfstatat), rip = 0x813f13eee, rsp = 0x7fffffffbd28, rbp = 0 ---
> 
> As far as I understand, this is a Linux fstatat system call made with the
> AT_EMPTY_PATH flag on a file descriptor for the opened /dev/drm/128 device.
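> For illustration, a minimal C sketch of the kind of call described above:
> fstatat(2) on an already-open descriptor with an empty path and AT_EMPTY_PATH,
> which makes it behave like fstat(2) on that descriptor.  The helper name
> stat_by_fd() is hypothetical, and the backtrace alone does not show whether the
> application issues this call itself or its C library does so on its behalf.
>
> ```c
> #define _GNU_SOURCE /* glibc exposes AT_EMPTY_PATH only with this defined */
> #include <assert.h>
> #include <fcntl.h>
> #include <sys/stat.h>
> #include <unistd.h>
>
> /*
>  * Hypothetical helper: stat an already-open descriptor by passing an empty
>  * path together with AT_EMPTY_PATH, so that fstatat() operates on fd itself
>  * instead of resolving a path relative to it.
>  */
> int stat_by_fd(int fd, struct stat *st)
> {
> 	return fstatat(fd, "", st, AT_EMPTY_PATH);
> }
> ```
>
> On a Linux system, calling it on a descriptor from open("/dev/null", O_RDONLY)
> returns 0 and reports the same st_rdev and st_ino as fstat() on that descriptor.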
> 
> Given that the crash happens in fgetvp_lookup -> vrefact, I think that it's 
> unlikely that there is a problem in that call path.
> I believe that the problem is elsewhere in the Linux emulation code for working 
> with character devices.
> 
> I think that the panic means that the corresponding file descriptor was open but
> the associated vnode had a usecount of zero.
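> A simplified userspace model of the failing check, assuming that vrefact()
> atomically increments v_usecount first and then asserts that the value before
> the increment was positive (consistent with the "old > 0" VNASSERT and the
> "%s: wrong use count %d" format seen above).  Under that assumption, the panic
> message ("wrong use count 0") and the dump ("usecount 1") agree: the count was
> 0 before the call and 1 after it.  struct model_vnode and model_vrefact() are
> hypothetical names, not kernel API.
>
> ```c
> #include <assert.h>
> #include <stdatomic.h>
> #include <stdbool.h>
>
> /* Hypothetical stand-in for the single vnode field that matters here. */
> struct model_vnode {
> 	atomic_uint usecount;
> };
>
> /*
>  * Model of the assumed vrefact() contract: the caller promises the vnode is
>  * already in use, so the reference is added unconditionally and the
>  * pre-increment value is checked afterwards.  Returns whether the invariant
>  * (old > 0) held; *oldp receives the pre-increment count.
>  */
> bool model_vrefact(struct model_vnode *vp, unsigned *oldp)
> {
> 	/* atomic_fetch_add() returns the value held before the addition. */
> 	unsigned old = atomic_fetch_add(&vp->usecount, 1);
>
> 	*oldp = old;
> 	return old > 0; /* the kernel would VNASSERT() here instead */
> }
> ```
>
> With a starting count of 0 the model returns false and leaves the count at 1,
> which matches the state shown in the vnode dump.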
> 
> The file's f_type is 11 (DTYPE_DEV), and it looks like DTYPE_DEV is used only in
> the linuxkpi code, e.g., in linux_dev_fdopen.
> 
> Some info from kgdb.
> 
> (kgdb) bt
> #0  __curthread () at sys/amd64/include/pcpu_aux.h:57
> #1  doadump (textdump=textdump@entry=1) at sys/kern/kern_shutdown.c:423
> #2  0xffffffff80907121 in kern_reboot (howto=260) at sys/kern/kern_shutdown.c:541
> #3  0xffffffff80907698 in vpanic (fmt=0xffffffff80e35cf8 "%s: wrong use count %d", ap=0xfffffe01adc909b0) at sys/kern/kern_shutdown.c:1021
> #4  0xffffffff80907403 in panic (fmt=<unavailable>) at sys/kern/kern_shutdown.c:945
> #5  0xffffffff809f08e4 in vrefact (vp=0xfffff8035b4bb700) at sys/kern/vfs_subr.c:3361
> #6  0xffffffff808ac718 in fgetvp_lookup (ndp=ndp@entry=0xfffffe01adc90b58, vpp=vpp@entry=0xfffffe01adc90ac8) at sys/kern/kern_descrip.c:3134
> #7  0xffffffff809e07ba in namei_setup (ndp=ndp@entry=0xfffffe01adc90b58, dpp=dpp@entry=0xfffffe01adc90ac8, pwdp=pwdp@entry=0xfffffe01adc90ac0) at sys/kern/vfs_lookup.c:383
> #8  0xffffffff809e0499 in namei_emptypath (ndp=ndp@entry=0xfffffe01adc90b58) at sys/kern/vfs_lookup.c:466
> #9  0xffffffff809e029f in namei (ndp=ndp@entry=0xfffffe01adc90b58) at sys/kern/vfs_lookup.c:687
> #10 0xffffffff8a09d24c in linux_kern_statat (td=0xfffff804d50d7000, flag=16384, fd=9, path=0x813fd846f <error: Cannot access memory at address 0x813fd846f>, pathseg=UIO_USERSPACE, sbp=sbp@entry=0xfffffe01adc90c80)
>      at sys/compat/linux/linux_stats.c:103
> #11 0xffffffff8a09cfed in linux_newfstatat (td=<unavailable>, td@entry=<error reading variable: value is not available>, args=0xfffff804d50d7400, args@entry=<error reading variable: value is not available>)
>      at sys/compat/linux/linux_stats.c:620
> #12 0xffffffff80c79f79 in syscallenter (td=0xfffff804d50d7000) at sys/amd64/amd64/../../kern/subr_syscall.c:191
> #13 amd64_syscall (td=0xfffff804d50d7000, traced=<optimized out>) at sys/amd64/amd64/trap.c:1206
> 
> (kgdb) p *vp
> $1 = {v_type = VCHR, v_state = VSTATE_CONSTRUCTED, v_irflag = 0, v_seqc = 0, 
> v_nchash = 1973399077, v_hash = 56314807, v_op = 0xffffffff8127b648 
> <devfs_specops>, v_data = 0xfffff80055005200, v_mount = 0xfffffe0150b46100, 
> v_nmntvnodes = {
>      tqe_next = 0xfffff8038000da80, tqe_prev = 0xfffff8035b4bb8e8}, 
> {v_mountedhere = 0xfffff800452b9400, v_unpcb = 0xfffff800452b9400, v_rdev = 
> 0xfffff800452b9400, v_fifoinfo = 0xfffff800452b9400}, v_hashlist = {le_next = 
> 0x0, le_prev = 0x0},
>    v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 
> 0xfffff8035b4bb758}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 
> 0xffffffff80d1cf3c "devfs", lo_flags = 116588544, lo_data = 0, lo_witness = 0x0},
>      lk_lock = 1, lk_exslpfail = 0, lk_pri = 64, lk_timo = 51}, v_interlock = 
> {lock_object = {lo_name = 0xffffffff80db24c1 "vnode interlock", lo_flags = 
> 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 
> 0xfffff8035b4bb770,
>    v_vnodelist = {tqe_next = 0xfffff8035b4bbc40, tqe_prev = 0xfffff80369f48280}, 
> v_lazylist = {tqe_next = 0x0, tqe_prev = 0x0}, v_bufobj = {bo_lock = 
> {lock_object = {lo_name = 0xffffffff80df4394 "bufobj interlock", lo_flags = 
> 86179840,
>          lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 
> 0xffffffff812b7190 <buf_ops_bio>, bo_object = 0x0, bo_synclist = {le_next = 0x0, 
> le_prev = 0x0}, bo_private = 0xfffff8035b4bb700, bo_clean = {bv_hd = {tqh_first 
> = 0x0,
>          tqh_last = 0xfffff8035b4bb828}, bv_root = {pt_root = 0x1}, bv_cnt = 0}, 
> bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb848}, bv_root = 
> {pt_root = 0x1}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_domain = 0,
>      bo_bsize = 512}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = 
> {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff8035b4bb890}, rl_currdep = 
> 0x0}, v_holdcnt = 32, v_usecount = 1, v_iflag = 0, v_vflag = 0, v_mflag = 0,
>    v_dbatchcpu = -1, v_writecount = 0, v_seqc_users = 0}
> 
> (kgdb) p *fp
> $3 = {f_flag = 3, f_count = 3, f_data = 0xfffff807120b5480, f_ops = 
> 0xffffffff84b46390 <linuxfileops>, f_vnode = 0xfffff8035b4bb700, f_cred = 
> 0xfffff8036f967d00, f_type = 11, f_vnread_flags = 0, {f_seqcount = {0, 0}, 
> f_pipegen = 0},
>    f_nextoff = {0, 0}, f_vnun = {fvn_cdevpriv = 0x0, fvn_advice = 0x0}, f_offset 
> = 0}
> 
> (kgdb) p *ndp
> $5 = {ni_dirp = 0x813fd846f <error: Cannot access memory at address 
> 0x813fd846f>, ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xffffffff812005f0 
> <cap_fstat_rights>, ni_startdir = 0x0, ni_rootdir = 0xfffff8003a922c40,
>    ni_topdir = 0xfffff8003a922c40, ni_dirfd = 9, ni_lcf = 0, ni_filecaps = 
> {fc_rights = {cr_rights = {144123984168878079, 288230376153808895}}, fc_ioctls = 
> 0x0, fc_nioctls = -1, fc_fcntls = 120}, ni_vp = 0x0, ni_dvp = 0xffffffffffffffff,
>    ni_resflags = 4, ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1, ni_next = 
> 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, 
> ni_cnd = {cn_flags = 262596, cn_cred = 0xfffff8068c56b200, cn_nameiop = LOOKUP,
>      cn_lkflags = -1, cn_pnbuf = 0xfffff8002b11ec00 "", cn_nameptr = 
> 0xfffff8002b11ec00 "", cn_namelen = -1}, ni_cap_tracker = {tqh_first = 0x0, 
> tqh_last = 0xfffffe01adc90c08}, ni_dvp_seqc = 2915634432, ni_vp_seqc = 4294966785}
> 
> I tried to look at linux_dev_fdopen() and other code in
> sys/compat/linuxkpi/common/src/linux_compat.c, but haven't made much progress yet.
> 
> I have the crash dump, so please let me know if there is anything else I can provide or look at.
> 
> Thank you.


-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d3b0a784-dc4b-4b02-a158-ca70d7b3ce96>