Date: Mon, 13 Dec 2021 07:45:07 -0800
From: John Baldwin <jhb@FreeBSD.org>
To: Gleb Smirnoff <glebius@FreeBSD.org>
Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, x11@FreeBSD.org
Subject: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
Message-ID: <1db0942e-0e66-4337-ce2f-4e1005107435@FreeBSD.org>

This weekend I upgraded my FreeBSD laptop and kicked off a poudriere
build of the packages I use. My laptop kept "freezing" during the
package builds, however. Initially, due to messages in
/var/log/messages, I thought it was running out of swap and killing the
display server. After poking at it off and on over the weekend I
finally narrowed it down to building the devel/apr1 port, built it on
the console (rather than in X), and was greeted with the following
panic:

panic: malloc(M_WAITOK) with sleeping prohibited
cpuid = 7
time = 1639374072
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001e5b55b0
vpanic() at vpanic+0x17f/frame 0xfffffe001e5b5600
panic() at panic+0x43/frame 0xfffffe001e5b5660
malloc_dbg() at malloc_dbg+0xd4/frame 0xfffffe001e5b5680
malloc() at malloc+0x2d/frame 0xfffffe001e5b56c0
intel_atomic_state_alloc() at intel_atomic_state_alloc+0x20/frame 0xfffffe001e5b56e0
drm_client_modeset_commit_atomic() at drm_client_modeset_commit_atomic+0x30/frame 0xfffffe001e5b5750
drm_client_modeset_commit_force() at drm_client_modeset_commit_force+0x6f/frame 0xfffffe001e5b5790
drm_fb_helper_restore_fbdev_mode_unlocked() at drm_fb_helper_restore_fbdev_mode_unlocked+0x82/frame 0xfffffe001e5b57c0
vt_kms_postswitch() at vt_kms_postswitch+0x18b/frame 0xfffffe001e5b57f0
vt_window_switch() at vt_window_switch+0x261/frame 0xfffffe001e5b5830
vtterm_cngrab() at vtterm_cngrab+0x4f/frame 0xfffffe001e5b5850
cngrab() at cngrab+0x26/frame 0xfffffe001e5b5870
vpanic() at vpanic+0xee/frame 0xfffffe001e5b58c0
panic() at panic+0x43/frame 0xfffffe001e5b5920
witness_checkorder() at witness_checkorder+0xd1c/frame 0xfffffe001e5b5ae0
__mtx_lock_flags() at __mtx_lock_flags+0x94/frame 0xfffffe001e5b5b30
prison_check_ip4() at prison_check_ip4+0x51/frame 0xfffffe001e5b5b60
in_pcblookup_hash_locked() at in_pcblookup_hash_locked+0x2b6/frame 0xfffffe001e5b5bc0
in_pcblookup_mbuf() at in_pcblookup_mbuf+0x84/frame 0xfffffe001e5b5c00
tcp_input_with_port() at tcp_input_with_port+0x635/frame 0xfffffe001e5b5d50
tcp_input() at tcp_input+0xb/frame 0xfffffe001e5b5d60
ip_input() at ip_input+0x25e/frame 0xfffffe001e5b5de0
swi_net() at swi_net+0x1a1/frame 0xfffffe001e5b5e60
ithread_loop() at ithread_loop+0x279/frame 0xfffffe001e5b5ef0
fork_exit() at fork_exit+0x80/frame 0xfffffe001e5b5f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001e5b5f30
--- trap 0x61e8cb8b, rip = 0x8b48000000f890ff, rsp = 0x52ff38244c8d4800, rbp = 0x245c8948ccc35f20 ---

So there are two things here. The root issue is that the devel/apr1
port runs a configure test for TCP_NODELAY being inherited by accepted
sockets. This test panics because prison_check_ip4() tries to lock a
prison mutex to walk the IPs assigned to a jail, but the caller
(in_pcblookup_hash()) has done an smr_enter(), which does a
critical_enter():

(kgdb) p panicstr
$1 = 0xffffffff81ea90b0 <vpanic.buf> "acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) jail mutex @ /usr/src/sys/netinet/in_jail.c:418"
(kgdb) frame 39
#39 0xffffffff80dbcf71 in prison_check_ip4 (cred=<optimized out>, ia=ia@entry=0xfffffe001e5b5b90) at /usr/src/sys/netinet/in_jail.c:418
418		mtx_lock(&pr->pr_mtx);
(kgdb) l
413		KASSERT(ia != NULL, ("%s: ia is NULL", __func__));
414
415		pr = cred->cr_prison;
416		if (!(pr->pr_flags & PR_IP4))
417			return (0);
418		mtx_lock(&pr->pr_mtx);
419		if (!(pr->pr_flags & PR_IP4)) {
420			mtx_unlock(&pr->pr_mtx);
421			return (0);
422		}
(kgdb) up
#41 0xffffffff80dc5cb4 in in_pcblookup_hash (pcbinfo=0xfffffe0022db7748, faddr=..., fport=2166892021, laddr=..., lport=0, lookupflags=<optimized out>, numa_domain=56 '8', ifp=<optimized out>) at /usr/src/sys/netinet/in_pcb.c:2387
2387		inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport,
(kgdb) l
2382	    struct ifnet *ifp, uint8_t numa_domain)
2383	{
2384		struct inpcb *inp;
2385
2386		smr_enter(pcbinfo->ipi_smr);
2387		inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport,
2388		    lookupflags & INPLOOKUP_WILDCARD, ifp, numa_domain);
2389		if (inp != NULL) {
2390			if (__predict_false(inp_smr_lock(inp,
2391			    (lookupflags & INPLOOKUP_LOCKMASK)) == false))

However, it was a bit harder to see this originally, as the i915kms
driver tries to do a malloc(M_WAITOK) from cngrab() when entering DDB,
which recursively panics (even a malloc(M_NOWAIT) from cngrab() is
probably a bad idea). When it panicked in X the result was that the
screen just froze on whatever it had most recently drawn and the
machine looked hung. (The fact that sysbeep is off, so I couldn't tell
whether typing in commands was doing anything vs emitting errors,
probably didn't help in diagnosing the hang as "sitting in ddb"
initially, though I don't know if DDB itself emits a beep for invalid
commands, etc.)

--
John Baldwin
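
For anyone who wants to reproduce this without building the port, the
kind of configure test described above can be sketched roughly as
follows. This is not APR's actual test, only an approximation of it:
the function name nodelay_inherited() is made up for illustration, and
the sketch assumes a loopback address is usable from wherever it runs.
It sets TCP_NODELAY on a listening socket, connects to it over
loopback, and reads the option back from the accepted socket. The
connect is what sends TCP segments through tcp_input() and the inpcb
lookup seen in the backtrace.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from APR): returns 1 if TCP_NODELAY set
 * on a listening socket is inherited by the accepted socket, 0 if it is
 * not, and -1 if any socket call fails.
 */
int
nodelay_inherited(void)
{
	struct sockaddr_in sa;
	socklen_t salen = sizeof(sa);
	int lfd = -1, cfd = -1, afd = -1;
	int one = 1, val = 0, ret = -1;
	socklen_t vlen = sizeof(val);

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		goto out;
	if (setsockopt(lfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
		goto out;

	/* Bind to loopback and let the kernel pick a free port. */
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa.sin_port = 0;
	if (bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&sa, &salen) < 0)
		goto out;

	/* Connecting drives the handshake through the TCP input path. */
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0 ||
	    connect(cfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto out;

	afd = accept(lfd, NULL, NULL);
	if (afd < 0)
		goto out;

	/* Read the option back from the accepted socket. */
	if (getsockopt(afd, IPPROTO_TCP, TCP_NODELAY, &val, &vlen) < 0)
		goto out;
	ret = (val != 0);
out:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return (ret);
}
```

A configure-style check would call this from main() and use the result
as the exit status. Whether it triggers the panic presumably depends on
running it inside a jail with restricted IPv4 addresses on a kernel
with WITNESS enabled, as in the build described above.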