Date: Mon, 14 Dec 2020 17:53:17 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 251840] panic in iflib_netdump_poll -> _iflib_fl_refill
Message-ID: <bug-251840-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251840

            Bug ID: 251840
           Summary: panic in iflib_netdump_poll -> _iflib_fl_refill
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: mammoottym@yahoo.com

When there is heavy network traffic on a system, most of the CPUs can be processing rx packets in the interrupt task. If the node then crashes, or we break into the debugger, most of the CPUs will be stopped while running _task_fn_rx. Running netdump in this state makes it go through the same queues that were partially processed in _task_fn_rx. This can cause multiple issues, as explained below. I was able to reproduce some of these issues easily by panicking the node while running multiple iperf threads.

==================================================================
1960 _iflib_fl_refill(if_ctx_t ctx, iflib_fl_t fl, int count)
1961 {
....
1972         sd_m = fl->ifl_sds.ifsd_m;
1973         sd_map = fl->ifl_sds.ifsd_map;
....
1976         pidx = fl->ifl_pidx;
1977         idx = pidx;
1978         frag_idx = fl->ifl_fragidx;
1979         credits = fl->ifl_credits;
....
1982         n = count;
....
1997         while (n--) {
....
2007                 bit_ffc(fl->ifl_rx_bitmap, fl->ifl_size, &frag_idx);
2008                 MPASS(frag_idx >= 0);
2009                 if ((cl = sd_cl[frag_idx]) == NULL) {
2010                         cl = m_cljget(NULL, M_NOWAIT, fl->ifl_buf_size);
....
2015                         MPASS(sd_map != NULL);
....
2025                         sd_cl[frag_idx] = cl;
....
2031                 }
....
2035                 MPASS(sd_m[frag_idx] == NULL);
2036                 m = m_gethdr(M_NOWAIT, MT_NOINIT);
....
2039                 sd_m[frag_idx] = m;
2040                 bit_set(fl->ifl_rx_bitmap, frag_idx);
....
2049                 credits++;
2051                 idx++;
....
2060                 if (n == 0 || i == IFLIB_MAX_RX_REFRESH) {
....
2064                         fl->ifl_pidx = idx;
2065                         fl->ifl_credits = credits;
2066                 }
2067         }
==================================================================

The above function, _iflib_fl_refill(), is called to refill an rxq free buffer list with new packet buffers. The number of new buffers to fill is passed in as the input parameter count. Callers make sure that count does not exceed the queue's capacity by checking 'count < (fl->if_use - fl->ifl_credits)'. A bitmap (fl->ifl_rx_bitmap) indicates which entries in the list are free to allocate.

As the code snippet above shows, the allocation is done in a while loop (lines 1997-2067). After an available entry is found in the bitmap, a buffer is allocated for it and its bit in the map is set to indicate that the entry is no longer available to fill. This is repeated n times. Once the required number of buffers has been allocated, fl->ifl_credits is bumped by the number of buffers allocated, at line 2065.

Suppose one of the CPUs is running the above loop: it has done some allocations and set the corresponding bits in the bitmap to mark those entries as no longer available, but it has not finished the loop and updated the ifl_credits field. Now suppose the node crashes, that CPU is stopped, and netdump runs. Netdump polls for packets, and in the Rx context it checks the rxq and, by looking at the ifl_credits field, concludes that there are free buffer-list entries to fill. While looking for available bits in the bitmap, it can find that no bits are left, because the stopped CPU already set them without publishing its credits. bit_ffc() then stores -1 in frag_idx, and the assertion MPASS(frag_idx >= 0) at line 2008 fails. We may be able to work around this particular issue by modifying the code, but that might hurt performance during normal reception. This is the assertion failure the cert team found when they opened this bug.
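The first race can be reproduced outside the kernel with a small single-threaded model. This is a minimal sketch under stated assumptions, not iflib code: struct toy_fl, toy_ffc(), toy_fl_wanted() and toy_refill() are hypothetical names, a bool array stands in for ifl_rx_bitmap and bit_ffc(), callers are assumed to request exactly as many buffers as the published credits say are missing, and the "CPU stopped by NMI" event is simulated by abandoning the loop after stop_after iterations. It shows how bits that are set without published credits make the next refill run out of clear bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; size and names are illustrative only. */
#define TOY_FL_SIZE 8

struct toy_fl {
	bool bitmap[TOY_FL_SIZE]; /* like ifl_rx_bitmap: true = slot filled */
	int  credits;             /* like ifl_credits: published fill count */
};

/* Stand-in for bit_ffc(): first clear bit, or -1 if every bit is set. */
static int toy_ffc(const struct toy_fl *fl)
{
	for (int i = 0; i < TOY_FL_SIZE; i++)
		if (!fl->bitmap[i])
			return i;
	return -1;
}

/* Assumed caller behaviour: ask for as many buffers as credits say are free. */
static int toy_fl_wanted(const struct toy_fl *fl)
{
	return TOY_FL_SIZE - fl->credits;
}

/*
 * Refill 'count' slots. 'stop_after' models the stop NMI: after that many
 * iterations the loop is abandoned before the credits are written back
 * (the real code writes them at line 2065). Pass -1 to run to completion.
 * Returns false where MPASS(frag_idx >= 0) at line 2008 would fire.
 */
static bool toy_refill(struct toy_fl *fl, int count, int stop_after)
{
	int credits = fl->credits; /* local copy, as in _iflib_fl_refill() */

	assert(count <= TOY_FL_SIZE);
	for (int n = 0; n < count; n++) {
		if (n == stop_after)
			return true;       /* CPU stopped: bits set, credits stale */
		int frag_idx = toy_ffc(fl);
		if (frag_idx < 0)
			return false;      /* no clear bit left: the MPASS panic */
		fl->bitmap[frag_idx] = true;
		credits++;
	}
	fl->credits = credits;     /* publish only after the loop */
	return true;
}
```

For example, stopping a full refill after 5 iterations leaves 5 bits set but credits at 0, so a second caller asks for 8 buffers, fills the remaining 3 slots, and then finds no clear bit. The sketch publishes credits only once; the real loop also appears to publish them every IFLIB_MAX_RX_REFRESH iterations (line 2060), which narrows the window but does not close it, since a stop inside any batch leaves the same mismatch.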
------------------------------------------------------------------------------
#0  wbinvd () at ./machine/cpufunc.h:417
#1  cpustop_handler () at /b/mnt/src/sys/x86/x86/mp_x86.c:1424
#2  0xffffffff80a42214 in ipi_nmi_handler () at /b/mnt/src/sys/x86/x86/mp_x86.c:1363
#3  0xffffffff809cc693 in trap (frame=0xfffffe9d4edfaf30) at /b/mnt/src/sys/amd64/amd64/trap.c:210
#4  nmi_calltrap () at /b/mnt/src/sys/amd64/amd64/exception.S:792
#5  item_ctor (zone=0xfffffea265fab600, uz_flags=<optimized out>, size=<optimized out>, udata=<optimized out>, flags=1, item=0xfffff80057b18800) at /b/mnt/src/sys/vm/uma_core.c:3269
#6  0xffffffff8095a68d in cache_alloc_item (zone=<optimized out>, cache=<optimized out>, udata=<optimized out>, flags=<optimized out>, bucket=<optimized out>) at /b/mnt/src/sys/vm/uma_core.c:3395
#7  uma_zalloc_arg (zone=0xfffffea265fab600, udata=0x0, flags=1) at /b/mnt/src/sys/vm/uma_core.c:3491
#8  0xffffffff80610da0 in m_cljget (m=0x0, how=1, size=2048) at /b/mnt/src/sys/kern/kern_mbuf.c:1020
#9  0xffffffff80763111 in _iflib_fl_refill (ctx=0xfffff802f1c56c00, fl=0xfffff802f1c56400, count=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2010
#10 0xffffffff8076296b in __iflib_fl_refill_lt (ctx=<optimized out>, max=24, fl=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2108
#11 iflib_rxeof (rxq=<optimized out>, budget=24) at /b/mnt/src/sys/net/iflib.c:2802
#12 0xffffffff8075e5b9 in _task_fn_rx (context=0xfffffea2665bffa0) at /b/mnt/src/sys/net/iflib.c:3778
...
#0  kdb_enter (why=0xffffffff80b7e201 "panic", msg=<optimized out>) at /b/mnt/src/sys/kern/subr_kdb.c:483
#1  0xffffffff80634c27 in panic_finish () at /b/mnt/src/sys/kern/kern_shutdown.c:1154
#2  0xffffffff806345be in panic (fmt=<optimized out>) at /b/mnt/src/sys/kern/kern_shutdown.c:947
#3  0xffffffff807634d3 in _iflib_fl_refill (ctx=0xfffff802f1c56c00, fl=0xfffff802f1c56400, count=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2008
#4  0xffffffff8076285b in __iflib_fl_refill_lt (ctx=<optimized out>, max=24, fl=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2108
#5  iflib_rxeof (rxq=0xfffffea2665bffa0, budget=24) at /b/mnt/src/sys/net/iflib.c:2741
#6  0xffffffff80761ff4 in iflib_netdump_poll (ifp=<optimized out>, count=<optimized out>) at /b/mnt/src/sys/net/iflib.c:6591
------------------------------------------------------------------------------

A second scenario that can happen: suppose one of the CPUs is running the above loop and has done the allocation and the assignment sd_m[frag_idx] = m at line 2039, but has not yet completed bit_set() at line 2040 (the stopped thread's backtrace below shows it inside bit_set()). Now suppose the node crashes, that CPU is stopped, and netdump runs. In the Rx poll path netdump comes to this function, sees that this bit is still clear (available) in the bitmap, and checks MPASS(sd_m[frag_idx] == NULL) at line 2035 before doing the memory allocation. This assertion fails because the allocation and assignment were done before the CPU received the NMI.
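The second race can be modeled the same way. As before, this is a hypothetical single-threaded sketch rather than iflib code: struct toy_fl_sd, toy_sd_ffc() and toy_refill_one() are illustrative names, and the NMI is simulated by returning between the sd_m[] store and the bit update. It shows that a slot whose mbuf pointer was stored but whose bit was never set looks free to the bitmap search, yet fails the sd_m[frag_idx] == NULL check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; size and names are illustrative only. */
#define TOY_SD_SIZE 4

struct toy_fl_sd {
	void *sd_m[TOY_SD_SIZE]; /* like sd_m[]: mbuf attached to each slot */
	bool  bit[TOY_SD_SIZE];  /* like ifl_rx_bitmap: true = slot in use */
};

/* Stand-in for bit_ffc(): first clear bit, or -1 if every bit is set. */
static int toy_sd_ffc(const struct toy_fl_sd *fl)
{
	for (int i = 0; i < TOY_SD_SIZE; i++)
		if (!fl->bit[i])
			return i;
	return -1;
}

/*
 * One iteration of the refill loop for mbuf 'm'. If stop_before_bit_set is
 * true, return right after the equivalent of line 2039, as if the CPU took
 * the stop NMI before bit_set() at line 2040 completed.
 * Returns false where MPASS(sd_m[frag_idx] == NULL) at line 2035 would fire.
 */
static bool toy_refill_one(struct toy_fl_sd *fl, void *m, bool stop_before_bit_set)
{
	int frag_idx = toy_sd_ffc(fl);

	assert(frag_idx >= 0);      /* the first scenario is not modeled here */
	if (fl->sd_m[frag_idx] != NULL)
		return false;           /* stale mbuf: the line 2035 assertion */
	fl->sd_m[frag_idx] = m;     /* line 2039 */
	if (stop_before_bit_set)
		return true;            /* CPU stopped: slot never marked in use */
	fl->bit[frag_idx] = true;   /* line 2040 */
	return true;
}
```

A refill of slot 0 completes normally; a refill of slot 1 that is stopped between the two stores leaves sd_m[1] set with bit 1 clear, so the next caller (netdump in this report) picks slot 1 again and trips the check.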
------------------------------------------------------------------------------
[Switching to thread 730 (Thread 100048)]
#0  wbinvd () at ./machine/cpufunc.h:417
417     in ./machine/cpufunc.h
(gdb) bt
#0  wbinvd () at ./machine/cpufunc.h:417
#1  cpustop_handler () at /b/mnt/src/sys/x86/x86/mp_x86.c:1424
#2  0xffffffff80a42214 in ipi_nmi_handler () at /b/mnt/src/sys/x86/x86/mp_x86.c:1363
#3  0xffffffff809cc693 in trap (frame=0xffffffff813e6920 <nmi0_stack+3888>) at /b/mnt/src/sys/amd64/amd64/trap.c:210
#4  nmi_calltrap () at /b/mnt/src/sys/amd64/amd64/exception.S:792
#5  0xffffffff80763370 in _bit_mask (_bit=<optimized out>) at /b/mnt/src/sys/sys/bitstring.h:104
#6  bit_set (_bitstr=0xfffff802f1d23e00, _bit=<optimized out>) at /b/mnt/src/sys/sys/bitstring.h:148
#7  _iflib_fl_refill (ctx=0xfffff802f1cf6800, fl=0xfffff802f1cf6400, count=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2040
#8  0xffffffff80762a2b in __iflib_fl_refill_lt (ctx=<optimized out>, max=24, fl=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2108
#9  iflib_rxeof (rxq=<optimized out>, budget=24) at /b/mnt/src/sys/net/iflib.c:2802
#10 0xffffffff8075e679 in _task_fn_rx (context=0xfffffe9deef3c9c0) at /b/mnt/src/sys/net/iflib.c:3778
....
[Switching to thread 1 (Thread 100659)]
#0  kdb_enter (why=0xffffffff80b7e20f "panic", msg=<optimized out>) at /b/mnt/src/sys/kern/subr_kdb.c:483
483     kdb_why = KDB_WHY_UNSET;
(gdb) bt
#0  kdb_enter (why=0xffffffff80b7e20f "panic", msg=<optimized out>) at /b/mnt/src/sys/kern/subr_kdb.c:483
#1  0xffffffff80634c67 in panic_finish () at /b/mnt/src/sys/kern/kern_shutdown.c:1154
#2  0xffffffff806345fe in panic (fmt=<optimized out>) at /b/mnt/src/sys/kern/kern_shutdown.c:947
#3  0xffffffff807635ba in _iflib_fl_refill (ctx=0xfffff802f1cf6800, fl=0xfffff802f1cf6400, count=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2035
#4  0xffffffff80762a2b in __iflib_fl_refill_lt (ctx=<optimized out>, max=24, fl=<optimized out>) at /b/mnt/src/sys/net/iflib.c:2108
#5  iflib_rxeof (rxq=<optimized out>, budget=24) at /b/mnt/src/sys/net/iflib.c:2802
#6  0xffffffff807620b4 in iflib_netdump_poll (ifp=<optimized out>, count=<optimized out>) at /b/mnt/src/sys/net/iflib.c:6594
------------------------------------------------------------------------------

To avoid multithreading problems in the shutdown path, we stop the other CPUs and allow only one CPU to run netdump. We also disable interrupts on that single running CPU. That is why netdump polls the rx queues for received packets.

Since netdump uses the same network stack to communicate with the netdump server, it expects the stack to be in a sane state to perform tx/rx. After a panic we can't trust the system 100%, so netdump runs on a best-effort basis. It is unfortunately possible to hit these sorts of issues, given that the panic can happen while CPUs are in the network stack.

Because after a panic only a single CPU keeps running and all the others have been stopped through an NMI, other kinds of breakage are also possible: for example, a CPU may be stopped while the thread running on it owns a spinlock that is needed to complete netdump.
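The spinlock case can be illustrated in the same single-threaded style. This is a hypothetical sketch that uses a C11 atomic_flag as the lock, not FreeBSD's spin mutex implementation; toy_lock, toy_spin_lock_bounded() and toy_spin_unlock() are illustrative names, and the "stopped" owner is simulated by never calling the unlock function. Because the owning CPU never runs again after the NMI, no amount of spinning on the dump CPU can make progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spin lock that the netdump path needs. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

/*
 * Try to take 'lock', spinning at most max_spins times. A plain spin lock
 * would spin forever; the bound only lets the sketch report the hang.
 */
static bool toy_spin_lock_bounded(atomic_flag *lock, long max_spins)
{
	assert(max_spins > 0);
	for (long i = 0; i < max_spins; i++) {
		if (!atomic_flag_test_and_set(lock))
			return true;    /* flag was clear: we now own the lock */
	}
	return false;               /* owner never released it */
}

static void toy_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}
```

In the usage pattern the rx "CPU" takes the lock and is then treated as stopped, so the dump "CPU" fails to acquire it however many times it spins; only an unlock by the owner, which never happens after the NMI, would let it proceed.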
This bug was opened to investigate further and see whether we can address some of these issues.

-- 
You are receiving this mail because:
You are the assignee for the bug.