Date: Mon, 27 Dec 2021 13:43:01 -0500
From: Alexander Motin <mav@FreeBSD.org>
To: Gleb Smirnoff <glebius@freebsd.org>, Larry Rosenman <ler@lerctr.org>
Cc: current@freebsd.org
Subject: Re: My -CURRENT crashes....
Message-ID: <45ee5689-b24c-51b5-d7b7-33fd73ee7dce@FreeBSD.org>
In-Reply-To: <Ycn4Y7ZUE%2BBWM3tr@FreeBSD.org>
References: <286c830efc0e12e3e7a7e9b2ede31c28@lerctr.org> <Ycn4Y7ZUE%2BBWM3tr@FreeBSD.org>

On 27.12.2021 12:31, Gleb Smirnoff wrote:
> On Fri, Dec 17, 2021 at 01:27:11PM -0600, Larry Rosenman wrote:
> L> Can someone look at the messages I posted to -CURRENT, most recent
> L> today, with random
> L> Callout(?) crashes after long (>6 hour) poudriere runs?
> L>
> L> I have core's available.
>
> I asked Larry to obtain a core with INVARIANTS and now we have one.
>
> Sharing what I've found to brainstorm. Trap happens in LIST_REMOVE()
> kern_timeout.c:488 because the entry doesn't have a prev pointer, e.g.
> doesn't belong to any list.
>
> #6 0xffffffff807be075 in trap_pfault (frame=0xfffffe02d3393d50, usermode=false, signo=<optimized out>, ucode=<optimized out>)
>     at /usr/src/sys/amd64/amd64/trap.c:765
> #7 <signal handler called>
> #8 0xffffffff804e5609 in callout_process (now=now@entry=100465191785818) at /usr/src/sys/kern/kern_timeout.c:488
> #9 0xffffffff80460fc5 in handleevents (now=now@entry=100465191785818, fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213
> #10 0xffffffff80461a66 in timercb (et=0xffffffff80d47980 <lapic_et>, arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:357
> #11 0xffffffff807e6beb in lapic_handle_timer (frame=0xfffffe02d3393f40) at /usr/src/sys/x86/x86/local_apic.c:1364
>
> (kgdb) p *tmp
> $13 = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0,
>   c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0xfffff8030521e670, c_flags = 0, c_iflags = 0, c_cpu = 0}
>
> Useful here is the c_lock, which points into "process lock" lockobject.
>
> This allows us to deduct that the callout belongs to proc subsystem and
> we can retrieve the proc it points to: c_lock - 0x128 = 0xfffff8030521e548
> It is ccache in PRS_NORMAL state. And the "tmp" in our stack frame is its
> p_itcallout.
>
> So there is something that would zero out most of the p_itcallout while
> it is scheduled? So carefully zero it, but keep the lock pointer...
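
For reference, the subtraction quoted above (c_lock - 0x128) is the usual
"recover the containing structure from a member pointer" step. A minimal
userland sketch, assuming the lock sits 0x128 bytes into the process
structure as that subtraction implies for this kernel build; mock_proc,
mock_mtx, proc_from_lock() and owner_address() are hypothetical names, and
the real struct proc layout differs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real struct mtx and struct proc are larger. */
struct mock_mtx {
	uintptr_t lock_word;
};

struct mock_proc {
	char pad[0x128];        /* padding so p_mtx lands at offset 0x128 */
	struct mock_mtx p_mtx;  /* plays the role of the "process lock" */
};

/* Step back from the embedded lock to the structure that contains it. */
static struct mock_proc *
proc_from_lock(struct mock_mtx *m)
{
	assert(m != NULL);
	return ((struct mock_proc *)((char *)m -
	    offsetof(struct mock_proc, p_mtx)));
}

/* The same arithmetic on raw addresses, as done by hand in the debugger. */
static uint64_t
owner_address(uint64_t lock_addr, uint64_t offset)
{
	return (lock_addr - offset);
}
```

Feeding the c_lock value from the dump and the 0x128 offset to
owner_address() reproduces the proc address given in the quoted text.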

The only way that comes to mind is callout_init_mtx() in do_fork(), if we
assume the process has completed and the struct proc was reused. I wonder
whether we could somehow leak a scheduled callout in exit1(). Maybe we
could add some more assertions to try to catch the callout still being
active there.

-- 
Alexander Motin
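
The suspected sequence can be modelled in userland. This is a rough sketch
only, not kernel code: every mock_* name and the MOCK_PENDING flag are
hypothetical, and it assumes the init routine zeroes the whole structure
before storing the lock pointer, which would match the dump (everything
zero except c_lock). mock_exit_check() stands in for the kind of extra
assertion suggested for the exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MOCK_PENDING 0x1  /* hypothetical flag meaning "still scheduled" */

struct mock_callout {
	struct mock_callout *le_next;   /* next entry in the bucket */
	struct mock_callout **le_prev;  /* address of the previous next-pointer */
	void *c_lock;                   /* lock associated at init time */
	int c_iflags;
};

struct mock_bucket {
	struct mock_callout *first;
};

/* Assumed init behaviour: zero everything, then store the lock pointer. */
static void
mock_callout_init(struct mock_callout *c, void *lock)
{
	memset(c, 0, sizeof(*c));
	c->c_lock = lock;
}

/* Link the callout at the head of a bucket and mark it pending. */
static void
mock_callout_schedule(struct mock_bucket *b, struct mock_callout *c)
{
	assert(b != NULL && c != NULL);
	c->le_next = b->first;
	if (b->first != NULL)
		b->first->le_prev = &c->le_next;
	b->first = c;
	c->le_prev = &b->first;
	c->c_iflags |= MOCK_PENDING;
}

/* Unlink; returns false instead of writing through a NULL le_prev. */
static bool
mock_callout_remove(struct mock_callout *c)
{
	if (c->le_prev == NULL)
		return (false);
	if (c->le_next != NULL)
		c->le_next->le_prev = c->le_prev;
	*c->le_prev = c->le_next;
	c->le_next = NULL;
	c->le_prev = NULL;
	c->c_iflags &= ~MOCK_PENDING;
	return (true);
}

/* Exit-path check: true only if the callout is safe to recycle. */
static bool
mock_exit_check(const struct mock_callout *c)
{
	return ((c->c_iflags & MOCK_PENDING) == 0);
}
```

In this model, re-running mock_callout_init() on a callout that is still
linked leaves the bucket pointing at an entry whose le_prev is NULL, which
is the state the trap in LIST_REMOVE() suggests. Checking
mock_exit_check() before the structure is recycled would flag the leak at
its source rather than hours later in the timer interrupt.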