Date: Tue, 14 May 2019 13:56:18 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: freebsd-net@FreeBSD.org
Subject: Re: crash in dummynet, drain_scheduler_cb, si->sched == NULL
Message-ID: <e1768931-8187-d39d-4eca-f8bc1453f0a8@FreeBSD.org>
In-Reply-To: <60237dd7-e507-ecf3-c9b0-ef3c4c2116ad@FreeBSD.org>
References: <60237dd7-e507-ecf3-c9b0-ef3c4c2116ad@FreeBSD.org>

Just fixing the subject.

On 14/05/2019 12:17, Andriy Gapon wrote:
>
> Unfortunately, all we have is some information from a ddb text dump.  We do not
> have a vmcore and we do not have a way to re-create the crash.  It happened just
> once on a production system.
>
> So, the information follows.
>
> dn_enqueue fs 0 si 0, dropping
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x60
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8077bdff
> stack pointer           = 0x28:0xfffffe1096343910
> frame pointer           = 0x28:0xfffffe1096343920
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (dummynet)
>
> db:3:psinfo>  bt
> Tracing pid 0 tid 100248 td 0xfffff8002829d4d0
> stack1 drain_scheduler_cb+0x1f drain_scheduler_sch_cb+0x25
> dn_ht_scan_bucket+0x7a dn_drain_scheduler+0x20 dummynet_task+0x219
> taskqueue_run_locked+0x71 taskqueue_thread_loop+0x56 fork_exit+0x121
> fork_trampoline+0xe
>
> drain_scheduler_cb+0x1f:        movq    0x60(%rdx),%rax
>
> Here is disassembly of the function with some notes of mine:
> 0xffffffff8077bde0 <+0>:     push   %rbp
> 0xffffffff8077bde1 <+1>:     testb  $0x20,0x90(%rdi)    // test DN_ACTIVE
> 0xffffffff8077bde8 <+8>:     mov    %rsp,%rbp
> 0xffffffff8077bdeb <+11>:    jne    0xffffffff8077bdf4 <drain_scheduler_cb+20>
> 0xffffffff8077bded <+13>:    cmpq   $0x0,0x78(%rdi)
> 0xffffffff8077bdf2 <+18>:    je     0xffffffff8077bdf8 <drain_scheduler_cb+24>
> 0xffffffff8077bdf4 <+20>:    leaveq
> 0xffffffff8077bdf5 <+21>:    xor    %eax,%eax
> 0xffffffff8077bdf7 <+23>:    retq
> 0xffffffff8077bdf8 <+24>:    mov    0x88(%rdi),%rdx     // rdx = si->sched
> 0xffffffff8077bdff <+31>:    mov    0x60(%rdx),%rax     // rax = si->sched->fp
> 0xffffffff8077be03 <+35>:    testb  $0x1,0x10(%rax)
>
> So, it seems that dummynet ran into dn_sch_inst with sched field being NULL.
>
> I am not sure how that could be possible.
> Also, I am not sure if that "dn_enqueue ..." message is related to the crash.
>
> Does anyone have any ideas?
> Thank you.
>
> P.S.
> I found a somewhat similar but different and very old report:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166937
> It seems that it was not really root-caused and fixed, but marked as fixed
> because of a chance that it could have been caused by flaky hardware.
>

-- 
Andriy Gapon
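
To make the reasoning from the disassembly easier to follow, here is a rough C model of the control flow it shows. This is only a sketch, not the dummynet source: every name prefixed with model_ or MODEL_ is hypothetical, the structures are padded by hand so that the fields land at the offsets seen in the disassembly (0x78, 0x88 and 0x90 in the instance, 0x60 for fp, 0x10 for the flags), and an LP64 target is assumed. It illustrates why a NULL sched pointer gives a fault virtual address of exactly 0x60.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real dummynet definitions. */
#define MODEL_DN_ACTIVE	0x20	/* bit tested by "testb $0x20,0x90(%rdi)" */
#define MODEL_FLAG_BIT0	0x01	/* bit tested by "testb $0x1,0x10(%rax)" */

struct model_alg {
	char		pad[0x10];
	uint32_t	flags;		/* offset 0x10 */
};

struct model_schk {
	char			pad[0x60];
	struct model_alg	*fp;	/* offset 0x60 */
};

struct model_sch_inst {
	char			pad0[0x78];
	void			*mq_head;	/* offset 0x78 */
	char			pad1[0x08];
	struct model_schk	*sched;		/* offset 0x88 */
	uint32_t		kflags;		/* offset 0x90 */
};

/* Check the hand-built layout against the offsets in the disassembly. */
static_assert(offsetof(struct model_sch_inst, mq_head) == 0x78, "mq_head");
static_assert(offsetof(struct model_sch_inst, sched) == 0x88, "sched");
static_assert(offsetof(struct model_sch_inst, kflags) == 0x90, "kflags");
static_assert(offsetof(struct model_schk, fp) == 0x60, "fp");
static_assert(offsetof(struct model_alg, flags) == 0x10, "flags");

/*
 * Address touched when fp is loaded through a NULL sched pointer:
 * NULL + offsetof(fp), which matches "fault virtual address = 0x60".
 */
static uintptr_t
model_fault_address(void)
{
	return ((uintptr_t)offsetof(struct model_schk, fp));
}

/*
 * Mirrors the branches in the disassembly, with an explicit NULL check
 * added for illustration.  Returns 0 on the early-exit path, -1 when
 * sched is NULL (where the unguarded code would fault), otherwise the
 * value of the tested flag bit.
 */
static int
model_drain_check(const struct model_sch_inst *si)
{
	if ((si->kflags & MODEL_DN_ACTIVE) || si->mq_head != NULL)
		return (0);		/* the "xor %eax,%eax; retq" path */
	if (si->sched == NULL)
		return (-1);		/* the state the crash implies */
	return ((si->sched->fp->flags & MODEL_FLAG_BIT0) ? 1 : 0);
}
```

The NULL check in model_drain_check() is there only to make the faulting state visible in the model; it is not a proposed fix, since it would not explain how an instance ended up with sched == NULL in the first place.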