Date:      Tue, 14 May 2019 13:56:18 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        freebsd-net@FreeBSD.org
Subject:   Re: crash in dummynet, drain_scheduler_cb, si->sched == NULL
Message-ID:  <e1768931-8187-d39d-4eca-f8bc1453f0a8@FreeBSD.org>
In-Reply-To: <60237dd7-e507-ecf3-c9b0-ef3c4c2116ad@FreeBSD.org>
References:  <60237dd7-e507-ecf3-c9b0-ef3c4c2116ad@FreeBSD.org>


Just fixing the subject.

On 14/05/2019 12:17, Andriy Gapon wrote:
> 
> Unfortunately, all we have is some information from a ddb text dump.  We do not
> have a vmcore and we do not have a way to re-create the crash.  It happened just
> once on a production system.
> 
> The information follows.
> 
> dn_enqueue fs 0 si 0, dropping
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x60
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8077bdff
> stack pointer           = 0x28:0xfffffe1096343910
> frame pointer           = 0x28:0xfffffe1096343920
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (dummynet)
> 
> db:3:psinfo> bt
> Tracing pid 0 tid 100248 td 0xfffff8002829d4d0
>  stack1 drain_scheduler_cb+0x1f drain_scheduler_sch_cb+0x25
> dn_ht_scan_bucket+0x7a dn_drain_scheduler+0x20 dummynet_task+0x219
> taskqueue_run_locked+0x71 taskqueue_thread_loop+0x56 fork_exit+0x121
> fork_trampoline+0xe
> 
> drain_scheduler_cb+0x1f:        movq    0x60(%rdx),%rax
> 
> Here is the disassembly of the function, with some notes of mine:
>    0xffffffff8077bde0 <+0>:     push   %rbp
>    0xffffffff8077bde1 <+1>:     testb  $0x20,0x90(%rdi) // test DN_ACTIVE
>    0xffffffff8077bde8 <+8>:     mov    %rsp,%rbp
>    0xffffffff8077bdeb <+11>:    jne    0xffffffff8077bdf4 <drain_scheduler_cb+20>
>    0xffffffff8077bded <+13>:    cmpq   $0x0,0x78(%rdi)
>    0xffffffff8077bdf2 <+18>:    je     0xffffffff8077bdf8 <drain_scheduler_cb+24>
>    0xffffffff8077bdf4 <+20>:    leaveq
>    0xffffffff8077bdf5 <+21>:    xor    %eax,%eax
>    0xffffffff8077bdf7 <+23>:    retq
>    0xffffffff8077bdf8 <+24>:    mov    0x88(%rdi),%rdx // rdx = si->sched
>    0xffffffff8077bdff <+31>:    mov    0x60(%rdx),%rax // rax = si->sched->fp
>    0xffffffff8077be03 <+35>:    testb  $0x1,0x10(%rax)
> 
> So, it seems that dummynet encountered a dn_sch_inst whose sched field was NULL.
> 
> I am not sure how that could be possible.
> Also, I am not sure if that "dn_enqueue ..." message is related to the crash.
> 
> Does anyone have any ideas?
> Thank you.
> 
> P.S.
> I found a somewhat similar, though different and very old, report:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166937
> It seems that it was never really root-caused and fixed; it was marked as fixed
> because of the chance that flaky hardware had caused it.
> 


-- 
Andriy Gapon


