Date:      Tue, 21 May 2019 08:56:29 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 238017] crash in dummynet, drain_scheduler_cb, si->sched == NULL
Message-ID:  <bug-238017-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238017

            Bug ID: 238017
           Summary: crash in dummynet, drain_scheduler_cb, si->sched == NULL
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: avg@FreeBSD.org

Unfortunately, all we have is some information from a ddb text dump.  We do not
have a vmcore and we do not have a way to re-create the crash.  It happened just
once on a production system.

So, the information follows.

dn_enqueue fs 0 si 0, dropping


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x60
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8077bdff
stack pointer           = 0x28:0xfffffe1096343910
frame pointer           = 0x28:0xfffffe1096343920
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (dummynet)

db:3:psinfo> bt
Tracing pid 0 tid 100248 td 0xfffff8002829d4d0
 stack1 drain_scheduler_cb+0x1f drain_scheduler_sch_cb+0x25
dn_ht_scan_bucket+0x7a dn_drain_scheduler+0x20 dummynet_task+0x219
taskqueue_run_locked+0x71 taskqueue_thread_loop+0x56 fork_exit+0x121
fork_trampoline+0xe

drain_scheduler_cb+0x1f:        movq    0x60(%rdx),%rax

Here is disassembly of the function with some notes of mine:
   0xffffffff8077bde0 <+0>:     push   %rbp
   0xffffffff8077bde1 <+1>:     testb  $0x20,0x90(%rdi) // test DN_ACTIVE
   0xffffffff8077bde8 <+8>:     mov    %rsp,%rbp
   0xffffffff8077bdeb <+11>:    jne    0xffffffff8077bdf4 <drain_scheduler_cb+20>
   0xffffffff8077bded <+13>:    cmpq   $0x0,0x78(%rdi)
   0xffffffff8077bdf2 <+18>:    je     0xffffffff8077bdf8 <drain_scheduler_cb+24>
   0xffffffff8077bdf4 <+20>:    leaveq
   0xffffffff8077bdf5 <+21>:    xor    %eax,%eax
   0xffffffff8077bdf7 <+23>:    retq
   0xffffffff8077bdf8 <+24>:    mov    0x88(%rdi),%rdx // rdx = si->sched
   0xffffffff8077bdff <+31>:    mov    0x60(%rdx),%rax // rax = si->sched->fp
   0xffffffff8077be03 <+35>:    testb  $0x1,0x10(%rax)

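So, it seems that dummynet ran into a dn_sch_inst whose sched field was NULL:
the load at <+31> reads 0x60(%rdx), and with rdx == 0 that gives exactly the
fault virtual address 0x60.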
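
The offset arithmetic behind that conclusion can be checked with a small
userland model.  This is only a sketch: the model_* structures are
hypothetical stand-ins, padded so that their fields land at the offsets seen
in the disassembly (0x78, 0x88 and 0x90 in the instance, 0x60 in the
scheduler, 0x10 in the algorithm descriptor).  They are not the real dummynet
definitions, and the name pending_head is an assumption, since the
disassembly notes do not say which field lives at 0x78.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for *si->sched->fp. */
struct model_alg {
	uint8_t pad[0x10];
	uint8_t flags;			/* testb $0x1,0x10(%rax) */
};

/* Hypothetical stand-in for *si->sched. */
struct model_schk {
	uint8_t pad[0x60];
	struct model_alg *fp;		/* mov 0x60(%rdx),%rax */
};

/* Hypothetical stand-in for struct dn_sch_inst. */
struct model_si {
	uint8_t pad0[0x78];
	void *pending_head;		/* cmpq $0x0,0x78(%rdi) */
	uint8_t pad1[0x88 - 0x78 - sizeof(void *)];
	struct model_schk *sched;	/* mov 0x88(%rdi),%rdx */
	uint8_t kflags;			/* testb $0x20,0x90(%rdi) */
};

_Static_assert(offsetof(struct model_si, pending_head) == 0x78, "0x78");
_Static_assert(offsetof(struct model_si, sched) == 0x88, "0x88");
_Static_assert(offsetof(struct model_si, kflags) == 0x90, "0x90");
_Static_assert(offsetof(struct model_schk, fp) == 0x60, "0x60");
_Static_assert(offsetof(struct model_alg, flags) == 0x10, "0x10");

/*
 * Nonzero if the early "return 0" path (<+20>..<+23>) is taken, zero if
 * control falls through to the loads at <+24> and <+31>.
 */
static int
model_returns_early(const struct model_si *si)
{
	return ((si->kflags & 0x20) != 0 || si->pending_head != NULL);
}

/* Address touched by the load at <+31>: base register plus displacement. */
static uintptr_t
model_fault_address(const struct model_si *si)
{
	return ((uintptr_t)si->sched + offsetof(struct model_schk, fp));
}
```

For a zero-filled model_si (flag bit 0x20 clear, nothing at 0x78, sched NULL)
the early return is not taken and the computed address is 0x60, which is the
reported fault address.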

I am not sure how that could be possible.
Also, I am not sure if that "dn_enqueue ..." message is related to the crash.

I found a somewhat similar but different and very old report: bug 166937.
It seems that it was not really root-caused and fixed, but marked as fixed
because of a chance that it could have been caused by flaky hardware.

-- 
You are receiving this mail because:
You are the assignee for the bug.


