Date: Mon, 6 Aug 2018 09:21:26 +0200
From: Hans Petter Selasky <hps@selasky.org>
To: Matthew Macy <mmacy@freebsd.org>, Roman Bogorodskiy <novel@freebsd.org>
Cc: freebsd-current@freebsd.org
Subject: Re: panic after ifioctl/if_clone_destroy
Message-ID: <a03803e6-5f1e-1960-c6a1-c7477f0ac9d4@selasky.org>
In-Reply-To: <CAPrugNqVUoP0V8%2ByKTbCZgMoDu22xvCfUuga2LbKabjyi_=__A@mail.gmail.com>
References: <20180805153556.GA1957@kloomba> <CAPrugNqVUoP0V8%2ByKTbCZgMoDu22xvCfUuga2LbKabjyi_=__A@mail.gmail.com>

Hi,

I think the problem is that the thread pointed to by tdwait has exited.
I would say it is not allowed to peek into the other records' threads,
because they may change under the hood and are not protected by the
current context.

> if (record->er_cpuid != curcpu) {

This optimisation is invalid or needs to be revisited:

>         /*
>          * If the head of the list is running, we can wait for it
>          * to remove itself from the list and thus save us the
>          * overhead of a migration
>          */
>         if ((tdwait = TAILQ_FIRST(&record->er_tdlist)) != NULL &&
>             TD_IS_RUNNING(tdwait->et_td)) {
>                 gen = record->er_gen;
>                 thread_unlock(td);
>                 do {
>                         cpu_spinwait();
>                 } while (tdwait == TAILQ_FIRST(&record->er_tdlist) &&
>                     gen == record->er_gen && TD_IS_RUNNING(tdwait->et_td) &&
>                     spincount++ < MAX_ADAPTIVE_SPIN);
>                 thread_lock(td);
>                 return;
>         }

--HPS

On 08/05/18 22:01, Matthew Macy wrote:
> If you could give me a self-contained reproducer that would expedite a fix.
>
> Thanks.
> -M
>
> On Sun, Aug 5, 2018 at 08:36 Roman Bogorodskiy <novel@freebsd.org> wrote:
>
>> Running -CURRENT r336863 on amd64.
>> Get the following panic right after (or during) boot:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 04
>> fault virtual address = 0xdeadc2ff
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0xffffffff80bd7858
>> stack pointer = 0x28:0xfffffe008b445580
>> frame pointer = 0x28:0xfffffe008b4455c0
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>              = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 903 (libvirtd)
>>
>> Traceback is:
>>
>> (kgdb) #0  doadump (textdump=0) at pcpu.h:230
>> #1  0xffffffff8043dc7b in db_dump (dummy=<value optimized out>,
>>     dummy2=<value optimized out>, dummy3=<value optimized out>,
>>     dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:574
>> #2  0xffffffff8043da49 in db_command (cmd_table=<value optimized out>)
>>     at /usr/src/sys/ddb/db_command.c:481
>> #3  0xffffffff8043d7c4 in db_command_loop ()
>>     at /usr/src/sys/ddb/db_command.c:534
>> #4  0xffffffff804409ef in db_trap (type=<value optimized out>,
>>     code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:252
>> #5  0xffffffff80bdd513 in kdb_trap (type=12, code=0, tf=<value optimized out>)
>>     at /usr/src/sys/kern/subr_kdb.c:693
>> #6  0xffffffff810769f1 in trap_fatal (frame=0xfffffe008b4454c0, eva=3735929599)
>>     at /usr/src/sys/amd64/amd64/trap.c:884
>> #7  0xffffffff81076b12 in trap_pfault (frame=0xfffffe008b4454c0,
>>     usermode=<value optimized out>) at pcpu.h:230
>> #8  0xffffffff8107611a in trap (frame=0xfffffe008b4454c0)
>>     at /usr/src/sys/amd64/amd64/trap.c:427
>> #9  0xffffffff810518ac in calltrap ()
>>     at /usr/src/sys/amd64/amd64/exception.S:230
>> #10 0xffffffff80bd7858 in epoch_block_handler_preempt (
>>     global=<value optimized out>, cr=0xfffffe00760c3a00,
>>     arg=<value optimized out>) at /usr/src/sys/kern/subr_epoch.c:256
>> #11 0xffffffff803994fd in ck_epoch_synchronize_wait (
>>     global=0xfffff800030c5680,
>>     cb=0xffffffff80bd77a0 <epoch_block_handler_preempt>, ct=0x0)
>>     at /usr/src/sys/contrib/ck/src/ck_epoch.c:407
>> #12 0xffffffff80bd7630 in epoch_wait_preempt (epoch=0xfffff800030c5680)
>>     at /usr/src/sys/kern/subr_epoch.c:389
>> #13 0xffffffff80c983bf in if_delgroup (ifp=0xfffff80003aab800,
>>     groupname=0xfffff80005ff5e00 "bridge") at /usr/src/sys/net/if.c:1514
>> #14 0xffffffff80c9f2b2 in if_clone_destroyif (ifc=0xfffff80005ff5e00,
>>     ifp=0xfffff80003aab800) at /usr/src/sys/net/if_clone.c:325
>> #15 0xffffffff80c9f0d5 in if_clone_destroy (name=0xfffffe008b4458d0 "virbr0")
>>     at /usr/src/sys/net/if_clone.c:288
>> #16 0xffffffff80c9a2c3 in ifioctl (so=0xfffff80007edca38, cmd=2149607801,
>>     data=<value optimized out>, td=<value optimized out>)
>>     at /usr/src/sys/net/if.c:3053
>> #17 0xffffffff80c04259 in kern_ioctl (td=0xfffff80007c1a580,
>>     fd=<value optimized out>, com=<value optimized out>,
>>     data=<value optimized out>) at file.h:330
>> #18 0xffffffff80c03f2e in sys_ioctl (td=0xfffff80007c1a580,
>>     uap=0xfffff80007c1a940) at /usr/src/sys/kern/sys_generic.c:712
>> #19 0xffffffff81077401 in amd64_syscall (td=0xfffff80007c1a580, traced=0)
>>     at subr_syscall.c:135
>> #20 0xffffffff8105218d in fast_syscall_common ()
>>     at /usr/src/sys/amd64/amd64/exception.S:500
>> #21 0x00000008028f4c0a in ?? ()
>>
>> Previous frame inner to this frame (corrupt stack?)
>>
>> Current language: auto; currently minimal
>>
>> (kgdb)
>>
>> It looks like panic happens during network interfaces related
>> operations.
>> Couple of dmesg lines before panic:
>>
>> Aug 5 19:02:42 romashka rtsold[585]: <rtsock_input_ifannounce> interface bridge0 removed
>> Aug 5 19:02:42 romashka kernel: bridge0: Ethernet address: 02:af:41:48:c7:00
>> Aug 5 19:02:42 romashka kernel: bridge0: changing name to 'virbr-ab'
>> Aug 5 19:02:42 romashka kernel: tap0: Ethernet address: 00:bd:8d:11:f7:00
>> Aug 5 19:02:42 romashka kernel: tap0: link state changed to UP
>> Aug 5 19:02:42 romashka kernel: tap0: changing name to 'virbr-ab-nic'
>> Aug 5 19:02:42 romashka kernel: virbr-ab-nic: promiscuous mode enabled
>> Aug 5 19:02:42 romashka kernel: virbr-ab: link state changed to UP
>> Aug 5 19:02:42 romashka rtsold[585]: <rtsock_input_ifannounce> interface tap0 removed
>> Aug 5 19:02:43 romashka dnsmasq[1047]: setting --bind-interfaces option because of OS limitations
>> Aug 5 19:02:43 romashka dnsmasq[1047]: warning: no upstream servers configured
>> Aug 5 19:02:43 romashka kernel: virbr-ab-nic: link state changed to DOWN
>> Aug 5 19:02:43 romashka kernel: virbr-ab: link state changed to DOWN
>> Aug 5 19:02:43 romashka kernel: bridge1: Ethernet address: 02:af:41:48:c7:01
>> Aug 5 19:02:43 romashka kernel: bridge1: changing name to 'virbr0'
>> Aug 5 19:02:43 romashka rtsold[585]: <rtsock_input_ifannounce> interface bridge1 removed
>> Aug 5 19:02:43 romashka kernel: tap1: Ethernet address: 00:bd:53:14:f7:01
>> Aug 5 19:02:43 romashka kernel: tap1: link state changed to UP
>> Aug 5 19:02:43 romashka kernel: tap1: changing name to 'virbr0-nic'
>> Aug 5 19:02:43 romashka kernel: virbr0: link state changed to UP
>> Aug 5 19:02:43 romashka kernel: virbr0-nic: promiscuous mode enabled
>> Aug 5 19:02:43 romashka rtsold[585]: <rtsock_input_ifannounce> interface tap1 removed
>> Aug 5 19:05:03 romashka syslogd: kernel boot file is /boot/kernel/kernel
>> Aug 5 19:05:03 romashka kernel:
>> Aug 5 19:05:03 romashka syslogd: last message repeated 1 times
>> Aug 5 19:05:03 romashka kernel: Fatal trap 12: page fault while in kernel mode
>>
>> If I disable libvirt service, system completes booting fine. What it
>> tries to do on start, it creates a couple of bridge(4) and tap(4)
>> devices, adds tap devices to bridges it created, and possibly destroy
>> these interfaces in case of errors. It also starts dnsmasq on some of
>> these interfaces.
>>
>> This problem started to appear about 2-4 weeks ago.
>>
>> Roman Bogorodskiy
>>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
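
The concern in the reply at the top of this message (the spin loop keeps
dereferencing tdwait->et_td after the entry may have left the list) can be
modelled in userland C. What follows is a minimal sketch, not the kernel
code and not a proposed patch. All names in it (struct tracker, struct
record, record_enter, record_exit, head_unchanged) are hypothetical
stand-ins that only mimic er_tdlist and er_gen, and the model assumes the
generation counter is bumped on every list change. It shows a spin
condition that compares the cached head by address and checks the
generation, without ever reading through the cached pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; these are NOT the kernel's definitions. */
struct tracker {
    struct tracker *next;  /* next active section on this record */
    void *owner;           /* models et_td; a waiter must not follow it */
};

struct record {
    struct tracker *head;  /* models TAILQ_FIRST(&record->er_tdlist) */
    unsigned gen;          /* models er_gen (assumed: bumped on each change) */
};

/* Append a tracker to the tail of the record's list. */
static void record_enter(struct record *r, struct tracker *t)
{
    struct tracker **pp = &r->head;

    assert(t != NULL);
    while (*pp != NULL)
        pp = &(*pp)->next;
    t->next = NULL;
    *pp = t;
    r->gen++;
}

/* Unlink a tracker; afterwards its storage may be reused by its owner. */
static void record_exit(struct record *r, struct tracker *t)
{
    struct tracker **pp = &r->head;

    while (*pp != NULL && *pp != t)
        pp = &(*pp)->next;
    if (*pp == t) {
        *pp = t->next;
        r->gen++;
    }
    t->next = NULL;
}

/*
 * Spin condition for a waiter: true only while the list head is still the
 * entry seen earlier and the generation has not moved. The cached pointer
 * `seen` is compared by address only and is never dereferenced.
 */
static bool head_unchanged(const struct record *r, const struct tracker *seen,
    unsigned seen_gen)
{
    return r->head == seen && r->gen == seen_gen;
}
```

In this model a waiter would snapshot r->head and r->gen once, then loop
while head_unchanged() holds and a spin budget remains. Whether dropping
the TD_IS_RUNNING() check is acceptable for the real epoch code is a
separate question that the sketch does not answer.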