From owner-freebsd-net@freebsd.org Sun May 24 21:22:14 2020
From: bugzilla-noreply@freebsd.org
To: net@FreeBSD.org
Subject: [Bug 246706] [netgraph] kernel panic due to corrupted memory
Date: Sun, 24 May 2020 21:22:10 +0000
List-Id: Networking and TCP/IP with FreeBSD

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246706

            Bug ID: 246706
           Summary: [netgraph] kernel panic due to corrupted memory
           Product: Base System
           Version: 11.3-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Keywords: panic
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: net@FreeBSD.org
          Reporter: eugen@freebsd.org
                CC: ae@FreeBSD.org, avg@FreeBSD.org, glebius@FreeBSD.org,
                    mav@FreeBSD.org, melifaro@FreeBSD.org

I run multiple routers on FreeBSD 11.3-STABLE/amd64 355108 with the net/mpd5
daemon, which dynamically creates and destroys ngXXX interfaces for multiple
PPPoE clients. The routers have ECC memory. Since 11.1-RELEASE this setup had
been rock stable for over 2 years, until yesterday one of the routers panicked
inside the NETGRAPH code. The panic produced a usable crashdump, and I have
kernel.debug.

The server sends its logs to a remote syslog collector, and the last line sent
before the panic was "Accepting PPPoE connection", produced by the
PppoeListenEvent() function of the mpd5 code:
https://sourceforge.net/p/mpd/svn/2239/tree/trunk/src/pppoe.c#l1356

mpd5 then continued executing PppoeListenEvent(), but its attempt to create an
ng_tee(4) node and connect it to ng_pppoe(4) by sending an NGM_MKPEER message
resulted in a kernel panic.

Note that the stock gdb 6.1.1 shows the backtrace incorrectly, so I use gdb 9.1:

Reading symbols from /data/crash/PPPOE11/kernel.debug...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x40
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80624dc0
stack pointer           = 0x28:0xfffffe012499f6d0
frame pointer           = 0x28:0xfffffe012499f700
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2576 (mpd5)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff802fda6b = db_trace_self_wrapper+0x2b/frame 0xfffffe012499f380
vpanic() at 0xffffffff804f5e2e = vpanic+0x17e/frame 0xfffffe012499f3e0
panic() at 0xffffffff804f5ca3 = panic+0x43/frame 0xfffffe012499f440
trap_pfault() at 0xffffffff80778540 = trap_pfault/frame 0xfffffe012499f490
trap_pfault() at 0xffffffff80778589 = trap_pfault+0x49/frame 0xfffffe012499f4f0
trap() at 0xffffffff80777c1d = trap+0x29d/frame 0xfffffe012499f600
calltrap() at 0xffffffff80758983 = calltrap+0x8/frame 0xfffffe012499f600
--- trap 0xc, rip = 0xffffffff80624dc0, rsp = 0xfffffe012499f6d0, rbp = 0xfffffe012499f700 ---
ng_add_hook() at 0xffffffff80624dc0 = ng_add_hook+0x20/frame 0xfffffe012499f700
ng_mkpeer() at 0xffffffff80624a0c = ng_mkpeer+0x6c/frame 0xfffffe012499f750
ng_apply_item() at 0xffffffff80622d7f = ng_apply_item+0x3ef/frame 0xfffffe012499f7d0
ng_snd_item() at 0xffffffff8062278e = ng_snd_item+0x17e/frame 0xfffffe012499f800
ngc_send() at 0xffffffff806329b3 = ngc_send+0x1a3/frame 0xfffffe012499f8a0
sosend_generic() at 0xffffffff805868ea = sosend_generic+0x4fa/frame 0xfffffe012499f950
kern_sendit() at 0xffffffff8058d246 = kern_sendit+0x286/frame 0xfffffe012499fa10
sendit() at 0xffffffff8058d591 = sendit+0x191/frame 0xfffffe012499fa70
sys_sendto() at 0xffffffff8058d3ed = sys_sendto+0x4d/frame 0xfffffe012499fac0
amd64_syscall() at 0xffffffff80778f18 = amd64_syscall+0x378/frame 0xfffffe012499fbf0
fast_syscall_common() at 0xffffffff80759290 = fast_syscall_common+0x101/frame 0xfffffe012499fbf0
--- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x80279378a, rsp = 0x7fffdfffda08, rbp = 0x7fffdfffda50 ---
Uptime: 64d17h37m40s
Dumping 457 out of 4073 MB:..4%..11%..22%..32%..43%..53%..64%..71%..81%..92%

__curthread () at ./machine/pcpu.h:234
234             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:234
#1  doadump (textdump=1) at /home/src/sys/kern/kern_shutdown.c:320
#2  0xffffffff804f5a1d in kern_reboot (howto=260)
    at /home/src/sys/kern/kern_shutdown.c:388
#3  0xffffffff804f5e68 in vpanic (fmt=, ap=0xfffffe012499f420)
    at /home/src/sys/kern/kern_shutdown.c:784
#4  0xffffffff804f5ca3 in panic (fmt=)
    at /home/src/sys/kern/kern_shutdown.c:715
#5  0xffffffff80778540 in trap_fatal (frame=0xfffffe012499f610, eva=64)
    at /home/src/sys/amd64/amd64/trap.c:899
#6  0xffffffff80778589 in trap_pfault (frame=0xfffffe012499f610, usermode=0)
    at /home/src/sys/amd64/amd64/trap.c:744
#7  0xffffffff80777c1d in trap (frame=0xfffffe012499f610)
    at /home/src/sys/amd64/amd64/trap.c:438
#8
#9  0xffffffff80624dc0 in ng_findhook (node=0xfffff80092840600,
    name=0xfffff800921e9978 "left2right")
    at /home/src/sys/netgraph/ng_base.c:1128
#10 ng_add_hook (node=0xfffff80092840600, name=0xfffff800921e9978 "left2right",
    hookp=0xfffffe012499f728) at /home/src/sys/netgraph/ng_base.c:1073
#11 0xffffffff80624a0c in ng_mkpeer (node=0xfffff8004f15fe00, name=,
    name2=0xfffff800921e9978 "left2right", type=)
    at /home/src/sys/netgraph/ng_base.c:1555
#12 0xffffffff80622d7f in ng_generic_msg (here=0xfffff8004f15fe00, item=,
    lasthook=) at /home/src/sys/netgraph/ng_base.c:2537
#13 ng_apply_item (node=0xfffff8004f15fe00, item=0xfffff800423b5c00, rw=1)
    at /home/src/sys/netgraph/ng_base.c:2437
#14 0xffffffff8062278e in ng_snd_item (item=0xfffff800423b5c00, flags=0)
    at /home/src/sys/netgraph/ng_base.c:2320
#15 0xffffffff806329b3 in ngc_send (so=, flags=, m=0xfffff80006d01000, addr=,
    control=, td=) at /home/src/sys/netgraph/ng_socket.c:338
#16 0xffffffff805868ea in sosend_generic (so=0xfffff80006c0da38,
    addr=0xfffff8004f6da9f0, uio=0xfffffe012499f980, top=0xfffff80006d01000,
    control=, flags=, td=0xfffff8004f560000)
    at /home/src/sys/kern/uipc_socket.c:1360
#17 0xffffffff8058d246 in kern_sendit (td=, s=2, mp=, flags=0, control=0x0,
    segflg=UIO_USERSPACE) at /home/src/sys/kern/uipc_syscalls.c:884
#18 0xffffffff8058d591 in sendit (td=0xfffff8004f560000, s=2,
    mp=0xfffffe012499fa80, flags=-1) at /home/src/sys/kern/uipc_syscalls.c:804
#19 0xffffffff8058d3ed in sys_sendto (td=0xfffff80092840600, uap=)
    at /home/src/sys/kern/uipc_syscalls.c:935
#20 0xffffffff80778f18 in syscallenter (td=0xfffff8004f560000)
    at /home/src/sys/amd64/amd64/../../kern/subr_syscall.c:132
#21 amd64_syscall (td=0xfffff8004f560000, traced=0)
    at /home/src/sys/amd64/amd64/trap.c:1014
#22
#23 0x000000080279378a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfffda08

Note that the "node" structure seems to have been corrupted by the time of the
panic:

(kgdb) frame 12
#12 0xffffffff80622d7f in ng_generic_msg (here=0xfffff8004f15fe00, item=,
    lasthook=) at /home/src/sys/netgraph/ng_base.c:2537
2537                    error = ng_mkpeer(here, mkp->ourhook, mkp->peerhook, mkp->type);
(kgdb) p *mkp
$1 = {type = "l858", '\000' ,
  ourhook = "Ю-$O\000ЬЪЪ\000\000\000\000\000\000\000\000\000кj\222\000ЬЪЪ\000Ч\025O\000ЬЪЪ",
  peerhook = "\200]\a\222\000ЬЪЪюё\n\222\000ЬЪЪ", '\000' }
(kgdb) frame 10
#10 ng_add_hook (node=0xfffff80092840600, name=0xfffff800921e9978 "left2right",
    hookp=0xfffffe012499f728) at /home/src/sys/netgraph/ng_base.c:1073
1073            if (ng_findhook(node, name) != NULL) {
(kgdb) p *node
$2 = {nd_name = '\000' , nd_type = 0x0, nd_flags = 0, nd_numhooks = 0,
  nd_private = 0xfffff80092840600, nd_ID = 0, nd_hooks = {lh_first = 0x0},
  nd_nodes = {le_next = 0x0, le_prev = 0x0},
  nd_idnodes = {le_next = 0x0, le_prev = 0x0},
  nd_input_queue = {q_flags = 0, q_flags2 = 0,
    q_mtx = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0,
        lo_witness = 0x0}, mtx_lock = 0},
    q_work = {stqe_next = 0x0},
    queue = {stqh_first = 0x0, stqh_last = 0x0}},
  nd_refs = 0, nd_vnet = 0x0}
(kgdb) frame 9
#9  0xffffffff80624dc0 in ng_findhook (node=0xfffff80092840600,
    name=0xfffff800921e9978 "left2right")
    at /home/src/sys/netgraph/ng_base.c:1128
1128            if (node->nd_type->findhook != NULL)
(kgdb) p node->nd_type
$3 = (struct ng_type *) 0x0

The compressed crashdump and kernel.debug files are available here (101MB in
total): http://www.grosbein.net/freebsd/crash/20200524/