From: Mark Millard
Subject: Re: head -r325997: Fatal trap 12: page fault while in kernel mode (during a buildworld, virtualbox guest context) [2nd example]
Date: Mon, 20 Nov 2017 01:15:29 -0800
To: FreeBSD Current, freebsd-amd64@freebsd.org, freebsd-hackers
In-Reply-To: <3C5C0D1B-4990-426A-B622-6EC4CC6A1F3F@dsl-only.net>
Message-Id:
 <2E7497BD-06C6-4C86-AA83-1150C735315B@dsl-only.net>

[Adding some analysis of where the 2 failures were in source code terms.]

On 2017-Nov-19, at 9:07 PM, Mark Millard wrote:

> [I got another of these. By the way: amd64 context.
> Again: buildworld was running.]
>
> On 2017-Nov-19, at 5:52 PM, Mark Millard wrote:
>
>> Attempting a dump failed. I'm afraid all the
>> information I have is below. The kernel was a
>> non-debug kernel (with debug information).
>>
>> The following is hand typed from a screen shot:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address = 0xffffff53f000e2b0
>
> New one: 0x806b49010
>
>> fault code = supervisor read data, page not present
>
> New one: supervisor write data, page not present
>
>> instruction pointer = 0x20:0xffffffff80f2b11e
>
> New one: 0x20:0xffffffff80f2b21b
>
>> stack pointer = 0x0:0xfffffe01aeb28970
>
> New one: 0x28:0xfffffe01aeb28970
>
>> frame pointer = 0x0:0xfffffe01aeb289f0
>
> New one: 0x28:0xfffffe01aeb289f0
>
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 20 (pagedaemon)
>> [ thread pid 20 tid 100089 ]
>> Stopped at pmap_ts_referenced+0x72e: movq (%rcx,%rdi,8),%rbx
>
> New one: pmap_ts_referenced+0x82b: movq %rcx,0x10(%rax)
>
>> db> bt
>> Tracing pid 20 tid 100089 td 0xfffff80003eb3560
>
> New one: td 0xfffff80003df6000
>
>> pmap_ts_referenced() at pmap_ts_referenced+0x72e/frame 0xfffffe01aeb289f0
> New one:
> pmap_ts_referenced() at pmap_ts_referenced+0x82b/frame 0xfffffe01aeb289f0
>
>> vm_pageout() at vm_pageout+0xdeb/frame 0xfffffe01aeb28ab0
>
> Correction to original: frame 0xfffffe01aeb28a70
> (new is the same)
>
>> fork_exit() at fork_exit+0x82/frame 0xfffffe01aeb28ab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01aeb28ab0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> db>
>>
>> The prior (cross) buildworld buildkernel had completed fine.
>>
>> Until yesterday, I'd been running -r325700 or earlier and had
>> never seen such an issue before. I'd been using the virtualbox
>> version for a while before this as well.

Taking the first case:

Stopped at pmap_ts_referenced+0x72e: movq (%rcx,%rdi,8),%rbx

the surrounding disassembly is:

ffffffff80f2b0fc  mov    %rax,%rsi
ffffffff80f2b0ff  shr    $0x1b,%rsi
ffffffff80f2b103  and    $0xff8,%esi
ffffffff80f2b109  mov    (%rcx,%rsi,1),%rcx
ffffffff80f2b10d  and    %r10,%rcx
ffffffff80f2b110  or     %r9,%rcx
ffffffff80f2b113  mov    %eax,%edi
ffffffff80f2b115  shr    $0x15,%edi
ffffffff80f2b118  and    $0x1ff,%edi
ffffffff80f2b11e  mov    (%rcx,%rdi,8),%rbx    <<<<<<=======
ffffffff80f2b122  and    %r10,%rbx
ffffffff80f2b125  or     %r9,%rbx
ffffffff80f2b128  shr    $0x9,%rax
ffffffff80f2b12c  and    $0xff8,%eax
ffffffff80f2b131  lea    (%rbx,%rax,1),%rsi
ffffffff80f2b135  mov    (%rbx,%rax,1),%rbx
ffffffff80f2b139  mov    %rbx,%rax
ffffffff80f2b13c  and    %rdx,%rax
ffffffff80f2b13f  cmp    %rdx,%rax
ffffffff80f2b142  jne    ffffffff80f2b14f

If I understand correctly, this is in the "small_mappings:" code:

	PG_A = pmap_accessed_bit(pmap);
	PG_M = pmap_modified_bit(pmap);
	PG_RW = pmap_rw_bit(pmap);
	pde = pmap_pde(pmap, pv->pv_va);
	KASSERT((*pde & PG_PS) == 0,
	    ("pmap_ts_referenced: found a 2mpage in page %p's pv list",
	    m));
	pte = pmap_pde_to_pte(pde, pv->pv_va);
	if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW))
		vm_page_dirty(m);
	if ((*pte & PG_A) != 0) {

with the failure occurring during the *pde dereference in:

/* Return a pointer to the PT slot that corresponds to a VA */
static __inline pt_entry_t *
pmap_pde_to_pte(pd_entry_t *pde, vm_offset_t va)
{
	pt_entry_t *pte;

	pte = (pt_entry_t *)PHYS_TO_DMAP(*pde & PG_FRAME);
	return (&pte[pmap_pte_index(va)]);
}

Taking the second (new) case:

pmap_ts_referenced+0x82b: movq %rcx,0x10(%rax)

the surrounding disassembly is:

ffffffff80f2b1fb  lock cmpxchg %rcx,(%rdx)
ffffffff80f2b200  sete   %cl
ffffffff80f2b203  test   %cl,%cl
ffffffff80f2b205  je     ffffffff80f2b27d
ffffffff80f2b207  test   %r12,%r12
ffffffff80f2b20a  je     ffffffff80f2b255
ffffffff80f2b20c  mov    0x8(%r12),%rax
ffffffff80f2b211  test   %rax,%rax
ffffffff80f2b214  je     ffffffff80f2b255
ffffffff80f2b216  mov    0x10(%r12),%rcx
ffffffff80f2b21b  mov    %rcx,0x10(%rax)    <<<<<<<<<=========
ffffffff80f2b21f  mov    0x8(%r12),%rax
ffffffff80f2b224  mov    0x10(%r12),%rcx
ffffffff80f2b229  mov    %rax,(%rcx)

If I understand correctly, this appears to be during the TAILQ_REMOVE in:

	PMAP_UNLOCK(pmap);
	/* Rotate the PV list if it has more than one entry. */
	if (pv != NULL && TAILQ_NEXT(pv, pv_next) != NULL) {
		TAILQ_REMOVE(&m->md.pv_list, pv, pv_next);
. . .

#define	TAILQ_REMOVE(head, elm, field) do {				\
	QMD_SAVELINK(oldnext, (elm)->field.tqe_next);			\
	QMD_SAVELINK(oldprev, (elm)->field.tqe_prev);			\
	QMD_TAILQ_CHECK_NEXT(elm, field);				\
	QMD_TAILQ_CHECK_PREV(elm, field);				\
	if ((TAILQ_NEXT((elm), field)) != NULL)				\
		TAILQ_NEXT((elm), field)->field.tqe_prev =		\
		    (elm)->field.tqe_prev;				\
	else {								\
		(head)->tqh_last = (elm)->field.tqe_prev;		\
		QMD_TRACE_HEAD(head);					\
	}								\
	*(elm)->field.tqe_prev = TAILQ_NEXT((elm), field);		\
	TRASHIT(*oldnext);						\
	TRASHIT(*oldprev);						\
	QMD_TRACE_ELEM(&(elm)->field);					\
} while (0)

(Again, the kernel was a non-debug kernel, built with debug symbols.)

===
Mark Millard
markmi at dsl-only.net