From: Sean Bruno
To: Peter Wemm
Cc: Konstantin Belousov, "freebsd-stable@freebsd.org"
Subject: Re: 10.0 BETA 3 with redports kernel panic
Date: Thu, 19 Dec 2013 14:35:41 -0800
Message-ID: <1387492541.27693.5.camel@powernoodle.corp.yahoo.com>

On Thu, 2013-12-19 at 11:20 -0800, Peter Wemm wrote:
> On Thu, Dec 19, 2013 at 10:59 AM, Peter Wemm wrote:
> > On Thu, Dec 19, 2013 at 10:51 AM, Sean Bruno wrote:
> >> On Thu, 2013-12-19 at 20:08 +0200, Konstantin Belousov wrote:
> >>> On Thu, Dec 19, 2013 at 09:25:15AM -0800, Sean Bruno wrote:
> >>> > On Tue, 2013-12-17 at 05:04 -0800, Sean Bruno wrote:
> >>> > > On Tue, 2013-12-17 at 14:00 +0200, Konstantin Belousov wrote:
> >>> > > > On Mon, Dec 16, 2013 at 10:45:58AM -0800, Sean Bruno wrote:
> >>> > > > > On Mon, 2013-12-16 at 10:04 -0800, Sean Bruno wrote:
> >>> > > > > > > What is the source line for memrw+0x195 ?
> >>> > > > > >
> >>> > > > > > My apologies for the delay on this. Its been frustrating getting a
> >>> > > > > > crashdump on these machines due to their very large tmpfs usage.
> >>> > > > > > Currently, I am dumping a crash of 13+GB to a third HD that we had
> >>> > > > > > installed for this purpose.
> >>> > > > > >
> >>> > > > > > The machines are still running RC3 of 10.0r.
> >>> > > > > >
> >>> > > > > > I will attempt to get the requested information shortly.
> >>> > > > > >
> >>> > > > > > sean
> >>> > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > > > I've updated http://people.freebsd.org/~sbruno/redbuild_panic.txt
> >>> > > > >
> >>> > > > > It looks like its dying in uiomove() ?
> >>> > > >
> >>> > > > I believe I already posted the following patch, with no feedback.
> >>> > > >
> >>> > > > diff --git a/sys/amd64/amd64/mem.c b/sys/amd64/amd64/mem.c
> >>> > > > index abbbb21..e371499 100644
> >>> > > > --- a/sys/amd64/amd64/mem.c
> >>> > > > +++ b/sys/amd64/amd64/mem.c
> >>> > > > @@ -98,7 +98,11 @@ memrw(struct cdev *dev, struct uio *uio, int flags)
> >>> > > >  kmemphys:
> >>> > > >                  o = v & PAGE_MASK;
> >>> > > >                  c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> >>> > > > -                error = uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
> >>> > > > +                v = PHYS_TO_DMAP(v);
> >>> > > > +                if (v < DMAP_MIN_ADDRESS || v >= DMAP_MAX_ADDRESS ||
> >>> > > > +                    pmap_kextract(v) == 0)
> >>> > > > +                        return (EFAULT);
> >>> > > > +                error = uiomove((void *)v, (int)c, uio);
> >>> > > >                  continue;
> >>> > > >          }
> >>> > > >          else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> >>> > > >
> >>> > > Will begin testing immediately
> >>> > >
> >>> > > sean
> >>> >
> >>> >
> >>> > Huh ... both machines panic'd this morning. It'll take 30 minutes or so
> >>> > to get a crash dump, but it looks like its still in the same place.
> >>> >
> >>> > db> whe
> >>> > Tracing pid 489 tid 101801 td 0xfffff80322946490
> >>> > kdb_enter() at kdb_enter+0x3e/frame 0xfffffe1839d26220
> >>> > panic() at panic+0x175/frame 0xfffffe1839d262a0
> >>> > vm_fault_hold() at vm_fault_hold+0x14ed/frame 0xfffffe1839d26500
> >>> > vm_fault() at vm_fault+0x77/frame 0xfffffe1839d26540
> >>> > trap_pfault() at trap_pfault+0x19b/frame 0xfffffe1839d265f0
> >>> > trap() at trap+0x5e6/frame 0xfffffe1839d26810
> >>> > calltrap() at calltrap+0x8/frame 0xfffffe1839d26810
> >>> > --- trap 0xc, rip = 0xffffffff80cae47b, rsp = 0xfffffe1839d268d0, rbp =
> >>> > 0xfffffe1839d26920 ---
> >>> > copyout() at copyout+0x3b/frame 0xfffffe1839d26920
> >>> > memrw() at memrw+0x1b6/frame 0xfffffe1839d26960
> >>> > giant_read() at giant_read+0x7a/frame 0xfffffe1839d269a0
> >>> > devfs_read_f() at devfs_read_f+0xea/frame 0xfffffe1839d26a00
> >>> > dofileread() at dofileread+0x7b/frame 0xfffffe1839d26a40
> >>> > kern_readv() at kern_readv+0x65/frame 0xfffffe1839d26a90
> >>> > sys_read() at sys_read+0x63/frame 0xfffffe1839d26ae0
> >>> > amd64_syscall() at amd64_syscall+0x357/frame 0xfffffe1839d26bf0
> >>> > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe1839d26bf0
> >>> > --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800b750aa, rsp =
> >>> > 0x7fffffffd068, rbp = 0x7fffffffd0b0 ---
> >>> > db> call doadump
> >>> >
> >>>
> >>> I need to see exact panic and trap messages, as well as I need to know
> >>> the source line for memrw+0x1b6 in the patched kernel.
> >>
> >> Here is the panic/trap and the requested display. Peter suspects that
> >> part of the failure is the use of DMAP_MAX_ADDR and not dmaplimit in
> >> this and other comparisons. Patch attached that contains your
> >> modifications and his.
> >>
> >> bcc peter@
> >>
> >>
> >> panic: vm_fault: fault on nofault entry, addr: fffffe0327240000
> >> cpuid = 16
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> >> 0xfffffe1839d26170
> >> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe1839d26220
> >> panic() at panic+0x155/frame 0xfffffe1839d262a0
> >> vm_fault_hold() at vm_fault_hold+0x14ed/frame 0xfffffe1839d26500
> >> vm_fault() at vm_fault+0x77/frame 0xfffffe1839d26540
> >> trap_pfault() at trap_pfault+0x19b/frame 0xfffffe1839d265f0
> >> trap() at trap+0x5e6/frame 0xfffffe1839d26810
> >> calltrap() at calltrap+0x8/frame 0xfffffe1839d26810
> >> --- trap 0xc, rip = 0xffffffff80cae47b, rsp = 0xfffffe1839d268d0, rbp =
> >> 0xfffffe1839d26920 ---
> >> copyout() at copyout+0x3b/frame 0xfffffe1839d26920
> >> memrw() at memrw+0x1b6/frame 0xfffffe1839d26960
> >> giant_read() at giant_read+0x7a/frame 0xfffffe1839d269a0
> >> devfs_read_f() at devfs_read_f+0xea/frame 0xfffffe1839d26a00
> >> dofileread() at dofileread+0x7b/frame 0xfffffe1839d26a40
> >> kern_readv() at kern_readv+0x65/frame 0xfffffe1839d26a90
> >> sys_read() at sys_read+0x63/frame 0xfffffe1839d26ae0
> >> amd64_syscall() at amd64_syscall+0x357/frame 0xfffffe1839d26bf0
> >> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe1839d26bf0
> >> --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800b750aa, rsp =
> >> 0x7fffffffd068, rbp = 0x7fffffffd0b0 ---
> >> KDB: enter: panic
> >>
> >>
> >> (kgdb) whe
> >> #0 doadump (textdump=-2127435168) at pcpu.h:219
> >> #1 0xffffffff80342e25 in db_fncall (dummy1=, dummy2=, dummy3=, dummy4=)
> >> at /usr/src/sys/ddb/db_command.c:578
> >> #2 0xffffffff80342b0d in db_command (cmd_table=) at /usr/src/sys/ddb/db_command.c:449
> >> #3 0xffffffff80342884 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
> >> #4 0xffffffff803451f0 in db_trap (type=, code=0) at /usr/src/sys/ddb/db_main.c:231
> >> #5 0xffffffff808fad33 in kdb_trap (type=3, code=0, tf=) at /usr/src/sys/kern/subr_kdb.c:656
> >> #6 0xffffffff80cb0277 in trap (frame=0xfffffe1839d26150) at /usr/src/sys/amd64/amd64/trap.c:579
> >> #7 0xffffffff80c96ef2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
> >> #8 0xffffffff808fa4ee in kdb_enter (why=0xffffffff80f07ff2 "panic", msg=) at cpufunc.h:63
> >> #9 0xffffffff808c1eb5 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:747
> >> #10 0xffffffff80b299ed in vm_fault_hold (map=0xfffff80002000000, vaddr=, fault_type=1 '\001', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:279
> >> #11 0xffffffff80b284b7 in vm_fault (map=0xfffff80002000000, vaddr=, fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:224
> >> #12 0xffffffff80cb08cb in trap_pfault (frame=0xfffffe1839d26820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:775
> >> #17 0xffffffff80c9e746 in memrw (dev=, uio=, flags=) at /usr/src/sys/amd64/amd64/mem.c:105
> >> #18 0xffffffff8087323a in giant_read (dev=0xfffff80011302e00, uio=0xfffffe1839d26ab0, ioflag=0) at /usr/src/sys/kern/kern_conf.c:444
> >> #19 0xffffffff807b670a in devfs_read_f (fp=0xfffff80033711a50, uio=0xfffffe1839d26ab0, cred=, flags=0, td=0xfffff80322946490)
> >> at /usr/src/sys/fs/devfs/devfs_vnops.c:1193
> >> #20 0xffffffff809117eb in dofileread (td=0xfffff80322946490, fd=4, fp=0xfffff80033711a50, auio=0xfffffe1839d26ab0, offset=, flags=0) at file.h:295
> >> #21 0xffffffff80911525 in kern_readv (td=0xfffff80322946490, fd=4, auio=0xfffffe1839d26ab0) at /usr/src/sys/kern/sys_generic.c:256
> >> #22 0xffffffff809114b3 in sys_read (td=, uap=) at /usr/src/sys/kern/sys_generic.c:171
> >> #23 0xffffffff80cb1017 in amd64_syscall (td=0xfffff80322946490, traced=0) at subr_syscall.c:134
> >> #24 0xffffffff80c971db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
> >> #25 0x0000000800b750aa in ?? ()
> >> Previous frame inner to this frame (corrupt stack?)
> >> Current language: auto; currently minimal
> >> (kgdb) p memrw+0x1b6
> >> $1 = (int (*)(struct cdev *, struct uio *, int)) 0xffffffff80c9e746
> >> (kgdb) f 17
> >> #17 0xffffffff80c9e746 in memrw (dev=, uio=, flags=) at /usr/src/sys/amd64/amd64/mem.c:105
> >> 105                  error = uiomove((void *)v, (int)c, uio);
> >> (kgdb) list
> >> 100                  c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> >> 101                  v = PHYS_TO_DMAP(v);
> >> 102                  if (v < DMAP_MIN_ADDRESS || v >= DMAP_MAX_ADDRESS ||
> >> 103                      pmap_kextract(v) == 0)
> >> 104                          return (EFAULT);
> >> 105                  error = uiomove((void *)v, (int)c, uio);
> >> 106                  continue;
> >> 107          }
> >> 108          else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> >> 109                  v = uio->uio_offset;
> >>
> >>
> >>
> >> Index: sys/amd64/amd64/mem.c
> >> ===================================================================
> >> --- sys/amd64/amd64/mem.c (revision 258554)
> >> +++ sys/amd64/amd64/mem.c (working copy)
> >> @@ -98,7 +98,11 @@
> >>  kmemphys:
> >>                  o = v & PAGE_MASK;
> >>                  c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> >> -                error = uiomove((void *)PHYS_TO_DMAP(v), (int)c,
> >> uio);
> >> +                v = PHYS_TO_DMAP(v);
> >> +                if (v < DMAP_MIN_ADDRESS || v >=
> >> DMAP_MAX_ADDRESS ||
> >> +                    pmap_kextract(v) == 0)
> >> +                        return (EFAULT);
> >> +                error = uiomove((void *)v, (int)c, uio);
> >>                  continue;
> >>          }
> >>          else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> >> Index: sys/amd64/amd64/pmap.c
> >> ===================================================================
> >> --- sys/amd64/amd64/pmap.c (revision 258554)
> >> +++ sys/amd64/amd64/pmap.c (working copy)
> >> @@ -1870,7 +1870,7 @@
> >>          pd_entry_t pde;
> >>          vm_paddr_t pa;
> >>
> >> -        if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) {
> >> +        if (va >= DMAP_MIN_ADDRESS && va < dmaplimit) {
> >>                  pa = DMAP_TO_PHYS(va);
> >>          } else {
> >>                  pde = *vtopde(va);
> >> @@ -3308,7 +3308,7 @@
> >>           */
> >>          if ((oldpde & PG_A) == 0 || (mpte = vm_page_alloc(NULL,
> >>              pmap_pde_pindex(va), (va >= DMAP_MIN_ADDRESS && va <
> >> -            DMAP_MAX_ADDRESS ? VM_ALLOC_INTERRUPT :
> >> VM_ALLOC_NORMAL) |
> >> +            dmaplimit ? VM_ALLOC_INTERRUPT : VM_ALLOC_NORMAL) |
> >>              VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) {
> >>                  SLIST_INIT(&free);
> >>                  pmap_remove_pde(pmap, pde, trunc_2mpage(va),
> >> &free,
> >> @@ -6117,7 +6117,7 @@
> >>          vm_offset_t base, offset;
> >>
> >>          /* If we gave a direct map region in pmap_mapdev, do nothing */
> >> -        if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS)
> >> +        if (va >= DMAP_MIN_ADDRESS && va < dmaplimit)
> >>                  return;
> >>          base = trunc_page(va);
> >>          offset = va & PAGE_MASK;
> >>
> >>
> >
> > Specifically, pmap_kextract(v) is nothing more than a repeat of the
> > if() that kib added to mem.c. pmap_kextract() doesn't check to see if
> > it is attempting to access beyond the end of the instantiated part of
> > the direct map region. pmap_kextract(invalid_address) returns a value
> > even between dmaplimit and DMAP_MAX_ADDRESS - and that'll lead to a
> > fault.
>
> The patch is wrong, as you found out. :)
>
> --
> Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
> Yes, I know, gmail sucks now. If you see this then I forgot. Habits
> are hard to break.

Yah, ACPI does not like this in the slightest.

KDB: debugger backends: ddb
KDB: current backend: ddb
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x378
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808a9b31
stack pointer           = 0x28:0xffffffff81a90b50
frame pointer           = 0x28:0xffffffff81a90bd0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 0 ()
[ thread pid 0 tid 0 ]
Stopped at __mtx_lock_sleep+0x1b1: movl 0x378(%rax),%ecx
db> bt
Tracing pid 0 tid 0 td 0xffffffff81527500
__mtx_lock_sleep() at __mtx_lock_sleep+0x1b1/frame 0xffffffff81a90bd0
vmem_xfree() at vmem_xfree+0x42/frame 0xffffffff81a90c10
acpi_find_table() at acpi_find_table+0x274/frame 0xffffffff81a90c60
madt_probe() at madt_probe+0x10/frame 0xffffffff81a90c70
apic_init() at apic_init+0x53/frame 0xffffffff81a90c90
mi_startup() at mi_startup+0x118/frame 0xffffffff81a90cb0
btext() at btext+0x2c

sean
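
For anyone following along, here is a small userland sketch of the distinction Peter describes above: an address can sit inside the reserved direct map window and still be past the end of the part that is actually instantiated. Everything in it is an assumption made for illustration. The SKETCH_* constants, the sketch_mapped_end value (which pretends 64 GB of the window is mapped), and both helper functions are hypothetical names, not the kernel's macros, variables, or code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only. These are NOT taken from the
 * kernel headers; the real bounds are defined by the amd64 pmap code.
 */
#define SKETCH_DMAP_MIN 0xfffff80000000000ULL /* start of reserved window */
#define SKETCH_DMAP_MAX 0xfffffc0000000000ULL /* end of reserved window */

/* Assumed end of the instantiated part of the window (64 GB mapped). */
static const uint64_t sketch_mapped_end = SKETCH_DMAP_MIN + (64ULL << 30);

/* True if va lies anywhere in the reserved window, mapped or not. */
static bool
in_dmap_window(uint64_t va)
{
	return (va >= SKETCH_DMAP_MIN && va < SKETCH_DMAP_MAX);
}

/* True only if va lies in the part of the window that is backed. */
static bool
in_dmap_mapped(uint64_t va)
{
	return (va >= SKETCH_DMAP_MIN && va < sketch_mapped_end);
}
```

With these assumed numbers, an address 128 GB into the window passes the first check and fails the second, which is the gap a window-only comparison would let through.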