From owner-freebsd-current@freebsd.org Mon Mar 26 13:35:39 2018
Subject: Re: head -r331499 amd64/threadripper panic in vm_page_free_prep during "poudriere bulk -a", after 14h 22m or so.
From: Mark Millard
Date: Mon, 26 Mar 2018 06:35:29 -0700
To: Mark Johnston
Cc: FreeBSD Current
Message-Id: <08B7C130-A38D-473A-8A73-CA79ED1A0044@yahoo.com>
In-Reply-To: <45B4FCDA-C743-4F35-B819-9CB064C20038@yahoo.com>
References: <8D9C49CB-957E-40A5-8EB0-D90D8AC02060@yahoo.com> <20180325183421.GA74365@raichu> <44821CA4-19C2-4265-8E83-568452DF6471@yahoo.com> <20180325200934.GC74365@raichu> <45B4FCDA-C743-4F35-B819-9CB064C20038@yahoo.com>

[Unfortunately, I'd not be able to get back to this for
many hours. I do not want to leave the machine at
the db> prompt that long. So this is all there will be.]

It got a different crash last night, after a little over
12 hours of poudriere bulk -a activity, again while I
was sleeping.

Hand typed:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 0d
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80b70867
stack pointer           = 0x28:0xfffffe00ebab8880
frame pointer           = 0x28:0xfffffe00ebab8890
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 44 (dom0)
[ thread pid 44 tid 100277 ]
Stopped at      turnstile_broadcast+0x47:       movq    0x20(%rbx,%rax,1),%rcx

(So an offset from a null pointer, apparently.)
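As a rough illustration of the "offset from a null pointer" reading:
the faulting instruction uses base + index*scale + displacement
addressing, so if the base and index both held 0 at the time of the
fault, the effective address is just the 0x20 displacement, which
matches the reported fault virtual address. A minimal Python sketch
of that arithmetic follows. The function name is made up for
illustration and is not part of ddb or any kernel API.

```python
def effective_address(base, index, scale, disp, bits=64):
    """Effective address for an AT&T-syntax operand disp(base,index,scale).

    Hypothetical helper for illustration only; the result wraps to
    `bits` bits, as address arithmetic does on the CPU.
    """
    mask = (1 << bits) - 1
    return (base + index * scale + disp) & mask


# movq 0x20(%rbx,%rax,1),%rcx with rbx == 0 and rax == 0 (assumed values)
fault_va = 0x20  # "fault virtual address" from the trap message
addr = effective_address(base=0, index=0, scale=1, disp=0x20)
print(hex(addr), addr == fault_va)
```

This only shows that a zero base plus the 0x20 displacement is
consistent with the fault address; it does not say which structure
the null pointer should have referenced.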
bt shows:

Tracing pid 44 tid 100277 td 0xfffff8010f938560
turnstile_broadcast() at turnstile_broadcast+0x47/frame 0xfffffe00ebab8890
__mtx_unlock_sleep() at __mtx_unlock_sleep+0xb9/frame 0xfffffe00ebab88c0
vm_pageout_page_lock() at vm_pageout_page_lock+0x179/frame 0xfffffe00ebab8960
vm_pageout_worker() at vm_pageout_worker+0xd3a/frame 0xfffffe00ebab8a50
vm_pageout() at vm_pageout+0x133/frame 0xfffffe00ebab8a70
fork_exit() at fork_exit+0x83/frame 0xfffffe00ebab8ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ebab8ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

The dump failed again, the same way but with some byte value
differences:

(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 39 8c c7 00 00 08 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.

** DUMP FAILED (ERROR 5) **
Cannot dump: unknown error (error=5)

So the dump failure appears to be repeatable (for the Optane
swap/page partition?).

show reg:

cs          0x20
ds          0x3b    ll+0x1a
es          0x3b    ll+0x1a
fs          0x13
gs          0x1b
ss          0x28    ll+0x7
rax         0
rcx         0xfffff8010f938501
rdx         0xfffff8010f938501
rbx         0xfffffe00ebab8880
rsp         0xfffffe00ebab8800
rsi         0
rdi         0
r8          0
r9          0
r10         0
r11         0
r12         0
r13         0xfffff8010f938560
r14         0
r15         0xffffffff81d67998  vm_dom+0x18
rip         0xffffffff80b70867  turnstile_broadcast+0x47
rflags      0x10056
turnstile_broadcast+0x47:       movq    0x20(%rbx,%rax,1),%rcx

Around where rbx points:

0xfffffe00ebab8872: ab eb 0 fe ff ff 28 0 0 0 0 0 0 0
0xfffffe00ebab8880: 0 0 0 0 0 0 0 0 80 79 d6 81 ff ff
0xfffffe00ebab888e: ff ff c0 88 ab eb 0 fe ff ff 9 20 af 80
0xfffffe00ebab889c: ff ff ff ff 0 7b 2 d8 f f8 ff ff 98 79

And it looks like we have that null pointer above: the eight
bytes starting at 0xfffffe00ebab8880 are all zero.

And I'm afraid that is it: I need to be off doing other
things.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)