From: Steve Kargl
Date: Wed, 13 Nov 2019 06:52:04 -0800
To: Hans Petter Selasky
Cc: Mark Johnston, freebsd-current@freebsd.org, freebsd-x11@freebsd.org
Subject: Re: unkillable process consuming 100% cpu
Message-ID: <20191113145204.GA3650@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
In-Reply-To: <1de2773b-3f54-472b-65a1-0f1c297aab60@selasky.org>

On Wed, Nov 13, 2019 at 09:10:06AM +0100, Hans Petter Selasky wrote:
> On 2019-11-13 01:30, Steve Kargl wrote:
> >
> > I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
> > rebooted, and have been pounding on the system with workloads that are
> > similar to what the system was doing during the lockups.  So far, I
> > cannot get the system to lock up.  Looks like your patch fixes (or at
> > least helps).  Thanks for taking a look at the problem.
> >
>
> Can you apply the kdb.diff on top and check dmesg for prints?
>

I could not find the amdgpu_amdkfd_gpuvm.c file when I went looking.
Is it autogenerated?

I also spoke too soon.  I got a panic after my reply above.

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 15
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe00b460e188
frame pointer           = 0x28:0xfffffe00b460e1c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 877 (X:rcs0)
trap number             = 12
panic: page fault
cpuid = 5
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b460dde0
vpanic() at vpanic+0x17e/frame 0xfffffe00b460de40
panic() at panic+0x43/frame 0xfffffe00b460dea0
trap_fatal() at trap_fatal+0x388/frame 0xfffffe00b460df10
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00b460df80
trap() at trap+0x288/frame 0xfffffe00b460e0b0
calltrap() at calltrap+0x8/frame 0xfffffe00b460e0b0
--- trap 0xc, rip = 0, rsp = 0xfffffe00b460e188, rbp = 0xfffffe00b460e1c0 ---
??() at 0/frame 0xfffffe00b460e1c0
radeon_cs_ioctl() at radeon_cs_ioctl+0xa0b/frame 0xfffffe00b460e640
drm_ioctl_kernel() at drm_ioctl_kernel+0xf1/frame 0xfffffe00b460e680
drm_ioctl() at drm_ioctl+0x279/frame 0xfffffe00b460e770
linux_file_ioctl() at linux_file_ioctl+0x298/frame 0xfffffe00b460e7d0
kern_ioctl() at kern_ioctl+0x284/frame 0xfffffe00b460e840
sys_ioctl() at sys_ioctl+0x157/frame 0xfffffe00b460e910
amd64_syscall() at amd64_syscall+0x273/frame 0xfffffe00b460ea30
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00b460ea30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x200cc6bfa, rsp = 0x7fffbfffde98, rbp = 0x7fffbfffdec0 ---
Uptime: 5h9m5s
Dumping 1472 out of 16327 MB:..2%..11%..21%..31%..41%..52%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2  0xffffffff805de452 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:479
#3  0xffffffff805de8a6 in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:908
#4  0xffffffff805de6c3 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:835
#5  0xffffffff808b0d58 in trap_fatal (frame=0xfffffe00b460e0c0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:925
#6  0xffffffff808b0daf in trap_pfault (frame=0xfffffe00b460e0c0,
    usermode=, signo=, ucode=)
    at /usr/src/sys/amd64/amd64/trap.c:743
#7  0xffffffff808b0468 in trap (frame=0xfffffe00b460e0c0)
    at /usr/src/sys/amd64/amd64/trap.c:407
#8
#9  0x0000000000000000 in ?? ()
#10 0xffffffff817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xfffff80061eeb248)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
#11 radeon_ttm_tt_set_userptr (ttm=0xfffff80061eeb248, addr=1,
    flags=2147483647)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:804
#12 0xffffffff817adc9b in radeon_is_px (dev=0xfffff8017fe84e00)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156
#13 0xffffffff818a9e81 in drm_ioctl_kernel (linux_file=,
    func=0xfffffe00b460e428, kdata=0xfffffe00b31eb000, flags=1521620552)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:760
#14 0xffffffff818aa129 in drm_ioctl (filp=0xfffff80061198e00,
    cmd=, arg=65536)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:856
#15 0xffffffff807c8098 in linux_file_ioctl_sub (fp=,
    filp=, fop=, cmd=, data=, td=)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:965
#16 linux_file_ioctl (fp=, cmd=, data=, cred=,
    td=0xfffff800612c0000)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:1558
#17 0xffffffff8063ed34 in fo_ioctl (fp=, com=3223348326,
    data=0x7fffffff, active_cred=0xfffffe001f7e6250,
    td=0xfffff800612c0000) at /usr/src/sys/sys/file.h:340
#18 kern_ioctl (td=, fd=9, com=3223348326, data=0x7fffffff )
    at /usr/src/sys/kern/sys_generic.c:801
#19 0xffffffff8063ea37 in sys_ioctl (td=0xfffff800612c0000,
    uap=0xfffff800612c03c8) at /usr/src/sys/kern/sys_generic.c:709
#20 0xffffffff808b1783 in syscallenter (td=0xfffff800612c0000)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#21 amd64_syscall (td=0xfffff800612c0000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1162
#22

-- 
Steve
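
Editorial aside for readers of the backtrace: the com=3223348326 value in frames #17 and #18 can be decoded by hand to see which ioctl was in flight. The sketch below is not part of the original report. It assumes the BSD ioctl bit layout from sys/ioccom.h (direction in the top three bits, a 13-bit parameter length, then group and number bytes), and that DRM driver-private ioctls start at command base 0x40 in group 'd'. The function name decode_ioctl is hypothetical.

```python
# Constants assumed to match FreeBSD's sys/ioccom.h.
IOCPARM_SHIFT = 13
IOCPARM_MASK = (1 << IOCPARM_SHIFT) - 1   # parameter length mask
IOC_VOID = 0x20000000                     # no parameters
IOC_OUT = 0x40000000                      # kernel copies parameters out
IOC_IN = 0x80000000                       # kernel copies parameters in

# Assumed from the DRM uapi headers: driver-private commands start here.
DRM_COMMAND_BASE = 0x40


def decode_ioctl(com):
    """Split an ioctl command word into direction, length, group, number."""
    direction = []
    if com & IOC_VOID:
        direction.append("void")
    if com & IOC_IN:
        direction.append("in")
    if com & IOC_OUT:
        direction.append("out")
    return {
        "direction": direction,
        "length": (com >> 16) & IOCPARM_MASK,
        "group": chr((com >> 8) & 0xFF),
        "number": com & 0xFF,
    }


if __name__ == "__main__":
    info = decode_ioctl(3223348326)        # value taken from frames #17/#18
    print(hex(3223348326), info)
    # Offset of the command within the driver-private range.
    print(hex(info["number"] - DRM_COMMAND_BASE))
```

Under those assumptions the command word is 0xc0206466: a read/write ioctl in group 'd' with a 32-byte argument and number 0x66, that is, driver command 0x26 past the DRM command base. That is consistent with radeon_cs_ioctl() appearing in the kernel's own stack trace, even though kgdb resolves the intermediate frames to other radeon symbols.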