Date: Wed, 28 Jul 2021 17:50:28 +0200
From: Philipp Ost <pj@smo.de>
To: stable@freebsd.org
Subject: Reproducible page faults with drm-kmod on 12-Stable/amd64
Message-ID: <9eec8065-4075-c1f6-059d-9a4901b8b050@smo.de>

Hi stable@!

Since switching back to my Radeon HD 5450, I get reproducible page faults and the occasional panic.

I am running FreeBSD 12.2-STABLE stable/12-n233459-0f97f2a1857 amd64 with a stripped-down GENERIC kernel built with DEBUG=-g.

I have installed these DRM modules:

drm-fbsd12.0-kmod-4.16.g20201016_2
drm-kmod-g20190710_1
gpu-firmware-kmod-g20210330

I built these after I updated my machine to the above-mentioned revision. Since then, I rebuilt drm-fbsd12.0-kmod with DEBUG=on.

The radeonkms module gets loaded via /etc/rc.conf:

kld_list="/boot/modules/radeonkms.ko"

The graphics card gets identified as follows:

vgapci0@pci0:1:0:0: class=0x030000 card=0xe164174b chip=0x68f91002 rev=0x00 hdr=0x00
    vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device = 'Cedar [Radeon HD 5000/6000/7350/8350 Series]'
    class = display
    subclass = VGA

Most page faults are DRM related:

1.

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 12
fault virtual address = 0xfffff803a38c9180
fault code = supervisor read instruction, protection violation
instruction pointer = 0x20:0xfffff803a38c9180
stack pointer = 0x28:0xfffffe00a89036a8
frame pointer = 0x28:0xfffffe00a89036a0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 951 (Renderer)
trap number = 12
panic: page fault
cpuid = 2
time = 1627412707
KDB: stack backtrace:
#0 0xffffffff8076af45 at kdb_backtrace+0x65
#1 0xffffffff8071f21b at vpanic+0x17b
#2 0xffffffff8071f093 at panic+0x43
#3 0xffffffff80a7e9a1 at trap_fatal+0x391
#4 0xffffffff80a7e9ff at trap_pfault+0x4f
#5 0xffffffff80a7e046 at trap+0x286
#6 0xffffffff80a56a08 at calltrap+0x8
#7 0xffffffff81cf681c at reservation_object_test_signaled_rcu+0x1dc
#8 0xffffffff81bc2350 at radeon_gem_busy_ioctl+0x30
#9 0xffffffff81cad2e1 at drm_ioctl_kernel+0xf1
#10 0xffffffff81cad589 at drm_ioctl+0x289
#11 0xffffffff809788b0 at linux_file_ioctl+0x330
#12 0xffffffff80788e47 at kern_ioctl+0x2b7
#13 0xffffffff80788aea at sys_ioctl+0xfa
#14 0xffffffff80a7f557 at amd64_syscall+0x387
#15 0xffffffff80a5732e at fast_syscall_common+0xf8
Uptime: 12m19s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

2. This one happened during `make index`:

Fatal trap 12: page fault while in kernel-mode
cpuid = 3; apic id = 13
fault virtual address = 0x60045dabb18
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8163b2a7
stack pointer = 0x28:0xfffffe00a7fb7380
frame pointer = 0x28:0xfffffe00a7fb73b0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 85505 (sh)
trap number = 12
panic: page fault
cpuid = 3
time = 1627414262
KDB: stack backtrace:
#0 0xffffffff8076af45 at kdb_backtrace+0x65
#1 0xffffffff8071f21b at vpanic+0x17b
#2 0xffffffff8071f093 at panic+0x43
#3 0xffffffff80a7e9a1 at trap_fatal+0x391
#4 0xffffffff80a7e9ff at trap_pfault+0x4f
#5 0xffffffff80a7e046 at trap+0x286
#6 0xffffffff80a56a08 at calltrap+0x8
#7 0xffffffff816f75b2 at zfs_freebsd_write+0xb72
#8 0xffffffff80b2039b at VOP_WRITE_APV+0xeb
#9 0xffffffff80801961 at vn_write+0x261
#10 0xffffffff80801433 at vn_io_fault_doio+0x43
#11 0xffffffff807fee0c at vn_io_fault1+0x15c
#12 0xffffffff807fce05 at vn_io_fault+0x185
#13 0xffffffff80788750 at dofilewrite+0xb0
#14 0xffffffff807882d0 at sys_write+0xc0
#15 0xffffffff80a7f557 at amd64_syscall+0x387
#16 0xffffffff80a5732e at fast_syscall_common+0xf8
Uptime: 7m48s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

3. The lone kernel panic:

panic: BUG ON!list_empty(&fence->cb_list) failed at /usr/ports/graphics/drm-fbsd12.0-kmod/work/kms-drm-8843e1fc5/linuxkpi/gplv2/include/linux/dma-fence.h:91
cpuid = 1
time = 1627415383
KDB: stack backtrace:
#0 0xffffffff8076af45 at kdb_backtrace+0x65
#1 0xffffffff8071f21b at vpanic+0x17b
#2 0xffffffff8071f093 at panic+0x43
#3 0xffffffff81cf5c84 at reservation_object_add_shared_fence+0x274
#4 0xffffffff81d0b289 at ttm_eu_fence_buffer_objects+0x69
#5 0xffffffff81bb2b72 at radeon_cs_parser_fini+0x52
#6 0xffffffff81bb26eb at radeon_cs_ioctl+0x8fb
#7 0xffffffff81cad2e1 at drm_ioctl_kernel+0xf1
#8 0xffffffff81cad589 at drm_ioctl+0x289
#9 0xffffffff809788b0 at linux_file_ioctl+0x330
#10 0xffffffff80788e47 at kern_ioctl+0x2b7
#11 0xffffffff80788aea at sys_ioctl+0xfa
#12 0xffffffff80a7f557 at amd64_syscall+0x387
#13 0xffffffff80a5732e at fast_syscall_common+0xf8
Uptime: 1m58s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

4. The most recent one:

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 14
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff81d0e43f
stack pointer = 0x0:0xfffffe00a8908750
frame pointer = 0x0:0xfffffe00a89087e0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor flags = interrupt enabled, resume, IOPL = 0
current process = 1120 (Renderer)
trap number = 12
panic: page fault
cpuid = 4
time = 1627484091
KDB: stack backtrace:
#0 0xffffffff8076af45 at kdb_backtrace+0x65
#1 0xffffffff8071f21b at vpanic+0x17b
#2 0xffffffff8071f093 at panic+0x43
#3 0xffffffff80a7e9a1 at trap_fatal+0x391
#4 0xffffffff80a7e9ff at trap_pfault+0x4f
#5 0xffffffff80a7e046 at trap+0x286
#6 0xffffffff80a56a08 at calltrap+0x8
#7 0xffffffff81bd8dac at radeon_ttm_fault+0x4c
#8 0xffffffff8097b685 at linux_cdev_pager_populate+0x125
#9 0xffffffff80a21fee at vm_fault+0x53e
#10 0xffffffff80a21990 at vm_fault_trap+0x60
#11 0xffffffff80a7eb4c at trap_pfault+0x19c
#12 0xffffffff80a7e1d0 at trap+0x410
#13 0xffffffff80a56a08 at calltrap+0x8
Uptime: 1h32m55s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

These are all I could capture so far (transcribed by hand, any typos are my fault...).

Unfortunately, I was not able to get any sort of crash dump. I have

dumpdev=AUTO
dumpdir=/var/crash
savecore_enable=YES

in my /etc/rc.conf, but /var/crash is empty save for a file named minfree.

As I said, this is 100% reproducible. The time for something to go haywire ranges from pretty much immediately to around two hours.

Any advice on how to fix this? I'm happy to provide more information if needed.

Thanks in advance!
Philipp