Date: Wed, 29 Aug 2018 10:28:40 +0800
From: Meowthink <meowthink@gmail.com>
To: "karu.pruun" <karu.pruun@gmail.com>
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Help diagnose my Ryzen build problem (in progress)
Message-ID: <CABnABoYqyV6Ab3uqESYTyXbeS5G5QuTnMvPoGAc3v-2Whv%2BV=Q@mail.gmail.com>
In-Reply-To: <CABnABoamgeDUMBXvGwHzgjKrQvHSXC8o3wVRhtu5hFsiLV%2BEaw@mail.gmail.com>
References: <CABnABoamgeDUMBXvGwHzgjKrQvHSXC8o3wVRhtu5hFsiLV%2BEaw@mail.gmail.com>

Update:

machdep.idle = hlt with machdep.idle_mwait = 0 also failed. It did not
last any longer than machdep.idle = mwait, which normally panics after
a few passes of building gcc. I tried hlt twice, and neither run lasted
more than half an hour.

Meanwhile, another round of building 4 gccs in parallel is about to
finish with machdep.idle = spin and machdep.idle_mwait = 0.

Can I say the Ryzen 2400G probably has issues with both mwait and hlt?

Regards,
meowthink

Fatal trap 12: page fault while in user mode
cpuid = 6; apic id = 06
fault virtual address   = 0x819cd0000
fault code              = user write data, reserved bits in PTE
instruction pointer     = 0x43:0x80195de26
stack pointer           = 0x3b:0x7fffffffb0b8
frame pointer           = 0x3b:0x7fffffffb100
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 3, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 17888 (ld)
trap number             = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80f7c14f at trap_fatal+0x35f
#4 0xffffffff80f7c1a9 at trap_pfault+0x49
#5 0xffffffff80f7ba10 at trap+0x360
#6 0xffffffff80f5bccc at calltrap+0x8

On Tue, Aug 28, 2018 at 11:47 PM Meowthink <meowthink@gmail.com> wrote:
>
> Hi Peeter,
>
> On 8/28/18, karu.pruun <karu.pruun@gmail.com> wrote:
> > On Mon, Aug 27, 2018 at 6:07 PM Meowthink <meowthink@gmail.com> wrote:
> >
> >> >> Unfortunately, that's for Ryzens family 17h model 00h-0fh, whereas my
> >> >> Ryzen 5 2400G's model is 11h.
> >> >>
> >> >> On the microcode. It shall be updated through UEFI/BIOS updates. I
> >> >> think mine is now PinnaclePI-AM4_1.0.0.4 with microcode patchlevel
> >> >> 0x810100b.
> >> >>
> >> >> Seems like ... the only thing I can do is sit down and wait?
> >> >
> >> > The revision
> >> >
> >> > https://svnweb.freebsd.org/base/head/sys/x86/x86/cpu_machdep.c?r1=336763&r2=336762&pathrev=336763
> >> >
> >> > works around the mwait issue, i.e. it sets
> >> >
> >> > sysctl machdep.idle_mwait=0
> >> > sysctl machdep.idle=hlt
> >> >
> >>
> >> I think that shall not apply to 2400G, which is model 11h not 1h.
> >> Here're what I have now:
> >>
> >> machdep.idle: acpi
> >> machdep.idle_available: spin, mwait, hlt, acpi
> >> machdep.idle_apl31: 0
> >> machdep.idle_mwait: 1
> >>
> >> > Now it may or may not relate to your problem, but it appears that
> >> > Ryzen 2400G also has another issue with HLT, see the DragonFly bug
> >> > report
> >> >
> >> > https://bugs.dragonflybsd.org/issues/3131
> >> >
> >>
> >> Thanks a lot for that info.
> >> It's much easier to prove your problem, since it's reproducible. But
> >> mine was so random to catch...
> >> Anyway, it seems like the IRET issue [1] is still not fixed? I'm
> >> highly doubt that my issue is this related because my system became
> >> significantly more stable since I stop that irq storm from bluetooth
> >> module - Though it still panics occasionally.
> >> So could anybody tell, what's the difference between FreeBSD
> >> workaround [2] and the DragonflyBSD one?
> >>
> >> > which AMD is aware of and is possibly working on, but it may not have
> >> > appeared in the errata yet. The bug report says that until this is
> >> > fixed, the workaround is to also disable HLT in cpu_idle. I am not
> >> > sure what is the correct value for the sysctl on FreeBSD, perhaps
> >> >
> >> > sysctl machdep.idle=0
> >> >
> >> > or some other value?
> >>
> >> In the meantime, I have this microcode
> >>
> >> # cpucontrol -m 0x8b /dev/cpuctl0
> >> MSR 0x8b: 0x00000000 0x0810100b
> >>
> >> Hence I should use mwait?
> >> Still don't know what should I set. Any idea?
> >
> >
> > If I was you, I'd play around with the sysctls mentioned above and see
> > if it helps. Start with disabling both mwait and hlt, perhaps
> >
> > machdep.idle=spin
> > machdep.idle_mwait=0
> >
> > (assuming that 'spin' means hlt will not used) and then if that does
> > not lead to a panic, try enabling mwait. I can't test 2400G since I
> > don't have it any more. I booted FreeBSD a couple of times but did not
> > run it over long periods of time.
>
> It works!
> After hours and hours of different stressing. I got 8 copies of gcc
> built without any problem.
>
> But it costs lots of power and the fan will become very annoying. As
> so, I don't think I'll test long term stability with this state.
>
> machdep.idle: acpi -> spin
>  - will add ~5W, maybe some deeper C states disabled?
> machdep.idle_mwait: 1 -> 0
>  - will add another ~50W, CPUs are working insomniac.
>
> I tried to set machdep.idle_mwait to 1, or machdep.idle to mwait. Both
> failed with panics when I start building gcc pass by pass.
>
> I'm pretty sure mwait will cause problem, as once I experienced a
> panic immediately after I issued the sysctl command (the 2nd dump info
> followed)
>
> So my next step will be hlt. Still need some time, though.
>
> >
> > Cheers
> >
> > Peeter
> >
> > --
> >
>
> Cheers,
> meowthink
>
> ------------------------------------------------------------------------
> machdep.idle=mwait
>
> panic: ffs_syncvnode: syncing truncated data.
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80dcddc4 at ffs_syncvnode+0x5a4
> #4 0xffffffff80dcc915 at ffs_fsync+0x25
> #5 0xffffffff810ffcb2 at VOP_FSYNC_APV+0x82
> #6 0xffffffff80bc3a62 at sched_sync+0x412
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe
>
> ------------------------------------------------------------------------
> machdep.idle_mwait=1
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 7; apic id = 07
> instruction pointer     = 0x20:0xffffffff80e094fe
> stack pointer           = 0x0:0xfffffe081e5df9e0
> frame pointer           = 0x0:0xfffffe081e5dfa50
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 17 (dom0)
> trap number             = 9
> panic: general protection fault
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80f7c14f at trap_fatal+0x35f
> #4 0xffffffff80f7b70e at trap+0x5e
> #5 0xffffffff80f5bccc at calltrap+0x8
> #6 0xffffffff80e07a17 at vm_pageout+0x87
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe
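
For anyone repeating these tests, here is a minimal sketch of a helper for
switching between the idle configurations tried in this thread. Only the
sysctl names (machdep.idle, machdep.idle_mwait) and the values listed in
machdep.idle_available come from the messages above; the function name
idle_config and the APPLY variable are made up for illustration and are not
part of FreeBSD. By default it only prints the commands, so nothing changes
until APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: idle_config and APPLY are hypothetical names, not FreeBSD tools.
# Prints the sysctl commands for one idle configuration, or runs them
# when APPLY=1 is set (which requires root on a FreeBSD system).
idle_config() {
    method=$1   # one of the values in machdep.idle_available
    mwait=$2    # value for machdep.idle_mwait

    # Accept only the idle methods reported in the thread.
    case "$method" in
        spin|mwait|hlt|acpi) ;;
        *) echo "unknown idle method: $method" >&2; return 1 ;;
    esac

    # machdep.idle_mwait is used as an on/off switch in the thread.
    case "$mwait" in
        0|1) ;;
        *) echo "machdep.idle_mwait must be 0 or 1" >&2; return 1 ;;
    esac

    # Same order as quoted in the thread: idle_mwait first, then idle.
    for setting in "machdep.idle_mwait=$mwait" "machdep.idle=$method"; do
        if [ "${APPLY:-0}" = 1 ]; then
            sysctl "$setting" || return 1
        else
            echo "sysctl $setting"
        fi
    done
}
```

Calling `idle_config spin 0` prints the two commands for the configuration
that built gcc without a panic; `idle_config hlt 0` and `idle_config mwait 1`
give the ones that panicked here. Printing by default is a deliberate choice,
since switching to mwait on this machine once triggered a panic immediately.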