Date:      Wed, 29 Aug 2018 10:28:40 +0800
From:      Meowthink <meowthink@gmail.com>
To:        "karu.pruun" <karu.pruun@gmail.com>
Cc:        freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Help diagnose my Ryzen build problem (in progress)
Message-ID:  <CABnABoYqyV6Ab3uqESYTyXbeS5G5QuTnMvPoGAc3v-2Whv%2BV=Q@mail.gmail.com>
In-Reply-To: <CABnABoamgeDUMBXvGwHzgjKrQvHSXC8o3wVRhtu5hFsiLV%2BEaw@mail.gmail.com>
References:  <CABnABoamgeDUMBXvGwHzgjKrQvHSXC8o3wVRhtu5hFsiLV%2BEaw@mail.gmail.com>

Update:

machdep.idle=hlt with machdep.idle_mwait=0 also failed. It lasts no
longer than machdep.idle=mwait, which normally panics after a few
passes of building gcc. I tried hlt twice, and neither run lasted
longer than half an hour.
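
For anyone repeating this kind of test, here is a minimal sketch of a
pass counter, assuming a POSIX sh. It reruns a build command and appends
one line per pass to a log, so after a panic the log shows how many
passes completed. The name run_passes and the paths in the usage line
are hypothetical, not part of any FreeBSD tool.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and record each pass,
# so the log shows how many passes completed before a failure or panic.
# Usage: run_passes COUNT LOGFILE COMMAND [ARGS...]
run_passes() {
    count=$1
    log=$2
    shift 2
    pass=1
    while [ "$pass" -le "$count" ]; do
        # Stop at the first failing pass and record which one it was.
        if ! "$@"; then
            echo "pass $pass failed" >> "$log"
            return 1
        fi
        echo "pass $pass ok" >> "$log"
        sync    # best-effort flush of the log before the next pass
        pass=$((pass + 1))
    done
    return 0
}
```

For example: run_passes 10 /var/tmp/idle-hlt.log make -C /path/to/gcc/obj
(both paths are placeholders). The sync after each pass is only a
best-effort flush, so the last completed pass is more likely to be on
disk if the machine panics.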

Now another round of building 4 gccs in parallel is about to finish,
with machdep.idle=spin and machdep.idle_mwait=0.
Can I say the Ryzen 2400G probably has issues with both mwait and hlt?
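
The combinations tried so far can be written down as sysctl settings.
A minimal sketch, assuming the method names listed in
machdep.idle_available earlier in this thread (spin, mwait, hlt, acpi).
idle_conf_lines is a hypothetical helper that only prints the two
settings and does not touch the system.

```shell
#!/bin/sh
# Hypothetical helper: print sysctl.conf-style lines for one idle setup.
# Valid method names are taken from machdep.idle_available as quoted
# in this thread; other systems may offer a different list.
idle_conf_lines() {
    method=$1
    mwait=$2
    case "$method" in
        spin|mwait|hlt|acpi) ;;
        *) echo "unknown idle method: $method" >&2; return 1 ;;
    esac
    case "$mwait" in
        0|1) ;;
        *) echo "idle_mwait must be 0 or 1" >&2; return 1 ;;
    esac
    printf 'machdep.idle=%s\nmachdep.idle_mwait=%s\n' "$method" "$mwait"
}

# Apply at runtime (as root), one setting per sysctl call:
#   idle_conf_lines spin 0 | while read -r kv; do sysctl "$kv"; done
# Persist across reboots:
#   idle_conf_lines spin 0 >> /etc/sysctl.conf
```

The three setups from this thread would then be "spin 0" (stable so
far), "hlt 0" (panics), and "mwait 0" (panics).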

Regards,
meowthink

Fatal trap 12: page fault while in user mode
cpuid = 6; apic id = 06
fault virtual address   = 0x819cd0000
fault code              = user write data, reserved bits in PTE
instruction pointer     = 0x43:0x80195de26
stack pointer           = 0x3b:0x7fffffffb0b8
frame pointer           = 0x3b:0x7fffffffb100
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 3, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 17888 (ld)
trap number             = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80f7c14f at trap_fatal+0x35f
#4 0xffffffff80f7c1a9 at trap_pfault+0x49
#5 0xffffffff80f7ba10 at trap+0x360
#6 0xffffffff80f5bccc at calltrap+0x8


On Tue, Aug 28, 2018 at 11:47 PM Meowthink <meowthink@gmail.com> wrote:
>
> Hi Peeter,
>
> On 8/28/18, karu.pruun <karu.pruun@gmail.com> wrote:
> > On Mon, Aug 27, 2018 at 6:07 PM Meowthink <meowthink@gmail.com> wrote:
> >
> >> >> Unfortunately, that's for Ryzens family 17h model 00h-0fh, whereas my
> >> >> Ryzen 5 2400G's model is 11h.
> >> >>
> >> >> On the microcode. It shall be updated through UEFI/BIOS updates. I
> >> >> think mine is now PinnaclePI-AM4_1.0.0.4 with microcode patchlevel
> >> >> 0x810100b.
> >> >>
> >> >> Seems like ... the only thing I can do is sit down and wait?
> >> >
> >> > The revision
> >> >
> >> > https://svnweb.freebsd.org/base/head/sys/x86/x86/cpu_machdep.c?r1=336763&r2=336762&pathrev=336763
> >> >
> >> > works around the mwait issue, i.e. it sets
> >> >
> >> > sysctl machdep.idle_mwait=0
> >> > sysctl machdep.idle=hlt
> >> >
> >>
> >> I think that should not apply to the 2400G, which is model 11h, not 1h.
> >> Here's what I have now:
> >>
> >> machdep.idle: acpi
> >> machdep.idle_available: spin, mwait, hlt, acpi
> >> machdep.idle_apl31: 0
> >> machdep.idle_mwait: 1
> >>
> >> > Now it may or may not relate to your problem, but it appears that
> >> > Ryzen 2400G also has another issue with HLT, see the DragonFly bug
> >> > report
> >> >
> >> > https://bugs.dragonflybsd.org/issues/3131
> >> >
> >>
> >> Thanks a lot for that info.
> >> It's much easier to prove your problem, since it's reproducible. But
> >> mine is too random to catch...
> >> Anyway, it seems like the IRET issue [1] is still not fixed? I
> >> highly doubt that my issue is related to it, because my system became
> >> significantly more stable after I stopped the irq storm from the
> >> bluetooth module, though it still panics occasionally.
> >> So could anybody tell me what the difference is between the FreeBSD
> >> workaround [2] and the DragonFly BSD one?
> >>
> >> > which AMD is aware of and is possibly working on, but it may not have
> >> > appeared in the errata yet. The bug report says that until this is
> >> > fixed, the workaround is to also disable HLT in cpu_idle. I am not
> >> > sure what is the correct value for the sysctl on FreeBSD, perhaps
> >> >
> >> > sysctl machdep.idle=0
> >> >
> >> > or some other value?
> >>
> >> In the meantime, I have this microcode
> >>
> >> # cpucontrol -m 0x8b /dev/cpuctl0
> >> MSR 0x8b: 0x00000000 0x0810100b
> >>
> >> So should I use mwait?
> >> I still don't know what I should set. Any ideas?
> >
> >
> > If I was you, I'd play around with the sysctls mentioned above and see
> > if it helps. Start with disabling both mwait and hlt, perhaps
> >
> > machdep.idle=spin
> > machdep.idle_mwait=0
> >
> > (assuming that 'spin' means hlt will not be used) and then if that does
> > not lead to a panic, try enabling mwait. I can't test 2400G since I
> > don't have it any more. I booted FreeBSD a couple of times but did not
> > run it over long periods of time.
>
> It works!
> After hours and hours of different stress tests, I got 8 copies of gcc
> built without any problem.
>
> But it costs a lot of power and the fan becomes very annoying, so
> I don't think I'll test long-term stability in this state.
>
> machdep.idle: acpi -> spin
>  - will add ~5W, maybe some deeper C states disabled?
> machdep.idle_mwait: 1 -> 0
>  - will add another ~50W; the CPUs never sleep.
>
> I tried setting machdep.idle_mwait to 1, and machdep.idle to mwait. Both
> failed with panics when I started building gcc pass by pass.
>
> I'm pretty sure mwait causes problems, as I once experienced a
> panic immediately after issuing the sysctl command (see the 2nd dump
> below).
>
> So my next step will be hlt. Still need some time, though.
>
> >
> > Cheers
> >
> > Peeter
> >
> > --
> >
>
> Cheers,
> meowthink
>
> ------------------------------------------------------------------------
> machdep.idle=mwait
>
> panic: ffs_syncvnode: syncing truncated data.
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80dcddc4 at ffs_syncvnode+0x5a4
> #4 0xffffffff80dcc915 at ffs_fsync+0x25
> #5 0xffffffff810ffcb2 at VOP_FSYNC_APV+0x82
> #6 0xffffffff80bc3a62 at sched_sync+0x412
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe
>
> ------------------------------------------------------------------------
> machdep.idle_mwait=1
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 7; apic id = 07
> instruction pointer     = 0x20:0xffffffff80e094fe
> stack pointer           = 0x0:0xfffffe081e5df9e0
> frame pointer           = 0x0:0xfffffe081e5dfa50
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 17 (dom0)
> trap number             = 9
> panic: general protection fault
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80f7c14f at trap_fatal+0x35f
> #4 0xffffffff80f7b70e at trap+0x5e
> #5 0xffffffff80f5bccc at calltrap+0x8
> #6 0xffffffff80e07a17 at vm_pageout+0x87
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe
