From owner-freebsd-hackers@freebsd.org  Wed Aug 29 02:28:53 2018
From: Meowthink <meowthink@gmail.com>
Date: Wed, 29 Aug 2018 10:28:40 +0800
Message-ID: <CABnABoYqyV6Ab3uqESYTyXbeS5G5QuTnMvPoGAc3v-2Whv+V=Q@mail.gmail.com>
Subject: Re: Help diagnose my Ryzen build problem (in progress)
To: "karu.pruun" <karu.pruun@gmail.com>
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Content-Type: text/plain; charset="UTF-8"

Update:

machdep.idle=hlt with machdep.idle_mwait=0 failed as well. It lasted
even less time than machdep.idle=mwait, which normally panics after a
few passes of building gcc. I tried hlt twice, and neither run lasted
longer than half an hour.

Now another round of building 4 gccs in parallel is about to finish, with
machdep.idle=spin and machdep.idle_mwait=0.
Can I say that the Ryzen 2400G probably has issues with both mwait and hlt?
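
For anyone who wants to repeat the test, the idle settings tried so far
can be sketched as below. This is only a minimal sketch: the helper name
idle_sysctl_commands and the TRIED table are made up for illustration,
and the outcomes are just what was reported in this thread. It only
builds the sysctl command lines; it does not run them.

```python
# Hypothetical helper: build the sysctl command lines for one idle setup.
# machdep.idle and machdep.idle_mwait are the sysctls discussed in this thread.
VALID_IDLE = ("spin", "mwait", "hlt", "acpi")  # from machdep.idle_available


def idle_sysctl_commands(idle, idle_mwait):
    """Return the sysctl commands that select the given idle method."""
    if idle not in VALID_IDLE:
        raise ValueError("unknown idle method: %r" % (idle,))
    if idle_mwait not in (0, 1):
        raise ValueError("machdep.idle_mwait must be 0 or 1")
    return [
        "sysctl machdep.idle=%s" % idle,
        "sysctl machdep.idle_mwait=%d" % idle_mwait,
    ]


# Outcomes as reported in this thread (not independently verified).
TRIED = [
    ("spin", 0, "no panic while building gcc"),
    ("hlt", 0, "panic within half an hour, twice"),
]

if __name__ == "__main__":
    for idle, idle_mwait, outcome in TRIED:
        print("# %s" % outcome)
        for cmd in idle_sysctl_commands(idle, idle_mwait):
            print(cmd)
```

The commands have to be run as root, and the settings do not survive a
reboot unless they are also added to /etc/sysctl.conf.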

Regards,
meowthink

Fatal trap 12: page fault while in user mode
cpuid = 6; apic id = 06
fault virtual address   = 0x819cd0000
fault code              = user write data, reserved bits in PTE
instruction pointer     = 0x43:0x80195de26
stack pointer           = 0x3b:0x7fffffffb0b8
frame pointer           = 0x3b:0x7fffffffb100
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 3, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 17888 (ld)
trap number             = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80f7c14f at trap_fatal+0x35f
#4 0xffffffff80f7c1a9 at trap_pfault+0x49
#5 0xffffffff80f7ba10 at trap+0x360
#6 0xffffffff80f5bccc at calltrap+0x8
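
A note on reading the "fault code" line in the dump above: it is a
rendering of the x86 page-fault error code. The sketch below decodes the
architectural bits into the same wording. The function and constant names
are my own choice for illustration, and the raw error code is not printed
in the dump, so 0x0f is only an assumed value that would render as that
line. A reserved-bits fault means the CPU found a page-table entry with
must-be-zero bits set, which a user program like ld cannot set itself.

```python
# Architectural x86 page-fault error code bits (names are illustrative).
PGEX_P = 0x01    # fault on a present page (protection violation)
PGEX_W = 0x02    # access was a write
PGEX_U = 0x04    # access came from user mode
PGEX_RSV = 0x08  # reserved bit set in a page-table entry
PGEX_I = 0x10    # access was an instruction fetch


def describe_fault_code(code):
    """Render an error code in the style of the 'fault code' line."""
    mode = "user" if code & PGEX_U else "supervisor"
    access = "write" if code & PGEX_W else "read"
    kind = "instruction" if code & PGEX_I else "data"
    if code & PGEX_RSV:
        reason = "reserved bits in PTE"
    elif code & PGEX_P:
        reason = "protection violation"
    else:
        reason = "page not present"
    return "%s %s %s, %s" % (mode, access, kind, reason)


if __name__ == "__main__":
    # Assumed raw value: present + write + user + reserved-bit violation.
    print(describe_fault_code(0x0F))
```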


On Tue, Aug 28, 2018 at 11:47 PM Meowthink <meowthink@gmail.com> wrote:
>
> Hi Peeter,
>
> On 8/28/18, karu.pruun <karu.pruun@gmail.com> wrote:
> > On Mon, Aug 27, 2018 at 6:07 PM Meowthink <meowthink@gmail.com> wrote:
> >
> >> >> Unfortunately, that's for Ryzens family 17h model 00h-0fh, whereas my
> >> >> Ryzen 5 2400G's model is 11h.
> >> >>
> >> >> As for the microcode, it has to be updated through UEFI/BIOS updates. I
> >> >> think mine is now PinnaclePI-AM4_1.0.0.4 with microcode patchlevel
> >> >> 0x810100b.
> >> >>
> >> >> Seems like ... the only thing I can do is sit down and wait?
> >> >
> >> > The revision
> >> >
> >> > https://svnweb.freebsd.org/base/head/sys/x86/x86/cpu_machdep.c?r1=336763&r2=336762&pathrev=336763
> >> >
> >> > works around the mwait issue, i.e. it sets
> >> >
> >> > sysctl machdep.idle_mwait=0
> >> > sysctl machdep.idle=hlt
> >> >
> >>
> >> I think that does not apply to the 2400G, which is model 11h, not 1h.
> >> Here is what I have now:
> >>
> >> machdep.idle: acpi
> >> machdep.idle_available: spin, mwait, hlt, acpi
> >> machdep.idle_apl31: 0
> >> machdep.idle_mwait: 1
> >>
> >> > Now it may or may not relate to your problem, but it appears that
> >> > Ryzen 2400G also has another issue with HLT, see the DragonFly bug
> >> > report
> >> >
> >> > https://bugs.dragonflybsd.org/issues/3131
> >> >
> >>
> >> Thanks a lot for that info.
> >> It's much easier to prove your problem, since it's reproducible.
> >> Mine is too random to catch...
> >> Anyway, it seems the IRET issue [1] is still not fixed? I highly
> >> doubt that my issue is related to it, because my system has been
> >> significantly more stable since I stopped the IRQ storm from the
> >> Bluetooth module, though it still panics occasionally.
> >> So could anybody tell me what the difference is between the FreeBSD
> >> workaround [2] and the DragonFly BSD one?
> >>
> >> > which AMD is aware of and is possibly working on, but it may not have
> >> > appeared in the errata yet. The bug report says that until this is
> >> > fixed, the workaround is to also disable HLT in cpu_idle. I am not
> >> > sure what is the correct value for the sysctl on FreeBSD, perhaps
> >> >
> >> > sysctl machdep.idle=0
> >> >
> >> > or some other value?
> >>
> >> In the meantime, I have this microcode
> >>
> >> # cpucontrol -m 0x8b /dev/cpuctl0
> >> MSR 0x8b: 0x00000000 0x0810100b
> >>
> >> So should I use mwait?
> >> I still don't know what I should set. Any ideas?
> >
> >
> > If I were you, I'd play around with the sysctls mentioned above and see
> > if that helps. Start with disabling both mwait and hlt, perhaps
> >
> > machdep.idle=spin
> > machdep.idle_mwait=0
> >
> > (assuming that 'spin' means hlt will not be used) and then, if that does
> > not lead to a panic, try enabling mwait. I can't test the 2400G since I
> > don't have it any more. I booted FreeBSD a couple of times but did not
> > run it over long periods of time.
>
> It works!
> After hours and hours of different stress tests, I got 8 copies of gcc
> built without any problem.
>
> But it costs a lot of power and the fan becomes very annoying, so I
> don't think I'll test long-term stability in this state.
>
> machdep.idle: acpi -> spin
>  - adds ~5W; maybe some deeper C-states are disabled?
> machdep.idle_mwait: 1 -> 0
>  - adds another ~50W; the CPUs never go to sleep.
>
> I tried setting machdep.idle_mwait to 1, and machdep.idle to mwait. Both
> failed with panics once I started building gcc pass after pass.
>
> I'm pretty sure mwait causes problems, as I once got a panic
> immediately after issuing the sysctl command (the second dump
> below).
>
> So my next step will be hlt. That still needs some time, though.
>
> >
> > Cheers
> >
> > Peeter
> >
> > --
> >
>
> Cheers,
> meowthink
>
> ------------------------------------------------------------------------
> machdep.idle=mwait
>
> panic: ffs_syncvnode: syncing truncated data.
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80dcddc4 at ffs_syncvnode+0x5a4
> #4 0xffffffff80dcc915 at ffs_fsync+0x25
> #5 0xffffffff810ffcb2 at VOP_FSYNC_APV+0x82
> #6 0xffffffff80bc3a62 at sched_sync+0x412
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe
>
> ------------------------------------------------------------------------
> machdep.idle_mwait=1
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 7; apic id = 07
> instruction pointer     = 0x20:0xffffffff80e094fe
> stack pointer           = 0x0:0xfffffe081e5df9e0
> frame pointer           = 0x0:0xfffffe081e5dfa50
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 17 (dom0)
> trap number             = 9
> panic: general protection fault
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80f7c14f at trap_fatal+0x35f
> #4 0xffffffff80f7b70e at trap+0x5e
> #5 0xffffffff80f5bccc at calltrap+0x8
> #6 0xffffffff80e07a17 at vm_pageout+0x87
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe