From: Meowthink
Date: Wed, 29 Aug 2018 10:28:40 +0800
Subject: Re: Help diagnose my Ryzen build problem (in progress)
To: "karu.pruun"
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org

Update: machdep.idle = hlt with machdep.idle_mwait = 0 also failed. It
did not survive any longer than machdep.idle = mwait, which usually
panics after a few passes of building gcc. I tried hlt twice, and
neither run lasted more than half an hour.

Meanwhile, another round of building 4 gccs in parallel, with
machdep.idle = spin and machdep.idle_mwait = 0, is about to finish.

Can I say the Ryzen 2400G probably has issues with both mwait and hlt?
Regards,
meowthink

Fatal trap 12: page fault while in user mode
cpuid = 6; apic id = 06
fault virtual address = 0x819cd0000
fault code = user write data, reserved bits in PTE
instruction pointer = 0x43:0x80195de26
stack pointer = 0x3b:0x7fffffffb0b8
frame pointer = 0x3b:0x7fffffffb100
code segment = base 0x0, limit 0xfffff, type 0x1b
               = DPL 3, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 17888 (ld)
trap number = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80f7c14f at trap_fatal+0x35f
#4 0xffffffff80f7c1a9 at trap_pfault+0x49
#5 0xffffffff80f7ba10 at trap+0x360
#6 0xffffffff80f5bccc at calltrap+0x8

On Tue, Aug 28, 2018 at 11:47 PM Meowthink wrote:
>
> Hi Peeter,
>
> On 8/28/18, karu.pruun wrote:
> > On Mon, Aug 27, 2018 at 6:07 PM Meowthink wrote:
> >
> >> >> Unfortunately, that's for Ryzens family 17h model 00h-0fh, whereas my
> >> >> Ryzen 5 2400G's model is 11h.
> >> >>
> >> >> On the microcode. It shall be updated through UEFI/BIOS updates. I
> >> >> think mine is now PinnaclePI-AM4_1.0.0.4 with microcode patchlevel
> >> >> 0x810100b.
> >> >>
> >> >> Seems like ... the only thing I can do is sit down and wait?
> >> >
> >> > The revision
> >> >
> >> > https://svnweb.freebsd.org/base/head/sys/x86/x86/cpu_machdep.c?r1=336763&r2=336762&pathrev=336763
> >> >
> >> > works around the mwait issue, i.e. it sets
> >> >
> >> > sysctl machdep.idle_mwait=0
> >> > sysctl machdep.idle=hlt
> >> >
> >>
> >> I think that shall not apply to 2400G, which is model 11h not 1h.
> >> Here're what I have now:
> >>
> >> machdep.idle: acpi
> >> machdep.idle_available: spin, mwait, hlt, acpi
> >> machdep.idle_apl31: 0
> >> machdep.idle_mwait: 1
> >>
> >> > Now it may or may not relate to your problem, but it appears that
> >> > Ryzen 2400G also has another issue with HLT, see the DragonFly bug
> >> > report
> >> >
> >> > https://bugs.dragonflybsd.org/issues/3131
> >> >
> >>
> >> Thanks a lot for that info.
> >> It's much easier to prove your problem, since it's reproducible. But
> >> mine was so random to catch...
> >> Anyway, it seems like the IRET issue [1] is still not fixed? I'm
> >> highly doubt that my issue is this related because my system became
> >> significantly more stable since I stop that irq storm from bluetooth
> >> module - Though it still panics occasionally.
> >> So could anybody tell, what's the difference between FreeBSD
> >> workaround [2] and the DragonflyBSD one?
> >>
> >> > which AMD is aware of and is possibly working on, but it may not have
> >> > appeared in the errata yet. The bug report says that until this is
> >> > fixed, the workaround is to also disable HLT in cpu_idle. I am not
> >> > sure what is the correct value for the sysctl on FreeBSD, perhaps
> >> >
> >> > sysctl machdep.idle=0
> >> >
> >> > or some other value?
> >>
> >> In the meantime, I have this microcode
> >>
> >> # cpucontrol -m 0x8b /dev/cpuctl0
> >> MSR 0x8b: 0x00000000 0x0810100b
> >>
> >> Hence I should use mwait?
> >> Still don't know what should I set. Any idea?
> >
> >
> > If I was you, I'd play around with the sysctls mentioned above and see
> > if it helps. Start with disabling both mwait and hlt, perhaps
> >
> > machdep.idle=spin
> > machdep.idle_mwait=0
> >
> > (assuming that 'spin' means hlt will not used) and then if that does
> > not lead to a panic, try enabling mwait. I can't test 2400G since I
> > don't have it any more. I booted FreeBSD a couple of times but did not
> > run it over long periods of time.
>
> It works!
> After hours and hours of different stressing. I got 8 copies of gcc
> built without any problem.
>
> But it costs lots of power and the fan will become very annoying. As
> so, I don't think I'll test long term stability with this state.
>
> machdep.idle: acpi -> spin
> - will add ~5W, maybe some deeper C states disabled?
> machdep.idle_mwait: 1 -> 0
> - will add another ~50W, CPUs are working insomniac.
>
> I tried to set machdep.idle_mwait to 1, or machdep.idle to mwait. Both
> failed with panics when I start building gcc pass by pass.
>
> I'm pretty sure mwait will cause problem, as once I experienced a
> panic immediately after I issued the sysctl command (the 2nd dump info
> followed)
>
> So my next step will be hlt. Still need some time, though.
>
> >
> > Cheers
> >
> > Peeter
> >
> > --
> >
>
> Cheers,
> meowthink
>
> ------------------------------------------------------------------------
> machdep.idle=mwait
>
> panic: ffs_syncvnode: syncing truncated data.
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80dcddc4 at ffs_syncvnode+0x5a4
> #4 0xffffffff80dcc915 at ffs_fsync+0x25
> #5 0xffffffff810ffcb2 at VOP_FSYNC_APV+0x82
> #6 0xffffffff80bc3a62 at sched_sync+0x412
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe
>
> ------------------------------------------------------------------------
> machdep.idle_mwait=1
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 7; apic id = 07
> instruction pointer = 0x20:0xffffffff80e094fe
> stack pointer = 0x0:0xfffffe081e5df9e0
> frame pointer = 0x0:0xfffffe081e5dfa50
> code segment = base 0x0, limit 0xfffff, type 0x1b
>                = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 17 (dom0)
> trap number = 9
> panic: general protection fault
> cpuid = 7
> KDB: stack backtrace:
> #0 0xffffffff80b414b7 at kdb_backtrace+0x67
> #1 0xffffffff80afa9e7 at vpanic+0x177
> #2 0xffffffff80afa863 at panic+0x43
> #3 0xffffffff80f7c14f at trap_fatal+0x35f
> #4 0xffffffff80f7b70e at trap+0x5e
> #5 0xffffffff80f5bccc at calltrap+0x8
> #6 0xffffffff80e07a17 at vm_pageout+0x87
> #7 0xffffffff80abd813 at fork_exit+0x83
> #8 0xffffffff80f5cc7e at fork_trampoline+0xe