From: Nimrod Levy
Date: Fri, 23 Feb 2018 15:33:32 -0500
Subject: Re: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)
To: Mike Tancsa
Cc: FreeBSD-STABLE Mailing List

After a couple of hours of running the iperf commands you were testing
with, I'm unable to duplicate this so far.

I'm running FreeBSD stable from 17-Feb with the commits noted in
https://reviews.freebsd.org/D14347 pulled in. I've also lowered the
memory clock and disabled C-states in the BIOS. The bhyve VM is running
CentOS.

The system has been up for over 6 days and has been running the iperf3
loop for over 2 hours.

The hardware is an Asus Prime B350-Plus with a Ryzen 5 1600 and 32G of
RAM.

--
Nimrod

On Fri, Feb 23, 2018 at 3:22 PM, Mike Tancsa wrote:

> Actually I can confirm the same sort of hard lockup happens on my Epyc
> board with RELENG_11. It also happens in current. I will file a PR and
> post on freebsd-current in case someone has any suggestions on how to
> try and figure out what's going on.
>
> I upgraded the box to
> 12.0-CURRENT #0 r329866
> in order to see if it could avoid the lockup, but same deal.
> The vmm driver does seem different when loaded, but the same lockup
> happens under load.
>
> CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.35-MHz K8-class CPU)
> Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
> Features=0x178bfbff SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> Features2=0x7ed8320b 1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
> AMD Features=0x2e500800
> AMD Features2=0x35c233ff Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
> Structured Extended Features=0x209c01a9 SMAP,CLFLUSHOPT,SHA>
> XSAVE Features=0xf
> AMD Extended Feature Extensions ID EBX=0x7
> SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
> TSC: P-state invariant, performance statistics
>
> AMD-Vi: IVRS Info VAsize = 64 PAsize = 48 GVAsize = 2 flags:0
> driver bug: Unable to set devclass (class: ppc devname: (unknown))
> ivhd0: on acpi0
> ivhd0: Flag:b0
> ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
> ivhd0: Extended features[31:0]:22294ada HATS = 0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
> ivhd0: Extended features[62:32]:f77ef Max PASID: 0x2f DevTblSegSup = 0x3 MarcSup = 0x1
> ivhd0: supported paging level:7, will use only: 4
> ivhd0: device range: 0x0 - 0xffff
> ivhd0: PCI cap 0x190b640f@0x40 feature:19
>
> On 2/23/2018 12:35 PM, Nimrod Levy wrote:
> > Now that is a fascinating data point. My machine that I've been having
> > issues with has been running a bhyve vm from the beginning. I never
> > made the connection. I'll try throwing some network traffic at the VM
> > and see if I can make it lock up.
> >
> > On Fri, Feb 23, 2018 at 10:14 AM, Mike Tancsa wrote:
> >
> > On 2/22/2018 3:41 PM, Mike Tancsa wrote:
> > > On 2/21/2018 3:04 PM, Mike Tancsa wrote:
> > >> Not sure if I have found another issue specific to Ryzen, or a bug that
> > >> manifests itself on Ryzen systems more easily.
> > >> I installed the latest
> > >> virtualbox from the ports and was doing some network performance tests
> > >> between a vm and the hypervisor using iperf3. The guest is just a
> > >> RELENG_11 image and the network is an em nic bridged to epair1b
> > >
> > > This looks possibly related to VirtualBox. Doing the same tests and more
> > > using bhyve, I don't get any lockup. Not to mention, network IO is MUCH
> > > faster.
> >
> > Actually, it just took a little bit longer to lock up the box with bhyve
> > on RELENG_11 as the hypervisor. Would be great if anyone can confirm
> > this locks up their Ryzen boxes? I tried 2 different boxes to eliminate
> > a hardware issue. Also tried a similar test on Ubuntu and I can spin up
> > 4 instances and run without lockups.
> >
> > Just grab a copy of
> >
> > https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz
> >
> > and make 2 copies, tmp.raw and tmp2.raw.
> >
> > kldload vmm
> > ifconfig tap0 create
> > ifconfig tap1 create
> > ifconfig tap1 up
> > ifconfig tap0 up
> > ifconfig bridge0 create addm tap0 addm tap1
> > ifconfig bridge0 192.168.99.1/24
> >
> > screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap0 -d tmp.raw BSD11a
> > screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap1 -d tmp2.raw BSD11b
> >
> > Install netperf on the 2 vms and give the vtnet interface
> > 192.168.99.2/24 and 192.168.99.3/24
> >
> > In both VMs pkg install iperf3 and start it up as
> > iperf3 -s
> >
> > In the hypervisor,
> > iperf3 -t 10000 -R -c 192.168.99.2
> > iperf3 -t 10000 -c 192.168.99.3
> >
> > The box locks up solid after 5-20 min.
> > Same hardware with Ubuntu and
> > VirtualBox and 4 instances works fine, no lockups after a day, so not
> > sure what's up but it seems to be something with the Ryzen CPU running as
> > a hypervisor or with some type of load :(
> >
> > Prior to lockup I had a stream of netstat -m writing to a file every 5
> > seconds. The last entry was below. It doesn't seem to be a leak.
> >
> > Thu Feb 22 17:14:28 EST 2018
> > 8694/10281/18975 mbufs in use (current/cache/total)
> > 8225/5211/13436/2038424 mbuf clusters in use (current/cache/total/max)
> > 8225/5184 mbuf+clusters out of packet secondary zone in use (current/cache)
> > 461/3747/4208/1019211 4k (page size) jumbo clusters in use (current/cache/total/max)
> > 0/0/0/301988 9k jumbo clusters in use (current/cache/total/max)
> > 0/0/0/169868 16k jumbo clusters in use (current/cache/total/max)
> > 20467K/27980K/48447K bytes allocated to network (current/cache/total)
> > 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> > 0 sendfile syscalls
> > 0 sendfile syscalls completed without I/O request
> > 0 requests for I/O initiated by sendfile
> > 0 pages read by sendfile as part of a request
> > 0 pages were valid at time of a sendfile request
> > 0 pages were requested for read ahead by applications
> > 0 pages were read ahead by sendfile
> > 0 times sendfile encountered an already busy page
> > 0 requests for sfbufs denied
> > 0 requests for sfbufs delayed
> >
> > ---Mike
> >
> > --
> > -------------------
> > Mike Tancsa, tel +1 519 651 3400 x203
> > Sentex Communications, mike@sentex.net
> > Providing Internet services since 1994 www.sentex.net
> > Cambridge, Ontario Canada
> > _______________________________________________
> > freebsd-stable@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to
> > "freebsd-stable-unsubscribe@freebsd.org"
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400 x203
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada
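
The quoted report mentions streaming netstat -m to a file every 5 seconds so that the last entry before the lockup survives. The thread does not show the command that was used, so the following is only a minimal sketch of one way to do it in POSIX sh. The helper name log_snapshots and the log path in the usage comment are hypothetical and do not come from the thread.

```shell
#!/bin/sh
# log_snapshots is a hypothetical helper, not part of FreeBSD.
# It appends a timestamp and the output of a command to a log file at a
# fixed interval, so the last entry written before a hard lockup is on disk.
#
# usage: log_snapshots LOGFILE INTERVAL_SECONDS COUNT COMMAND [ARGS...]
#        COUNT=0 means run until interrupted (or until the box locks up).
log_snapshots() {
    logfile=$1
    interval=$2
    count=$3
    shift 3
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        {
            date
            "$@"
        } >> "$logfile" 2>&1
        sync    # ask the OS to flush the entry to disk before waiting
        i=$((i + 1))
        # Skip the final sleep when a fixed number of snapshots was requested.
        if [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example on the hypervisor, using the 5-second interval from the report
# (the log path is an assumption):
#   log_snapshots /var/tmp/netstat-m.log 5 0 netstat -m
```

Calling sync after each entry is a design choice for the lockup case: without it, the most recent snapshots may still be sitting in the buffer cache when the machine hangs.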