From: Nimrod Levy
Date: Sat, 17 Mar 2018 18:01:10 +0000
Subject: Re: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)
To: Mike Tancsa
Cc: FreeBSD-STABLE Mailing List
References: <92a60e14-f532-2647-d45d-b500fc59ba88@sentex.net> <425be16f-9fdc-9ed6-72b1-02e28bfd130f@sentex.net>
List-Id: Production branch of FreeBSD source code

Looks like I got almost 4 full weeks before it locked up this morning :(

On Fri, Feb 23, 2018 at 3:33 PM Nimrod Levy wrote:

> After a couple of hours of running the iperf commands you were testing
> with, I'm unable to duplicate this so far.
>
> I'm running with FreeBSD stable from 17-Feb with the commits noted in
> https://reviews.freebsd.org/D14347 pulled in.
>
> I've also lowered the memory clock and disabled c-states in the bios.
>
> The bhyve VM is running CentOS.
>
> The system has been up for over 6 days and has been running the iperf3
> loop for over 2 hours.
>
> The hardware is an Asus prime B350-Plus with a Ryzen 5 1600 and 32G of RAM.
>
> --
> Nimrod
>
>
> On Fri, Feb 23, 2018 at 3:22 PM, Mike Tancsa wrote:
>
>> Actually I can confirm the same sort of hard lockup happens on my Epyc
>> board with RELENG11. It also happens in current.
>> I will file a PR and post on freebsd-current in case someone has any
>> suggestions on how to try and figure out what's going on.
>>
>> I upgraded the box to
>> 12.0-CURRENT #0 r329866
>> in order to see if it could avoid the lockup, but same deal. The vmm
>> driver does seem different when loaded, but the same lock up under load
>>
>> CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.35-MHz K8-class CPU)
>>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>>   Features=0x178bfbff
>>   Features2=0x7ed8320b
>>   AMD Features=0x2e500800
>>   AMD Features2=0x35c233ff
>>   Structured Extended Features=0x209c01a9
>>   XSAVE Features=0xf
>>   AMD Extended Feature Extensions ID EBX=0x7
>>   SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
>>   TSC: P-state invariant, performance statistics
>>
>> AMD-Vi: IVRS Info VAsize = 64 PAsize = 48 GVAsize = 2 flags:0
>> driver bug: Unable to set devclass (class: ppc devname: (unknown))
>> ivhd0: on acpi0
>> ivhd0: Flag:b0
>> ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
>> ivhd0: Extended features[31:0]:22294ada HATS = 0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
>> ivhd0: Extended features[62:32]:f77ef Max PASID: 0x2f DevTblSegSup = 0x3 MarcSup = 0x1
>> ivhd0: supported paging level:7, will use only: 4
>> ivhd0: device range: 0x0 - 0xffff
>> ivhd0: PCI cap 0x190b640f@0x40 feature:19
>>
>>
>> On 2/23/2018 12:35 PM, Nimrod Levy wrote:
>> > Now that is a fascinating data point. My machine that I've been having
>> > issues with has been running a bhyve vm from the beginning. I never
>> > made the connection. I'll try throwing some network traffic at the VM
>> > and see if I can make it lock up.
>> >
>> > On Fri, Feb 23, 2018 at 10:14 AM, Mike Tancsa wrote:
>> >
>> >     On 2/22/2018 3:41 PM, Mike Tancsa wrote:
>> >     > On 2/21/2018 3:04 PM, Mike Tancsa wrote:
>> >     >> Not sure if I have found another issue specific to Ryzen, or a bug that
>> >     >> manifests itself on Ryzen systems easier. I installed the latest
>> >     >> virtualbox from the ports and was doing some network performance tests
>> >     >> between a vm and the hypervisor using iperf3. The guest is just a
>> >     >> RELENG11 image and the network is an em nic bridged to epair1b
>> >     >
>> >     > This looks possibly related to VirtualBox. Doing the same tests and more
>> >     > using bhyve, I don't get any lockup. Not to mention, network IO is MUCH
>> >     > faster.
>> >
>> >
>> >     Actually, it just took a little bit longer to lock up the box with bhyve
>> >     on RELENG_11 as the hypervisor. Would be great if anyone can confirm
>> >     this locks up their Ryzen boxes? I tried 2 different boxes to eliminate
>> >     a hardware issue. Also tried a similar test on Ubuntu and I can spin up
>> >     4 instances and run without lockups.
>> >
>> >     Just grab a copy of
>> >
>> >     https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz
>> >
>> >     and make 2 copies.
>> >     tmp.raw and tmp2.raw
>> >
>> >
>> >     kldload vmm
>> >     ifconfig tap0 create
>> >     ifconfig tap1 create
>> >     ifconfig tap1 up
>> >     ifconfig tap0 up
>> >     ifconfig bridge0 create addm tap0 addm tap1
>> >     ifconfig bridge0 192.168.99.1/24
>> >
>> >     screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap0 -d tmp.raw BSD11a
>> >     screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap1 -d tmp2.raw BSD11b
>> >
>> >     Install netperf on the 2 vms and give the vtnet interface
>> >     192.168.99.2/24 and 192.168.99.3/24
>> >
>> >
>> >     In both VMs pkg install iperf3 and start it up as
>> >     iperf3 -s
>> >
>> >     In the hypervisor,
>> >     iperf3 -t 10000 -R -c 192.168.99.2
>> >     iperf3 -t 10000 -c 192.168.99.3
>> >
>> >
>> >     the box locks up solid after 5-20 min. Same hardware with Ubuntu and
>> >     virtual box and 4 instances work fine, no lockups after a day so not
>> >     sure what's up but it seems to be something with the Ryzen CPU running as
>> >     a hypervisor or with some type of load :(
>> >
>> >     Prior to lockup I had a stream of netstat -m writing to a file every 5
>> >     seconds. The last entry was below. It doesn't seem to be a leak.
>> >
>> >     Thu Feb 22 17:14:28 EST 2018
>> >     8694/10281/18975 mbufs in use (current/cache/total)
>> >     8225/5211/13436/2038424 mbuf clusters in use (current/cache/total/max)
>> >     8225/5184 mbuf+clusters out of packet secondary zone in use (current/cache)
>> >     461/3747/4208/1019211 4k (page size) jumbo clusters in use (current/cache/total/max)
>> >     0/0/0/301988 9k jumbo clusters in use (current/cache/total/max)
>> >     0/0/0/169868 16k jumbo clusters in use (current/cache/total/max)
>> >     20467K/27980K/48447K bytes allocated to network (current/cache/total)
>> >     0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> >     0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> >     0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> >     0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> >     0 sendfile syscalls
>> >     0 sendfile syscalls completed without I/O request
>> >     0 requests for I/O initiated by sendfile
>> >     0 pages read by sendfile as part of a request
>> >     0 pages were valid at time of a sendfile request
>> >     0 pages were requested for read ahead by applications
>> >     0 pages were read ahead by sendfile
>> >     0 times sendfile encountered an already busy page
>> >     0 requests for sfbufs denied
>> >     0 requests for sfbufs delayed
>> >
>> >
>> >             ---Mike
>> >
>> >
>> >     --
>> >     -------------------
>> >     Mike Tancsa, tel +1 519 651 3400 x203
>> >     Sentex Communications, mike@sentex.net
>> >     Providing Internet services since 1994 www.sentex.net
>> >     Cambridge, Ontario Canada
>> >     _______________________________________________
>> >     freebsd-stable@freebsd.org mailing list
>> >     https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> >     To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>> >
>>
>>
>> --
>> -------------------
>> Mike Tancsa, tel +1 519 651 3400 x203
>> Sentex Communications, mike@sentex.net
>> Providing Internet services since 1994 www.sentex.net
>> Cambridge, Ontario Canada
>>
>
>
--

--
Nimrod
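
For anyone trying to reproduce this: the quoted message mentions a stream of netstat -m written to a file every 5 seconds before the lockup. A minimal sh sketch of that kind of logging follows. It is not the script Mike actually ran; the function name log_every, the log path in the usage note, and the sync after each entry are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): run a command
# repeatedly and append a timestamp plus the command's output to a log
# file, so the last entry written before a hard lockup can be inspected.
#
# Usage: log_every LOGFILE INTERVAL_SECONDS COUNT COMMAND [ARGS...]
# COUNT=0 means loop until interrupted.
log_every() {
    logfile=$1
    interval=$2
    count=$3
    shift 3
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        {
            date
            "$@"
        } >> "$logfile" 2>&1
        # Assumption: ask the kernel to flush dirty buffers after each
        # entry, so a sudden hang is less likely to lose the latest one.
        sync
        i=$((i + 1))
        sleep "$interval"
    done
}
```

To mirror the setup described above, it could be started on the hypervisor before the iperf3 runs with something like `log_every /var/tmp/netstat-m.log 5 0 netstat -m &` (the path is only an example).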