Date:      Sun, 15 Jan 2017 14:36:24 -0800
From:      <soralx@cydem.org>
To:        <freebsd-virtualization@freebsd.org>
Subject:   Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Message-ID:  <20170114235026.22232a7a@mscad14>
In-Reply-To: <20170114170126.48920fc8@mscad14>
References:  <20170111213941.0789c8ce@mscad14> <201701130144.v0D1ifxJ051207@pdx.rh.CN85.dnsmgr.net> <20170112234810.2cb83671@mscad14> <20170113004621.16be4e0a@mscad14> <1303eaba-791d-52fe-eb40-6a984b40b99a@freebsd.org> <20170113223437.1b5fa983@mscad14> <20170113225356.696842fd@mscad14> <f2b1352b-78bc-0789-d93a-85244926231e@freebsd.org> <20170114170126.48920fc8@mscad14>

> Some screens attached (hopefully not too heavy).
> Didn't have time to do better. Select your favourite ones.
>
> I upgraded Linux to newer version (Ubuntu 16.10, kernel 4.8),
> and it broke the driver. OpenCL does not work at all anymore.
> The screens were made on newer system -- nothing seemed to be
> changed in Xorg.

Actually, OpenCL still works after I rebooted the host and reinstalled
AMD's Pro driver and OpenCL SDK. I did upgrade to an even newer Linux,
which I guess is the latest development version of Ubuntu (I don't know
how to find the OS version [`uname -a` doesn't tell], but the kernel is
4.9.0-11-generic). Messages like 'Warning: LLVM emitted unknown
config register: 0x4' seem to be gone.
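
On the OS version question: on systemd-based distributions the release
is normally recorded in /etc/os-release as KEY=value lines. A minimal
sketch of reading it, assuming that file exists on the guest (the
function name parse_os_release is made up for illustration):

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as f:
            release = parse_os_release(f.read())
        print(release.get("PRETTY_NAME", "unknown"))
    except OSError:
        # Assumption failed: this system has no /etc/os-release.
        print("no /etc/os-release on this system")
```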

I'm getting strange numbers from the cl-mem test now, I think stranger
than before the upgrade (but I'm not sure, as I did not run many cl-mem
tests back then):
  # ~/cl-mem/cl-mem 
    Running write test.
    128 GB in 688.6 ms (185.9 GB/s)
    Running read test.
    128 GB in 596.1 ms (214.7 GB/s)
    Running copy test.
    128 GB in 715.3 ms (179.0 GB/s)
  # ~/cl-mem/cl-mem 
    Running write test.
    128 GB in 684.8 ms (186.9 GB/s)
    Running read test.
    128 GB in 596.8 ms (214.5 GB/s)
    Running copy test.
    128 GB in 715.1 ms (179.0 GB/s)

After a `glxgears -fullscreen` run:
  # ~/cl-mem/cl-mem 
    Running write test.
    128 GB in 868.3 ms (147.4 GB/s)
    Running read test.
    128 GB in 275.4 ms (464.8 GB/s)
    Running copy test.
    128 GB in 3878.0 ms (33.0 GB/s)
  # ~/cl-mem/cl-mem 
    Running write test.
    128 GB in 878.8 ms (145.7 GB/s)
    Running read test.
    128 GB in 293.1 ms (436.7 GB/s)
    Running copy test.
    128 GB in 3659.9 ms (35.0 GB/s)
[after a couple of minutes]
  # ~/cl-mem/cl-mem 
    Running write test.
    128 GB in 687.4 ms (186.2 GB/s)
    Running read test.
    128 GB in 596.8 ms (214.5 GB/s)
    Running copy test.
    128 GB in 715.0 ms (179.0 GB/s)
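
To make the before/after comparison easier to read, here is a rough
sketch that recomputes bandwidth from the size and elapsed time and
shows how much each test changed after the glxgears run. The timings
are copied from the first run of each group above; the helper name
bandwidth_gb_per_s is made up for illustration, and cl-mem's own
rounding may differ slightly.

```python
def bandwidth_gb_per_s(size_gb, elapsed_ms):
    """Bandwidth in GB/s from a transfer size in GB and a time in ms."""
    return size_gb / (elapsed_ms / 1000.0)


SIZE_GB = 128

# Elapsed times in ms, taken from the cl-mem output quoted above.
baseline = {"write": 688.6, "read": 596.1, "copy": 715.3}
after_glxgears = {"write": 868.3, "read": 275.4, "copy": 3878.0}

for test in ("write", "read", "copy"):
    before = bandwidth_gb_per_s(SIZE_GB, baseline[test])
    after = bandwidth_gb_per_s(SIZE_GB, after_glxgears[test])
    # A ratio above 1 means the test got faster after glxgears.
    print(f"{test:5s} {before:7.1f} -> {after:7.1f} GB/s "
          f"(x{after / before:.2f})")
```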

The copy test is slow because _lots_ of kernel messages like these
are printed:
  [ 1780.569388] amdgpu 0000:00:04.0: GPU fault detected: 147 0x0fba4402
  [ 1780.569830] amdgpu 0000:00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00001073
  [ 1780.570357] amdgpu 0000:00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B044002
or more generally:
  [ 1780.569388] amdgpu 0000:00:04.0: GPU fault detected: 147 0x0xxxxx02
  [ 1780.569830] amdgpu 0000:00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0000xxxx
  [ 1780.570357] amdgpu 0000:00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B04x002
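
To get a feel for how many of these faults there are and which values
vary, the kernel log can be tallied. A minimal sketch, assuming the log
text has been saved (for example from `dmesg`) and that the lines look
exactly like the three quoted above; the regular expressions and the
name summarize_faults are illustrative, not part of any amdgpu tool.

```python
import re
from collections import Counter

# Matches the "GPU fault detected" header line.
FAULT_RE = re.compile(
    r"amdgpu\s+[0-9a-f:.]+:\s+GPU fault detected:\s+\d+\s+0x[0-9a-fA-F]+"
)
# Matches the ADDR / STATUS register lines that follow each fault.
REG_RE = re.compile(
    r"amdgpu\s+[0-9a-f:.]+:\s+"
    r"VM_CONTEXT1_PROTECTION_FAULT_(?P<reg>ADDR|STATUS)\s+"
    r"(?P<val>0x[0-9a-fA-F]+)"
)


def summarize_faults(log_text):
    """Return (fault count, Counter of ADDR values, Counter of STATUS values)."""
    faults = 0
    addrs = Counter()
    statuses = Counter()
    for line in log_text.splitlines():
        if FAULT_RE.search(line):
            faults += 1
            continue
        m = REG_RE.search(line)
        if m:
            target = addrs if m.group("reg") == "ADDR" else statuses
            target[m.group("val")] += 1
    return faults, addrs, statuses


# The three sample lines quoted in this message.
SAMPLE = """\
[ 1780.569388] amdgpu 0000:00:04.0: GPU fault detected: 147 0x0fba4402
[ 1780.569830] amdgpu 0000:00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00001073
[ 1780.570357] amdgpu 0000:00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B044002
"""

if __name__ == "__main__":
    count, addrs, statuses = summarize_faults(SAMPLE)
    print(count, dict(addrs), dict(statuses))
```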

The read and write test results are way too high, assuming the test
transfers data over PCIe. The copy test at 180 GB/s is reasonable,
and matches hardware expectations and others' tests [0].
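
As a rough sanity check on the PCIe reasoning, the theoretical link
ceiling can be computed and compared with the measured read figure.
This is only a sketch: the link generation and width (PCIe 3.0 x16,
8 GT/s per lane, 128b/130b encoding) are assumptions, since the actual
link seen by the guest under passthru was not checked.

```python
def pcie_bandwidth_gb_per_s(gt_per_s, lanes, encoding_efficiency):
    """Theoretical one-direction PCIe bandwidth in GB/s.

    gt_per_s: raw transfer rate per lane in GT/s (one bit per transfer)
    lanes: link width
    encoding_efficiency: payload fraction of the line code
    """
    bits_per_s = gt_per_s * encoding_efficiency * lanes
    return bits_per_s / 8.0  # bits -> bytes


# Assumed link: PCIe 3.0 x16 (not verified on the guest).
ceiling = pcie_bandwidth_gb_per_s(8.0, 16, 128.0 / 130.0)

# Read-test figure from the first cl-mem run above.
measured_read = 214.7

print(f"assumed PCIe ceiling: {ceiling:.2f} GB/s")
print(f"measured read is {measured_read / ceiling:.1f}x the ceiling")
```

If the ratio is well above 1, the read and write tests are most likely
not crossing the bus at all.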

Also, 'mixbench' seems to produce reasonable results that are within
an order of magnitude or two of others' tests, like [0]:
5 TFLOPS single precision, 350 MFLOPS double precision (should be
single_precision/16 for ATI cards), and 1 TIOPS for integer (32-bit).
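
The single_precision/16 rule of thumb is easy to put into numbers. A
small sketch using the single-precision figure quoted above; the 1/16
ratio is taken from this message, not from a datasheet, and the helper
name is made up. If a measured double-precision figure lands far from
the result, the unit or the ratio is worth rechecking.

```python
def expected_dp_gflops(sp_tflops, ratio=16):
    """Expected double-precision rate in GFLOPS for a given SP rate in TFLOPS."""
    return sp_tflops * 1000.0 / ratio


# 5 TFLOPS single precision, as reported by mixbench above.
print(f"{expected_dp_gflops(5.0):.1f} GFLOPS expected double precision")
```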

More captures of Xorg screen attached. Behaviour is very strange,
but may give some hints of what might be going wrong to those familiar
with X, video memory, framebuffers, DRI, GFX and all that weird and
wonderful stuff.

When `glxgears` is run in fullscreen mode, what's on screen differs
from run to run. The framerate also varies from run to run; it is
mostly stable within one run but can change abruptly by hundreds of
FPS. When the screen is blank, the framerate is slowest (750~1500 FPS).
When only parts of the gears are rendered, the framerate is usually
higher, up to 2500 FPS. Rotation is smooth, but some parts of the gears
sometimes flicker in brightness.
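
The abrupt changes are easier to spot if the glxgears output is parsed
rather than eyeballed. A minimal sketch, assuming the usual
"N frames in S seconds = F FPS" lines that glxgears prints; the sample
text below is made up for illustration and is not a capture from this
machine, and the 100 FPS threshold is an arbitrary choice.

```python
import re

FPS_RE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_fps(output):
    """Extract the FPS value from each glxgears report line."""
    return [float(m.group(3)) for m in FPS_RE.finditer(output)]


def abrupt_changes(fps, threshold=100.0):
    """Return (index, previous, current) wherever FPS jumps by >= threshold."""
    return [
        (i, fps[i - 1], fps[i])
        for i in range(1, len(fps))
        if abs(fps[i] - fps[i - 1]) >= threshold
    ]


# Hypothetical sample output, for illustration only.
SAMPLE = """\
3760 frames in 5.0 seconds = 751.912 FPS
3775 frames in 5.0 seconds = 754.871 FPS
12410 frames in 5.0 seconds = 2481.953 FPS
"""

if __name__ == "__main__":
    fps = parse_fps(SAMPLE)
    print("min/max FPS:", min(fps), max(fps))
    for index, prev, cur in abrupt_changes(fps):
        print(f"report {index}: {prev} -> {cur}")
```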

I should mention that these numbers are for a 1600x1200 screen.
Also, with the VNC session closed (but 'vino' still running),
the frame rate goes up to a far more reasonable 6400 FPS.

[0] http://cdn.videocardz.com/1/2016/06/Radeon-RX-480-vs-GTX-970-AIDA-GPU-2.png

CC'ing to freebsd-virtualization@, as this is likely to be of
general interest (the screenshot attachment has been removed).

-- 
[SorAlx]  ridin' VN2000 Classic LT