Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 7 Dec 2017 11:33:51 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Alexey Dokuchaev <danfe@freebsd.org>
Cc:        freebsd-x11 <freebsd-x11@freebsd.org>, FreeBSD Current <freebsd-current@FreeBSD.org>
Subject:   Re: couple of nvidia-driver issues
Message-ID:  <5e95dc14-9d3b-e2eb-b89c-f66f7857eb58@FreeBSD.org>
In-Reply-To: <20171205140308.GA94043@FreeBSD.org>
References:  <07b9dbda-60ef-3643-308f-18a05e8ca958@FreeBSD.org> <20171205140308.GA94043@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help

[cc-ing current@ to raise more awareness]

On 05/12/2017 16:03, Alexey Dokuchaev wrote:
> On Fri, Nov 24, 2017 at 11:31:51AM +0200, Andriy Gapon wrote:
>>
>> I have reported a couple of nvidia-driver issues in the FreeBSD section
>> of the nVidia developer forum, but no replies so far.
>>
>> Well, the first issue is not with the driver, but with a utility that
>> comes with it, nvidia-smi:
>> https://devtalk.nvidia.com/default/topic/1026589/freebsd/nvidia-smi-query-gpu-spins-forever-on-freebsd-head-amd64-/
>> I wonder if I am the only one affected or if I see the problem because
>> I am on head or something else.
>> I am pretty sure that the problem is caused by a programming bug related
>> to strtok_r.
> 
> I'll try to reproduce it and report back.

I've done some work with a debugger and it seems that there is code that does
something like this:

char *last = NULL;

while (1) {
	if (last == NULL)
		p = strtok_r(str, sep, &last);
	else
		p = strtok_r(NULL, sep, &last);
	if (p == NULL)
		break;
	...
}

The problem is that when 'p' points to the last token, 'last' is NULL (in
FreeBSD implementation of strtok_r).  That means that when we go to the next
iteration the parsing starts all over again leading to the endless loop.
The code is incorrect from the standards point of view, because the value of
'last' is completely opaque and should not be used for anything else but passing
it back to strtok_r.

I used gdb -w to change the logic to:

char *last = 1;

While (1) {
	if (last == 1)
		p = strtok_r(str, sep, &last);
	else
		p = strtok_r(NULL, sep, &last);
	...
}

Where 1 is used as an "impossible" pointer value which is neither NULL nor a
valid pointer that can be set by strtok_r.  It's not ideal, but binary code
editing is not as easy as that of source code.

The binary patch is here: https://people.freebsd.org/~avg/nvidia-smi.bsdiff

>> The second issue is with the FreeBSD support for the kernel driver:
>> https://devtalk.nvidia.com/default/topic/1026645/freebsd/panic-related-to-nvkms_timers-lock-sx-lock-/
>> I would like to get some feedback on my analysis.
>> I am testing this patch right now:
>> https://people.freebsd.org/~avg/extra-patch-src_nvidia-modeset_nvidia-modeset-freebsd.c
> 
> Unfortunately, I'm not an expert on kernel locking primitives to give you
> a proper review, let's see what others have to say.

It's been a while since I posted the patch and there are no comments yet.
I can only add that I am running an INVARIANTS and WITNESS enabled kernel all
the time and before the patch I was getting kernel panics every now and then.
Since I started using the patch I haven't had a single nvidia panic yet.

>> Also, what's the best place or who are the best people with whom to
>> discuss such issues?
> 
> Yes, this is a problem now: since Christian Zander had left nVidia, he
> could not tell me who'd be their next liaison to talk to from FreeBSD
> community. :-(

Oh, I didn't know about Christian's departure.
So, we are not in a very good position now.


-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5e95dc14-9d3b-e2eb-b89c-f66f7857eb58>