From: Aaron Plattner
To: Alan Somers, Andriy Gapon
CC: Alexey Dokuchaev, freebsd-x11, FreeBSD Current
Subject: Re: couple of nvidia-driver issues
Date: Thu, 7 Dec 2017 08:00:40 -0800

On 12/07/2017 07:35 AM, Alan Somers wrote:
> On Thu, Dec 7, 2017 at 2:33 AM, Andriy Gapon wrote:
>
>
> [cc-ing current@ to raise more awareness]
>
> On 05/12/2017 16:03, Alexey Dokuchaev wrote:
> > On Fri, Nov 24, 2017 at 11:31:51AM +0200, Andriy Gapon wrote:
> >>
> >> I have reported a couple of nvidia-driver issues in the FreeBSD section
> >> of the nVidia developer forum, but no replies so far.
> >>
> >> Well, the first issue is not with the driver, but with a utility that
> >> comes with it, nvidia-smi:
> >> https://devtalk.nvidia.com/default/topic/1026589/freebsd/nvidia-smi-query-gpu-spins-forever-on-freebsd-head-amd64-/
> >> I wonder if I am the only one affected or if I see the problem because
> >> I am on head or something else.
> >> I am pretty sure that the problem is caused by a programming bug related
> >> to strtok_r.
> >
> > I'll try to reproduce it and report back.
>
> I've done some work with a debugger and it seems that there is code that does
> something like this:
>
> char *last = NULL;
>
> while (1) {
>         if (last == NULL)
>                 p = strtok_r(str, sep, &last);
>         else
>                 p = strtok_r(NULL, sep, &last);
>         if (p == NULL)
>                 break;
>         ...
> }
>
> The problem is that when 'p' points to the last token, 'last' is NULL (in
> FreeBSD implementation of strtok_r).  That means that when we go to the next
> iteration the parsing starts all over again leading to the endless loop.
> The code is incorrect from the standards point of view, because the value of
> 'last' is completely opaque and should not be used for anything else but passing
> it back to strtok_r.
>
> I used gdb -w to change the logic to:
>
> char *last = 1;
>
> while (1) {
>         if (last == 1)
>                 p = strtok_r(str, sep, &last);
>         else
>                 p = strtok_r(NULL, sep, &last);
>         ...
> }
>
> Where 1 is used as an "impossible" pointer value which is neither NULL nor a
> valid pointer that can be set by strtok_r.  It's not ideal, but binary code
> editing is not as easy as that of source code.
>
> The binary patch is here:
> https://people.freebsd.org/~avg/nvidia-smi.bsdiff
>
> >
> >> The second issue is with the FreeBSD support for the kernel driver:
> >> https://devtalk.nvidia.com/default/topic/1026645/freebsd/panic-related-to-nvkms_timers-lock-sx-lock-/
> >> I would like to get some feedback on my analysis.
> >> I am testing this patch right now:
> >> https://people.freebsd.org/~avg/extra-patch-src_nvidia-modeset_nvidia-modeset-freebsd.c
> >
> > Unfortunately, I'm not an expert on kernel locking primitives to give you
> > a proper review, let's see what others have to say.
>
> It's been a while since I posted the patch and there are no comments yet.
> I can only add that I am running an INVARIANTS and WITNESS enabled kernel all
> the time and before the patch I was getting kernel panics every now and then.
> Since I started using the patch I haven't had a single nvidia panic yet.
>
> >> Also, what's the best place or who are the best people with whom to
> >> discuss such issues?
> >
> > Yes, this is a problem now: since Christian Zander had left nVidia, he
> > could not tell me who'd be their next liaison to talk to from FreeBSD
> > community. :-(
>
> Oh, I didn't know about Christian's departure.
> So, we are not in a very good position now.
>
>
> How about Aaron Plattner (CC'd).  Aaron, are you still working on
> FreeBSD driver issues?

Thanks for the heads up, Alan. I filed bug 2032249 to track this.

-- Aaron
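
For reference, the loop described in the quoted analysis can be written so that the strtok_r save pointer is never inspected: pass the string on the first call, NULL on every later call, and stop when the return value is NULL. What follows is only a minimal sketch of that pattern, not the nvidia-smi source. The helper name split_tokens and its parameters are hypothetical and chosen for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* ask for the strtok_r declaration */
#endif

#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from nvidia-smi): split 'str' in place on
 * any character in 'sep' and store up to 'max' token pointers in 'out'.
 * Returns the number of tokens stored.
 */
static size_t
split_tokens(char *str, const char *sep, char **out, size_t max)
{
	char *last;	/* opaque strtok_r state, only ever passed back */
	char *p;
	size_t n = 0;

	assert(str != NULL && sep != NULL && out != NULL);

	/* First call gets the string, every later call gets NULL. */
	for (p = strtok_r(str, sep, &last); p != NULL && n < max;
	    p = strtok_r(NULL, sep, &last))
		out[n++] = p;

	return (n);
}
```

Because the loop tests only the return value, it should behave the same whether or not an implementation leaves the save pointer NULL after the last token.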