From: Fernando Apesteguía
Date: Fri, 25 Jun 2021 07:54:25 +0200
Subject: Re: nvidia_drv.so/Xorg crashes
To: Craig Leres
Cc: FreeBSD Hackers
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Fri, Jun 25, 2021 at 4:31 AM Craig Leres wrote:
>
> I have four (12.2-RELEASE) systems between the office and home that are
> full or part time FreeBSD desktops. All have pny nvidia quadro 410s.
> These have been mostly working well for about 6 years.
>
> For months I've been seeing screen corruption when using chrome or
> kicad; firefox and thunderbird are always ok. But just starting eeschema
> always damages the root window a little. And it's common when running
> chrome/kicad to see lines in the console xterm window jump up and down
> two lines.
> But for the last week or two Xorg has been crashing:
>
> [ 74574.029] (EE) Backtrace:
> [ 74574.032] (EE) 0: /usr/local/bin/Xorg (?+0x0) [0x41c98a]
> [ 74574.033] (EE) unw_get_proc_name failed: no unwind info found [-10]
> [ 74574.033] (EE) 1: /lib/libthr.so.3 (?+0x0) [0x800929b7e]
> [ 74574.035] (EE) unw_get_proc_name failed: no unwind info found [-10]
> [ 74574.035] (EE) 2: /lib/libthr.so.3 (?+0x0) [0x80092913f]
> [ 74574.037] (EE) 3: ? (?+0x0) [0x7ffffffff003]
> [ 74574.038] (EE) 4: /usr/local/lib/xorg/modules/drivers/nvidia_drv.so (?+0x0) [0x801cc8c20]
> [ 74574.038] (EE)
> [ 74574.038] (EE) Segmentation fault at address 0x50
> [ 74574.038] (EE)
> Fatal server error:
> [ 74574.038] (EE) Caught signal 11 (Segmentation fault). Server aborting
>
> The crashes are always preceded by at least one nvidia "Xid" kernel message:
>
> Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data fffffffb, ErrorCode 00000004
> Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data fffffffb, ErrorCode 00000004
> Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data ffffffb9, ErrorCode 00000004
> Jun 23 ... kernel: : pid 6327 (Xorg), jid 0, uid 0: exited on signal 6
>
> Worth noting is that it was not unusual to see many Xid ErrorCode 4
> kernel messages without crashes. (And it's the only ErrorCode I've ever
> seen.)
>
> My first thought was a bad nvidia-driver version. But after working my
> way, one by one, down to 460.39 (circa February 2021 -- months before
> the first crashes) I gave up on that theory.
>
> My next guess was bad hardware, but I swapped quadros between two systems
> and the crashes persisted.
>
> Yesterday Xorg crashed often enough for me to zero in on the trigger; it's
> the use of tvtwm's f.forcemove action (which is like f.move but allows
> moving a window off the screen) if I move a window slightly off the
> bottom of the screen. Here's the .twmrc binding I use:
>
>     Button2 = m s : window : f.forcemove
>
> The crash doesn't happen 100% of the time but it's pretty easy to
> trigger with half a dozen windows open. Just grab a window and randomly
> dip part of it past the bottom of the screen. So my new theory is that a
> frame buffer operation in one of the libraries in the path between Xorg
> and the nvidia driver has regressed and is asking the nvidia driver to do
> something that causes it to do something bad.
>
> I run a custom version of tvtwm but was able to easily crash Xorg using
> x11-wm/twm on a spare quadro 410 workstation; the key is f.forcemove.
>
> Does anybody know what this issue is? What are likely candidates of
> recently changed port libraries that I could try downgrading? Should I
> try opening a ticket with nvidia? Should I try even older 460.XX
> drivers? What else can I try? (Thanks for reading this far!)

Long shot, but the libglvnd update affected x11/nvidia-driver. Have a look
at the UPDATING entry dated 20210617.

HTH

>
>         Craig
>
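
Since the Xid messages quoted above also show up without a crash, it may
help to tally them per Xid number and ErrorCode before and after a driver
or library change. The following is only a minimal sketch: the regular
expression is inferred from the three lines quoted in this thread (other
driver versions may format these messages differently), and the names
XID_RE, tally_xids and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred only from the Xid lines quoted in this thread; it is an
# assumption that other messages follow the same layout.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+), pid=(?P<pid>\d+),"
    r".*?Data (?P<data>[0-9a-fA-F]+), ErrorCode (?P<code>[0-9a-fA-F]+)"
)


def tally_xids(lines):
    """Count (xid, error_code) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            # ErrorCode is printed as zero-padded hex in the quoted lines.
            key = (int(match.group("xid")), int(match.group("code"), 16))
            counts[key] += 1
    return counts


# The kernel messages exactly as quoted in the thread (timestamps were
# already elided there), with the mail line wrapping undone.
SAMPLE = [
    "Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, "
    "Class Error: ChId 0009, Class 0000902d, Offset 000008b4, "
    "Data fffffffb, ErrorCode 00000004",
    "Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, "
    "Class Error: ChId 0009, Class 0000902d, Offset 000008b4, "
    "Data fffffffb, ErrorCode 00000004",
    "Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, "
    "Class Error: ChId 0009, Class 0000902d, Offset 000008b4, "
    "Data ffffffb9, ErrorCode 00000004",
    "Jun 23 ... kernel: : pid 6327 (Xorg), jid 0, uid 0: exited on signal 6",
]

if __name__ == "__main__":
    for (xid, code), count in sorted(tally_xids(SAMPLE).items()):
        print(f"Xid {xid} ErrorCode {code}: {count} message(s)")
```

To use it on a real system, pass the lines of the syslog file instead of
SAMPLE, for example tally_xids(open("/var/log/messages")), assuming kernel
messages are logged there.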