From owner-freebsd-hackers@freebsd.org Sun Dec 4 19:13:44 2016
From: Oleksandr Tymoshenko
Date: Sun, 4 Dec 2016 11:13:06 -0800
Subject: Re: Please help me understand "Translation Fault" in custom device drivers, and how to debug
To: Lee D
Cc: freebsd-hackers@freebsd.org
Message-Id: <85666618-B6A5-4577-86B9-914DEDE84ACD@bluezbox.com>
List-Id: Technical Discussions relating to FreeBSD

> On Dec 4, 2016, at 10:32 AM, Lee D wrote:
>
> Hello,
>
> I need help understanding what a translation fault is, and how to debug it.
> I have googled like crazy but can't seem to find any detailed
> information.
>
> I am working on an embedded system using an ARM processor, and consequently
> am writing a bunch of device drivers for my custom hardware.
>
> I am having a problem with occasional crashes when kldload'ing my modules
> in a boot script. I get various errors, including "Translation Fault" (L1
> or L2), "Alignment Fault", "vm_fault", and "undefined instruction in
> kernel". My code works 95% of the time though.
>
> I never see any crashes while running, so I don't think this is a flaky
> hardware problem.
>
> Any suggestions on what kernel debugger commands to enter to gather
> information would also be helpful. Here are the commands I am currently
> recording the output of when I get a crash:
>
> db> bt
> db> ps
> db> show intr
> db> show proc 618
> db> show allpcpu
> db> show allrman
> db> show intrcnt
> db> show proc
> db> show procvm
>
> For a single concrete example, here is a backtrace of a device driver that
> failed with a translation fault on kldload. This BT is unique in that it
> actually seems to contain useful information. Most of the backtraces just
> show some abort/exception related calls and then say "Unable to unwind
> into user space" (paraphrased), leaving me no info about where my crash
> happened.
>
> FreeBSD 10.3

Hi Lee,

Random crashes during kldload sound like a missing or incomplete icache
sync to me. You can take a look at the icache-related fixes in HEAD’s
sys/arm and try to backport them to 10.3.
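
For illustration only (this is not from the thread and is not the FreeBSD kernel code): on ARM the instruction and data caches are separate, so bytes written to memory as data, such as a freshly loaded module's text, may not be visible to instruction fetch until the caches are synchronized over that range. Stale instruction fetches can surface as seemingly unrelated, intermittent faults. The sketch below shows the general pattern in userland C using the GCC/Clang builtin `__builtin___clear_cache`. The helper name `load_text` is hypothetical, and the kernel uses its own ARM cache maintenance routines rather than this builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (assumption, not a FreeBSD API): copy machine code
 * into its destination and make it visible to instruction fetch.
 * Returns 0 on success, -1 on invalid arguments.
 */
int
load_text(void *dst, const void *src, size_t len)
{
	if (dst == NULL || src == NULL || len == 0)
		return (-1);

	/* The copy goes through the data cache like any other store. */
	memcpy(dst, src, len);
	assert(memcmp(dst, src, len) == 0);

	/*
	 * Synchronize caches over exactly the range just written.
	 * Skipping this step, or covering only part of the range, is the
	 * kind of bug that tends to fail only some of the time.
	 */
	__builtin___clear_cache((char *)dst, (char *)dst + len);

	return (0);
}
```

The point of the sketch is the ordering: write the code, then sync the whole written range, and only then jump to it. On architectures with coherent instruction caches the builtin may compile to nothing, which is one reason this class of bug shows up on ARM and not on x86.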