Subject: Re: Kernel Panic on BBB cause by ti_adc intr
From: Ian Lepore
To: "Dr. Rolf Jansen", freebsd-arm@freebsd.org
Date: Sun, 02 Sep 2018 09:09:28 -0600

On Sun, 2018-09-02 at 00:15 -0300, Dr. Rolf Jansen wrote:
> I got signal sources connected to AIN0 and AIN1 of the BBB. The
> signals are divided, clipped and clamped and are guaranteed to stay
> in the range of 0 to 1.8 V. Generally, the circuitry does work and
> the ADC readings match the expectations very well.
>
> However, sometimes, usually when I power on some considerable load
> (e.g. a hair dryer) connected to a different AC plug in the same
> room, the BBB bails out, giving the stack backtrace shown below. It
> might well be that a power-on spike traverses the AC electricity
> supply, but there is no way that the spike, after clipping and
> clamping, would exceed said limits.
>
> My understanding of the stack backtrace is that somehow an interrupt
> is triggered by said spike, and then it hits a bug in the interrupt
> handler. It seems that an exclusive sleep mutex is locked when it is
> not expected to be. This happened on FreeBSD 12.0-ALPHA3 and, today,
> also on -ALPHA4.
>
> Question:
>
>    I don't need interrupt handling in my project, since the signal
>    changes are slow, and the changes need to be read at defined
>    time intervals. So, is it possible to deactivate the interrupt
>    handler of the ti_adc?
>
> Presumably the exclusive sleep mutex on ti_adc0 would then not be
> challenged and could continue sleeping forever. Of course, I want to
> keep being able to read the ADC values at timed intervals.
>
> Any suggestions would be greatly appreciated, since a BBB which can
> be DoS'ed by powering on a hair dryer is not as useful as it could
> be.
>
> Best regards
>
> Rolf
>
>
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex ti_adc0 (ti_adc) r = 0 (0xc2277d08) locked @
> /usr/src/sys/arm/ti/ti_adc.c:508
> stack backtrace:
> Fatal kernel mode data abort: 'Translation Fault (L1)' on read
> trapframe: 0xd2ebeca0
> FSR=00000005, FAR=00000128, spsr=20000013
> r0 =00000000, r1 =00000003, r2 =00000001, r3 =00000000
> r4 =00000000, r5 =00000000, r6 =00000003, r7 =00000016
> r8 =00000000, r9 =c2280e00, r10=00000021, r11=d2ebed60
> r12=c0ace03c, ssp=d2ebed30, slr=c067d61c, pc =c00888c0
>
> panic: Fatal abort
> cpuid = 0
> time = 1535844155
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> pc = 0xc05c7484  lr = 0xc0075d04 (db_trace_self_wrapper+0x30)
> sp = 0xd2ebea80  fp = 0xd2ebeb98
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> pc = 0xc0075d04  lr = 0xc029d60c (vpanic+0x16c)
> sp = 0xd2ebeba0  fp = 0xd2ebebc0
> r4 = 0x00000100  r5 = 0x00000001
> r6 = 0xc071bb22  r7 = 0xc0a8cfd8
> vpanic() at vpanic+0x16c
> pc = 0xc029d60c  lr = 0xc029d3ec (doadump)
> sp = 0xd2ebebc8  fp = 0xd2ebebcc
> r4 = 0xd2ebeca0  r5 = 0x00000013
> r6 = 0x00000128  r7 = 0x00000005
> r8 = 0x00000005  r9 = 0xd2ebeca0
> r10 = 0x00000128
> doadump() at doadump
> pc = 0xc029d3ec  lr = 0xc05e9bb0 (abort_align)
> sp = 0xd2ebebd4  fp = 0xd2ebec00
> r4 = 0xc029d3ec  r5 = 0xd2ebebd4
> abort_align() at abort_align
> pc = 0xc05e9bb0  lr = 0xc05e9740 (abort_handler+0x2e0)
> sp = 0xd2ebec08  fp = 0xd2ebec98
> r4 = 0x00000013  r5 = 0x00000128
> abort_handler() at abort_handler+0x2e0
> pc = 0xc05e9740  lr = 0xc05c9dd4 (exception_exit)
> sp = 0xd2ebeca0  fp = 0xd2ebed60
> r4 = 0x00000000  r5 = 0x00000000
> r6 = 0x00000003  r7 = 0x00000016
> r8 = 0x00000000  r9 = 0xc2280e00
> r10 = 0x00000021
> exception_exit() at exception_exit
> pc = 0xc05c9dd4  lr = 0xc067d61c (ti_adc_intr+0x88)
> sp = 0xd2ebed30  fp = 0xd2ebed60
> r0 = 0x00000000  r1 = 0x00000003
> r2 = 0x00000001  r3 = 0x00000000
> r4 = 0x00000000  r5 = 0x00000000
> r6 = 0x00000003  r7 = 0x00000016
> r8 = 0x00000000  r9 = 0xc2280e00
> r10 = 0x00000021 r12 = 0xc0ace03c
> evdev_push_event() at evdev_push_event+0x4c
> pc = 0xc00888c0  lr = 0xc067d61c (ti_adc_intr+0x88)
> sp = 0xd2ebed68  fp = 0xd2ebedd0
> r4 = 0xd2fce800  r5 = 0xc2277d00
> r6 = 0x00000000  r7 = 0x00000421
> r8 = 0xc2277d18  r9 = 0xc2280e00
> ti_adc_intr() at ti_adc_intr+0x88
> pc = 0xc067d61c  lr = 0xc02662fc (ithread_loop+0x1f0)
> sp = 0xd2ebedd8  fp = 0xd2ebee20
> r4 = 0xd2fce800  r5 = 0x00000000
> r6 = 0xd2fce844  r7 = 0x00000000
> r8 = 0xc0719541  r9 = 0xc2280e00
> r10 = 0x00000000
> ithread_loop() at ithread_loop+0x1f0
> pc = 0xc02662fc  lr = 0xc0262ef8 (fork_exit+0xa0)

That's a strange exception stack, with lots of registers containing
zeroes at exception time that were non-zero in the prior stack frame.
It makes me think something has overwritten the stack with garbage
data.

When I look at ti_adc_tsc_read_data(), it has a stack-allocated data
array with 16 elements and a loop that could load more than 16
elements into that array (ADC_FIFO_COUNT_MSK is 0x7f, so the FIFO
count it reads can be as high as 127). That seems like trouble.

You said you don't need interrupts; does that mean you're reading the
values via sysctl and aren't using the EVDEV stuff? If so, you might
be able to quickly work around the panic by building a custom kernel
using 'nooption EVDEV_SUPPORT'.

-- Ian
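The overflow pattern described above can be illustrated with a minimal user-space sketch. Only the 16-element buffer size and the ADC_FIFO_COUNT_MSK value of 0x7f come from the message; the struct fake_fifo type and the fifo_pop(), read_fifo_bounded() and DATA_SLOTS names are hypothetical stand-ins and do not reproduce the actual ti_adc.c code. The point is that a count masked with 0x7f can reach 127, so it has to be clamped to the buffer size before the copy loop runs.

```c
#include <assert.h>
#include <stdint.h>

/* Mask value quoted in the message; all other names here are hypothetical. */
#define ADC_FIFO_COUNT_MSK 0x7f
/* Size of the stack-allocated array described in the message. */
#define DATA_SLOTS 16

/* Hypothetical stand-in for the hardware FIFO: a count register plus data words. */
struct fake_fifo {
	uint32_t count_reg;                       /* raw value of the "FIFO count" register */
	uint32_t words[ADC_FIFO_COUNT_MSK + 1];   /* up to 128 queued samples */
	unsigned next;                            /* index of the next word to pop */
};

/* Pops one word from the simulated FIFO; returns 0 once it is exhausted. */
static uint32_t fifo_pop(struct fake_fifo *f)
{
	if (f->next > ADC_FIFO_COUNT_MSK)
		return 0;
	return f->words[f->next++];
}

/*
 * Copies at most 'cap' words into 'buf' and returns how many were stored.
 * The register count can be as large as 0x7f (127), so it is clamped to
 * the buffer size instead of being trusted. Any excess words are simply
 * left in the simulated FIFO; whether a real driver should drain or reset
 * the FIFO instead is outside the scope of this sketch.
 */
static unsigned read_fifo_bounded(struct fake_fifo *f, uint32_t *buf, unsigned cap)
{
	unsigned count = f->count_reg & ADC_FIFO_COUNT_MSK;
	unsigned i;

	if (count > cap)
		count = cap;
	assert(count <= cap);   /* the loop below can no longer run past buf */
	for (i = 0; i < count; i++)
		buf[i] = fifo_pop(f);
	return count;
}
```

With a simulated count of 40 and a 16-slot buffer, read_fifo_bounded() stores 16 words and leaves the rest queued, whereas an unclamped loop would write 24 words past the end of the array, which is the kind of stack corruption suspected above.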