Date: Sun, 2 Sep 2018 12:40:44 -0300
From: "Dr. Rolf Jansen" <rj@obsigna.com>
To: Ian Lepore <ian@freebsd.org>
Cc: freebsd-arm@freebsd.org
Subject: Re: Kernel Panic on BBB cause by ti_adc intr
Message-ID: <09B4DAE6-4021-4D77-8D74-6E112EE5E9E8@obsigna.com>
In-Reply-To: <1535900968.9486.5.camel@freebsd.org>
References: <B259CA27-7D08-45B1-97BB-35A544E346BB@obsigna.com> <1535900968.9486.5.camel@freebsd.org>

> On 02.09.2018 at 12:09, Ian Lepore <ian@freebsd.org> wrote:
>
> On Sun, 2018-09-02 at 00:15 -0300, Dr. Rolf Jansen wrote:
>> I got signal sources connected to AIN0 and AIN1 of the BBB. The
>> signals are divided, clipped and clamped and are guaranteed to stay
>> in the range of 0 to 1.8 V. Generally, the circuitry does work and
>> the ADC readings match very well the expectations.
>>
>> Only, sometimes, usually when I power on some considerable load (e.g.
>> a hair dryer) connected to a different AC plug, but in the same room,
>> the BBB bails out, giving the stack backtrace shown below. It might
>> well be, that a power-on spike traverses the AC electricity supply,
>> but there is no way that the spike after clipping and clamping would
>> exceed said limits.
>>
>> My understanding of the stack backtrace is, that somehow an interrupt
>> is triggered by said spike, and then it hits a bug in the interrupt
>> handler. It seems that an exclusive sleep mutex is locked when it is
>> not expected to be. This happened on FreeBSD 12.0-ALPHA3 and today
>> also on -ALPHA4.
>>
>> Question:
>>
>> I don't need interrupt handling in my project, since the signal
>> changes are slow, and the changes need to be read in defined
>> time intervals. So, is it possible to deactivate the interrupt
>> handler of the ti_adc?
>>
>> Presumably then the feature of the exclusive sleep mutex on ti_adc0
>> would not be challenged and therefore may continue sleeping forever.
>> Of course, I want continue being able of timed reading of the ADC
>> values.
>>
>> Any suggestions would be greatly appreciated, since a BBB which can
>> be DoS'ed by powering on a hair dryer is not as useful as it could
>> be.
>>
>> Best regards
>>
>> Rolf
>>
>>
>> Kernel page fault with the following non-sleepable locks held:
>> exclusive sleep mutex ti_adc0 (ti_adc) r = 0 (0xc2277d08) locked @
>> /usr/src/sys/arm/ti/ti_adc.c:508
>> stack backtrace:
>> Fatal kernel mode data abort: 'Translation Fault (L1)' on read
>> trapframe: 0xd2ebeca0
>> FSR=00000005, FAR=00000128, spsr=20000013
>> r0 =00000000, r1 =00000003, r2 =00000001, r3 =00000000
>> r4 =00000000, r5 =00000000, r6 =00000003, r7 =00000016
>> r8 =00000000, r9 =c2280e00, r10=00000021, r11=d2ebed60
>> r12=c0ace03c, ssp=d2ebed30, slr=c067d61c, pc =c00888c0
>>
>> panic: Fatal abort
>> cpuid = 0
>> time = 1535844155
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> pc = 0xc05c7484 lr = 0xc0075d04 (db_trace_self_wrapper+0x30)
>> sp = 0xd2ebea80 fp = 0xd2ebeb98
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> pc = 0xc0075d04 lr = 0xc029d60c (vpanic+0x16c)
>> sp = 0xd2ebeba0 fp = 0xd2ebebc0
>> r4 = 0x00000100 r5 = 0x00000001
>> r6 = 0xc071bb22 r7 = 0xc0a8cfd8
>> vpanic() at vpanic+0x16c
>> pc = 0xc029d60c lr = 0xc029d3ec (doadump)
>> sp = 0xd2ebebc8 fp = 0xd2ebebcc
>> r4 = 0xd2ebeca0 r5 = 0x00000013
>> r6 = 0x00000128 r7 = 0x00000005
>> r8 = 0x00000005 r9 = 0xd2ebeca0
>> r10 = 0x00000128
>> doadump() at doadump
>> pc = 0xc029d3ec lr = 0xc05e9bb0 (abort_align)
>> sp = 0xd2ebebd4 fp = 0xd2ebec00
>> r4 = 0xc029d3ec r5 = 0xd2ebebd4
>> abort_align() at abort_align
>> pc = 0xc05e9bb0 lr = 0xc05e9740 (abort_handler+0x2e0)
>> sp = 0xd2ebec08 fp = 0xd2ebec98
>> r4 = 0x00000013 r5 = 0x00000128
>> abort_handler() at abort_handler+0x2e0
>> pc = 0xc05e9740 lr = 0xc05c9dd4 (exception_exit)
>> sp = 0xd2ebeca0 fp = 0xd2ebed60
>> r4 = 0x00000000 r5 = 0x00000000
>> r6 = 0x00000003 r7 = 0x00000016
>> r8 = 0x00000000 r9 = 0xc2280e00
>> r10 = 0x00000021
>> exception_exit() at exception_exit
>> pc = 0xc05c9dd4 lr = 0xc067d61c (ti_adc_intr+0x88)
>> sp = 0xd2ebed30 fp = 0xd2ebed60
>> r0 = 0x00000000 r1 = 0x00000003
>> r2 = 0x00000001 r3 = 0x00000000
>> r4 = 0x00000000 r5 = 0x00000000
>> r6 = 0x00000003 r7 = 0x00000016
>> r8 = 0x00000000 r9 = 0xc2280e00
>> r10 = 0x00000021 r12 = 0xc0ace03c
>> evdev_push_event() at evdev_push_event+0x4c
>> pc = 0xc00888c0 lr = 0xc067d61c (ti_adc_intr+0x88)
>> sp = 0xd2ebed68 fp = 0xd2ebedd0
>> r4 = 0xd2fce800 r5 = 0xc2277d00
>> r6 = 0x00000000 r7 = 0x00000421
>> r8 = 0xc2277d18 r9 = 0xc2280e00
>> ti_adc_intr() at ti_adc_intr+0x88
>> pc = 0xc067d61c lr = 0xc02662fc (ithread_loop+0x1f0)
>> sp = 0xd2ebedd8 fp = 0xd2ebee20
>> r4 = 0xd2fce800 r5 = 0x00000000
>> r6 = 0xd2fce844 r7 = 0x00000000
>> r8 = 0xc0719541 r9 = 0xc2280e00
>> r10 = 0x00000000
>> ithread_loop() at ithread_loop+0x1f0
>> pc = 0xc02662fc lr = 0xc0262ef8 (fork_exit+0xa0)
>
> That's a strange exception stack, with lots of registers containing
> zeroes at exception time that were non-zero in the prior stack frame.
> It makes me think something has overwritten the stack with garbage
> data. When I look at ti_adc_tsc_read_data() it has a stack-allocated
> data array with 16 elements, and a loop that could load more than 16
> elements into that array (ADC_FIFO_COUNT_MSK is 0x7f), that seems like
> trouble.
>
> You said you don't need interrupts, does that mean you're reading the
> values via sysctl and aren't using the EVDEV stuff? If so, you might be
> able to quickly work around the panic by building a custom kernel using
> 'nooption EVDEV_SUPPORT'.

I forgot to mention that at the time of the panic, dev.ti_adc.0.ain.0.enable and dev.ti_adc.0.ain.1.enable were not yet set to 1 (enabled), and were not expected to read anything.

Yes, I only need the values in defined time intervals and I poll the ADC readings with the sysctlbyname() function.

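To illustrate the overflow risk Ian describes above (a 16-element on-stack array filled by a loop whose count field can reach 0x7f), here is a minimal sketch of a bounded drain. It is not the driver's code: struct fake_fifo, fifo_pop() and drain_bounded() are hypothetical names made up for this illustration, and only the two constants come from Ian's remark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_COUNT_MSK 0x7fu  /* count mask value quoted by Ian */
#define LOCAL_SLOTS    16u    /* size of the on-stack array he mentions */

/* Hypothetical stand-in for the hardware FIFO, for illustration only. */
struct fake_fifo {
    uint32_t count_reg;       /* raw value of the FIFO count register */
    uint32_t next;            /* next sample value the model hands out */
};

/* Pop one sample from the modelled FIFO. */
static uint32_t fifo_pop(struct fake_fifo *f)
{
    return f->next++;
}

/*
 * Drain everything the FIFO reports, but store at most `slots` samples
 * in `out`. Surplus samples are read and discarded, so the FIFO still
 * ends up empty and `out` is never written past its end.
 * Returns the number of samples actually stored.
 */
static size_t drain_bounded(struct fake_fifo *f, uint32_t *out, size_t slots)
{
    size_t count, stored = 0, i;

    assert(f != NULL && out != NULL);
    count = f->count_reg & FIFO_COUNT_MSK;   /* can be as large as 127 */
    for (i = 0; i < count; i++) {
        uint32_t sample = fifo_pop(f);
        if (stored < slots)
            out[stored++] = sample;
    }
    return stored;
}
```

With a reported count of 0x7f and a 16-slot buffer, drain_bounded() stores 16 samples and throws away the remaining 111. Whether discarding is acceptable, or whether the buffer should be larger instead, would have to be decided in the real driver.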
I compared an (arbitrarily chosen) old version of ti_adc_intr(void *arg) in ti_adc.c with the current one. The offending call happens on line 508, and it is TI_ADC_LOCK(sc);. The striking difference between the old and the new code is that in the latter TI_ADC_LOCK(sc); is called unconditionally, while in the old one the following check happens before TI_ADC_LOCK(sc); may get called:

ti_adc_intr(void *arg) from 2014:

        status = ADC_READ4(sc, ADC_IRQSTATUS);
        if (status == 0)
                return;

I started to set up a cross-building environment on a fast i7 box. My plan is to place the above check into said function. If this doesn't help, I will rebuild the kernel with 'nooption EVDEV_SUPPORT'. Thank you for pointing me in that direction. I don't even know what EVDEV is good for.

Best regards

Rolf