Date:      Sun, 2 Sep 2018 12:40:44 -0300
From:      "Dr. Rolf Jansen" <rj@obsigna.com>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Kernel Panic on BBB cause by ti_adc intr
Message-ID:  <09B4DAE6-4021-4D77-8D74-6E112EE5E9E8@obsigna.com>
In-Reply-To: <1535900968.9486.5.camel@freebsd.org>
References:  <B259CA27-7D08-45B1-97BB-35A544E346BB@obsigna.com> <1535900968.9486.5.camel@freebsd.org>


> On 02.09.2018, at 12:09, Ian Lepore <ian@freebsd.org> wrote:
>
> On Sun, 2018-09-02 at 00:15 -0300, Dr. Rolf Jansen wrote:
>> I got signal sources connected to AIN0 and AIN1 of the BBB. The
>> signals are divided, clipped and clamped and are guaranteed to stay
>> in the range of 0 to 1.8 V. Generally, the circuitry does work and
>> the ADC readings match expectations very well.
>>
>> Only, sometimes, usually when I power on some considerable load (e.g.
>> a hair dryer) connected to a different AC plug, but in the same room,
>> the BBB bails out, giving the stack backtrace shown below. It might
>> well be that a power-on spike traverses the AC electricity supply,
>> but there is no way that the spike after clipping and clamping would
>> exceed said limits.
>>
>> My understanding of the stack backtrace is that somehow an interrupt
>> is triggered by said spike, and then it hits a bug in the interrupt
>> handler. It seems that an exclusive sleep mutex is locked when it is
>> not expected to be. This happened on FreeBSD 12.0-ALPHA3 and today
>> also on -ALPHA4.
>>
>> Question:
>>
>>    I don't need interrupt handling in my project, since the signal
>>    changes are slow, and the changes need to be read in defined
>>    time intervals. So, is it possible to deactivate the interrupt
>>    handler of the ti_adc?
>>
>> Presumably then the feature of the exclusive sleep mutex on ti_adc0
>> would not be challenged and therefore may continue sleeping forever.
>> Of course, I want to continue being able to read the ADC values at
>> timed intervals.
>>
>> Any suggestions would be greatly appreciated, since a BBB which can
>> be DoS'ed by powering on a hair dryer is not as useful as it could
>> be.
>>
>> Best regards
>>
>> Rolf
>>
>>
>> Kernel page fault with the following non-sleepable locks held:
>> exclusive sleep mutex ti_adc0 (ti_adc) r = 0 (0xc2277d08) locked @
>> /usr/src/sys/arm/ti/ti_adc.c:508
>> stack backtrace:
>> Fatal kernel mode data abort: 'Translation Fault (L1)' on read
>> trapframe: 0xd2ebeca0
>> FSR=00000005, FAR=00000128, spsr=20000013
>> r0 =00000000, r1 =00000003, r2 =00000001, r3 =00000000
>> r4 =00000000, r5 =00000000, r6 =00000003, r7 =00000016
>> r8 =00000000, r9 =c2280e00, r10=00000021, r11=d2ebed60
>> r12=c0ace03c, ssp=d2ebed30, slr=c067d61c, pc =c00888c0
>>
>> panic: Fatal abort
>> cpuid = 0
>> time = 1535844155
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> 	 pc = 0xc05c7484  lr = 0xc0075d04 (db_trace_self_wrapper+0x30)
>> 	 sp = 0xd2ebea80  fp = 0xd2ebeb98
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> 	 pc = 0xc0075d04  lr = 0xc029d60c (vpanic+0x16c)
>> 	 sp = 0xd2ebeba0  fp = 0xd2ebebc0
>> 	 r4 = 0x00000100  r5 = 0x00000001
>> 	 r6 = 0xc071bb22  r7 = 0xc0a8cfd8
>> vpanic() at vpanic+0x16c
>> 	 pc = 0xc029d60c  lr = 0xc029d3ec (doadump)
>> 	 sp = 0xd2ebebc8  fp = 0xd2ebebcc
>> 	 r4 = 0xd2ebeca0  r5 = 0x00000013
>> 	 r6 = 0x00000128  r7 = 0x00000005
>> 	 r8 = 0x00000005  r9 = 0xd2ebeca0
>> 	r10 = 0x00000128
>> doadump() at doadump
>> 	 pc = 0xc029d3ec  lr = 0xc05e9bb0 (abort_align)
>> 	 sp = 0xd2ebebd4  fp = 0xd2ebec00
>> 	 r4 = 0xc029d3ec  r5 = 0xd2ebebd4
>> abort_align() at abort_align
>> 	 pc = 0xc05e9bb0  lr = 0xc05e9740 (abort_handler+0x2e0)
>> 	 sp = 0xd2ebec08  fp = 0xd2ebec98
>> 	 r4 = 0x00000013  r5 = 0x00000128
>> abort_handler() at abort_handler+0x2e0
>> 	 pc = 0xc05e9740  lr = 0xc05c9dd4 (exception_exit)
>> 	 sp = 0xd2ebeca0  fp = 0xd2ebed60
>> 	 r4 = 0x00000000  r5 = 0x00000000
>> 	 r6 = 0x00000003  r7 = 0x00000016
>> 	 r8 = 0x00000000  r9 = 0xc2280e00
>> 	r10 = 0x00000021
>> exception_exit() at exception_exit
>> 	 pc = 0xc05c9dd4  lr = 0xc067d61c (ti_adc_intr+0x88)
>> 	 sp = 0xd2ebed30  fp = 0xd2ebed60
>> 	 r0 = 0x00000000  r1 = 0x00000003
>> 	 r2 = 0x00000001  r3 = 0x00000000
>> 	 r4 = 0x00000000  r5 = 0x00000000
>> 	 r6 = 0x00000003  r7 = 0x00000016
>> 	 r8 = 0x00000000  r9 = 0xc2280e00
>> 	r10 = 0x00000021 r12 = 0xc0ace03c
>> evdev_push_event() at evdev_push_event+0x4c
>> 	 pc = 0xc00888c0  lr = 0xc067d61c (ti_adc_intr+0x88)
>> 	 sp = 0xd2ebed68  fp = 0xd2ebedd0
>> 	 r4 = 0xd2fce800  r5 = 0xc2277d00
>> 	 r6 = 0x00000000  r7 = 0x00000421
>> 	 r8 = 0xc2277d18  r9 = 0xc2280e00
>> ti_adc_intr() at ti_adc_intr+0x88
>> 	 pc = 0xc067d61c  lr = 0xc02662fc (ithread_loop+0x1f0)
>> 	 sp = 0xd2ebedd8  fp = 0xd2ebee20
>> 	 r4 = 0xd2fce800  r5 = 0x00000000
>> 	 r6 = 0xd2fce844  r7 = 0x00000000
>> 	 r8 = 0xc0719541  r9 = 0xc2280e00
>> 	r10 = 0x00000000
>> ithread_loop() at ithread_loop+0x1f0
>> 	 pc = 0xc02662fc  lr = 0xc0262ef8 (fork_exit+0xa0)
>
> That's a strange exception stack, with lots of registers containing
> zeroes at exception time that were non-zero in the prior stack frame.
> It makes me think something has overwritten the stack with garbage
> data. When I look at ti_adc_tsc_read_data() it has a stack-allocated
> data array with 16 elements, and a loop that could load more than 16
> elements into that array (ADC_FIFO_COUNT_MSK is 0x7f), that seems like
> trouble.
>
> You said you don't need interrupts, does that mean you're reading the
> values via sysctl and aren't using the EVDEV stuff? If so, you might be
> able to quickly work around the panic by building a custom kernel using
> 'nooption EVDEV_SUPPORT'.
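
The overflow scenario described above can be illustrated with a small userland mock. This is only a sketch under stated assumptions: mock_fifo, mock_fifo_pop() and drain_bounded() are hypothetical names and are not taken from ti_adc.c; only the 16-element array and the 0x7f count mask come from the quoted text. It shows a copy loop that is clamped to the array size regardless of what the count register reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_COUNT_MSK 0x7fu   /* mask value quoted for ADC_FIFO_COUNT_MSK */
#define DATA_SLOTS     16u     /* size of the stack-allocated array */

/* Hypothetical stand-in for the hardware FIFO: hands out 0, 1, 2, ... */
struct mock_fifo {
	uint32_t count;  /* value the count register would report */
	uint32_t next;   /* next sample value to hand out */
};

static uint32_t
mock_fifo_pop(struct mock_fifo *f)
{
	assert(f->count > 0);
	f->count--;
	return (f->next++);
}

/*
 * Copy at most 'slots' samples into 'data', no matter what the count
 * register claims. Returns the number of samples actually stored.
 */
static size_t
drain_bounded(struct mock_fifo *f, uint32_t *data, size_t slots)
{
	size_t n = f->count & FIFO_COUNT_MSK;
	size_t i;

	if (n > slots)
		n = slots;   /* the clamp: never index past the array */
	for (i = 0; i < n; i++)
		data[i] = mock_fifo_pop(f);
	return (n);
}
```

With a reported count of 100 and a 16-slot array, drain_bounded() stores 16 samples and leaves the rest in the mock FIFO for a later pass.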

I forgot to mention that at the time of the panic,
dev.ti_adc.0.ain.0.enable and dev.ti_adc.0.ain.1.enable were not yet set
to 1 (enabled), so the ADC was not expected to read anything.

Yes, I only need the values at defined time intervals, and I poll the ADC
readings with the sysctlbyname() function.
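
A minimal sketch of such a poll, under stated assumptions: the leaf name "input" under dev.ti_adc.N.ain.M is an assumption (only the "enable" leaf appears in this thread), and adc_oid_name() and adc_read() are hypothetical helper names. The sysctlbyname() call is compiled only on FreeBSD; elsewhere the helper just reports ENOSYS so the sketch still builds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Build e.g. "dev.ti_adc.0.ain.1.input"; the "input" leaf is an assumption. */
static int
adc_oid_name(char *buf, size_t len, int unit, int channel)
{
	int n = snprintf(buf, len, "dev.ti_adc.%d.ain.%d.input", unit, channel);

	/* Fail on an encoding error or if the name did not fit. */
	return ((n > 0 && (size_t)n < len) ? 0 : -1);
}

/* Read one raw ADC value; returns 0 on success, -1 with errno set. */
static int
adc_read(int unit, int channel, int *value)
{
	char name[64];

	assert(value != NULL);
	if (adc_oid_name(name, sizeof(name), unit, channel) != 0) {
		errno = ENAMETOOLONG;
		return (-1);
	}
#ifdef __FreeBSD__
	size_t len = sizeof(*value);

	return (sysctlbyname(name, value, &len, NULL, 0));
#else
	errno = ENOSYS;  /* sysctlbyname() is not available on this system */
	return (-1);
#endif
}
```

Calling adc_read(0, 0, &v) from a timer loop gives the timed reading without any dependency on the evdev path.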

I compared an (arbitrarily chosen) old version of ti_adc_intr(void *arg)
in ti_adc.c with the current one. The offending call happens on line 508,
and it is TI_ADC_LOCK(sc);. The striking difference between the old and
the new code is that the new code calls TI_ADC_LOCK(sc); unconditionally,
while in the old code the following check happens before TI_ADC_LOCK(sc);
may get called:

ti_adc_intr(void *arg) from 2014:

	status = ADC_READ4(sc, ADC_IRQSTATUS);
	if (status == 0)
		return;
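
The effect of that early return can be shown with a userland mock. This is only a sketch: mock_softc and mock_adc_intr() are hypothetical, the real handler and its register handling are not reproduced here, and the comments merely mark where TI_ADC_LOCK(sc) and the matching unlock would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the driver state; not the real ti_adc softc. */
struct mock_softc {
	uint32_t irqstatus;   /* stands in for ADC_READ4(sc, ADC_IRQSTATUS) */
	int      lock_count;  /* how many times the mutex would be taken */
	int      handled;     /* how many interrupts were processed */
};

static void
mock_adc_intr(struct mock_softc *sc)
{
	uint32_t status;

	assert(sc != NULL);
	status = sc->irqstatus;
	if (status == 0)
		return;           /* nothing pending: leave without locking */

	sc->lock_count++;     /* TI_ADC_LOCK(sc) would go here */
	sc->handled++;
	sc->irqstatus = 0;    /* acknowledge the pending bits */
	/* the matching unlock would go here */
}
```

In the mock, a call with no status bits set never touches the lock, while a call with a pending bit takes it exactly once.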

I have started to set up a cross-building environment on a fast i7 box.
My plan is to put the above check into said function. If this doesn't
help, I will rebuild the kernel with 'nooption EVDEV_SUPPORT'. Thank you
for pointing me in that direction. I don't even know what EVDEV is good
for.

Best regards

Rolf
