Date: Fri, 12 Jan 2001 11:14:59 +0100
From: Xavier Galleri <xgalleri@enition.com>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Need help for kernel crash dump analysis
Message-ID: <3A5ED923.3010207@enition.com>
References: <20010111163903.E6FF737B400@hub.freebsd.org> <3A5DE59F.6060602@enition.com> <3A5E090B.40601@enition.com> <20010111114318.C7240@fw.wintelcom.net>

Thank you for your answer,

OK, let me make this a bit clearer.

I use a private scheme to interact with the 'ipintr' ISR. The two
following routines are expected to be called either by our modified
version of 'ip_input' at network SWI level or at user level.

    int my_global_ipl = 0;

    void my_enter() {
        int s = splnet();
        /* We do not expect this routine to be reentrant, thus the
           following sanity check. */
        ASSERT(my_global_ipl == 0);
        my_global_ipl = s;
    }

    void my_exit() {
        int s = my_global_ipl;
        my_global_ipl = 0;
        splx(s);
    }

The crashes I get are always due to the assertion failing in the
'ipintr' ISR. This *seems* to indicate that 'my_enter' is called at
the network SWI level after another execution flow has called 'my_enter'
itself and has *NOT* yet called 'my_exit'. This seems strange given the
'splnet', and the only explanation I have found is that the first
execution flow has fallen asleep somewhere in the kernel (although
this is not expected, of course).
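
One detail of the scheme above may be worth checking: 'my_global_ipl' doubles as both the saved level and the "already entered" marker, so if the saved level can legitimately be 0, a nested entry goes undetected. The following is only a minimal user-space sketch of a variant with a separate flag. The names mock_cpl, MOCK_NET_MASK, mock_splnet(), mock_splx(), my_saved_ipl and my_entered are hypothetical stand-ins introduced for illustration; they are not the kernel's routines and not part of the original code.

```c
#include <assert.h>

/*
 * User-space mock only: mock_cpl, MOCK_NET_MASK, mock_splnet() and
 * mock_splx() are hypothetical stand-ins for the kernel's spl routines.
 */
#define MOCK_NET_MASK 0x4u

unsigned mock_cpl = 0;          /* mock "current priority level" mask */

unsigned mock_splnet(void)
{
    unsigned old = mock_cpl;
    mock_cpl |= MOCK_NET_MASK;  /* block (mock) network interrupts */
    return old;                 /* previous level, possibly 0 */
}

void mock_splx(unsigned s)
{
    mock_cpl = s;               /* restore a previously saved level */
}

unsigned my_saved_ipl = 0;      /* level saved by my_enter() */
int my_entered = 0;             /* hypothetical flag: 1 between enter and exit */

void my_enter(void)
{
    unsigned s = mock_splnet();
    assert(!my_entered);        /* fires on re-entry even when s == 0 */
    my_entered = 1;
    my_saved_ipl = s;
}

void my_exit(void)
{
    unsigned s = my_saved_ipl;
    assert(my_entered);         /* fires on an exit without a matching enter */
    my_entered = 0;
    mock_splx(s);
}
```

With this variant the sanity check no longer depends on the value returned by the level-raising call, so a failure (or the absence of one) says something only about nesting.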

Now, if you've read my first mail, I was actually asking for help on how
to dump the stack of an interrupted process with GDB when the
kernel crash occurs in the context of an ISR. More generally, I would
like to know how I could dump the stack of *any* process at the time of
the crash. This way, I would be able to see where my user-land daemon
was in the kernel when the interrupt occurred.

Anyway, without this information, I am reduced to adding traps along
the execution path of my process within my kernel code. This led me to
surround calls to MALLOC with counters as follows:

    somewhere_else() {
        ...
        my_enter(); /* handle competition with network isr
                       (especially ipintr) */
        ...
        some_counter++;
        MALLOC(buf,cast,size,M_DEVBUF,M_NOWAIT);
        some_other_counter++;
        ...
        my_exit();
        ...
    }

Every crash I have captured shows the following relation at the time of
the crash:

    ( some_counter - some_other_counter == 1 )

which *seems* to indicate that my process has been somehow
preempted during the call to MALLOC.
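
To spell out why a difference of one is informative: the two counters are equal everywhere except between the two increments, so an observer that sees a gap of 1 must be running while the bracketed call is still in progress. The following is a minimal user-space sketch of that reasoning. The names counter_gap(), bracketed_alloc(), record_gap() and gap_seen_by_probe are hypothetical and introduced here for illustration only, the probe callback stands in for an interrupt arriving during the allocation, and malloc() stands in for the kernel MALLOC macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

unsigned long some_counter = 0;        /* bumped just before the allocation */
unsigned long some_other_counter = 0;  /* bumped just after the allocation */

/* Gap between the counters: 0 outside the bracket, 1 inside it. */
long counter_gap(void)
{
    return (long)(some_counter - some_other_counter);
}

/* Hypothetical observer: records the gap it sees when it runs. */
long gap_seen_by_probe = -1;

void record_gap(void)
{
    gap_seen_by_probe = counter_gap();
}

/*
 * Allocation bracketed by the two counters. 'probe', if non-NULL, is
 * called between the increments to simulate an interruption that occurs
 * while the allocation is in progress.
 */
void *bracketed_alloc(size_t size, void (*probe)(void))
{
    void *buf;

    some_counter++;
    if (probe != NULL)
        probe();                       /* simulated interruption */
    buf = malloc(size);
    some_other_counter++;

    assert(counter_gap() == 0);        /* counters agree again afterwards */
    return buf;
}
```

Calling bracketed_alloc(16, record_gap) leaves gap_seen_by_probe at 1, while counter_gap() is back to 0 once the call returns, which is the distinction the crash dumps rely on.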

My belief is that the FreeBSD kernel is (currently) a monolithic,
non-preemptive, non-threaded UNIX kernel, which implies that:

  * system-scope scheduling is still done at process level (no kernel
    threads yet)
  * a process executing in the kernel cannot be preempted in favor of
    another process unless it either returns to user code or explicitly
    goes to sleep
  * the only interlocking that must be done is with interrupts (when
    relevant), using the 'spl' family of routines

Is that correct?

Well, I am obviously tracking a bug in my own code, but I would greatly
appreciate help, either on my GDB usage question or in the form of
technical hints on where I should look to make progress in my
investigation.

Thank you very much for your attention,

Rgds,

Xavier

Alfred Perlstein wrote:
> * Xavier Galleri <xgalleri@enition.com> [010111 11:27] wrote:
>
>> Hi everybody,
>>
>> I have reached a point where I am wondering if a call to 'malloc' with
>> the M_NOWAIT flag is not falling asleep !
>
>
> M_NOWAIT shouldn't sleep.
>
>> In fact, I suspect that the interrupted context is somewhere during a
>> call to 'malloc' (I increment a counter just before calling malloc and
>> increment another just after and the difference is one !) while I have
>> called 'splnet' beforehand, thus normally preventing competing with any
>> network isr. I assume that this should never occur unless the code is
>> somewhere calling 'sleep' and provoke a context switch.
>
>
> if you add 1 to a variable the difference is expected to be one.
>
>> Is there anybody who can help on this ?
>
>
> I'm not sure, you need to be more specific/clear.
>
