Date: Fri, 12 Jan 2001 11:13:29 +0100
From: Xavier Galleri <xgalleri@wanadoo.fr>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Need help for kernel crash dump analysis
Message-ID: <3A5ED8C9.3050309@wanadoo.fr>
References: <20010111163903.E6FF737B400@hub.freebsd.org> <3A5DE59F.6060602@enition.com> <3A5E090B.40601@enition.com> <20010111114318.C7240@fw.wintelcom.net>

Thank you for your answer,

OK, let's make it a bit clearer!

I use a private scheme to interact with the 'ipintr' ISR. The following
two routines are expected to be called either by our modified version of
'ip_input' at network SWI level, or at user level.

    int my_global_ipl=0;

    void my_enter() {
        int s=splnet();
        /* We do not expect this routine to be reentrant, thus the
           following sanity check. */
        ASSERT(my_global_ipl==0);
        my_global_ipl=s;
    }

    void my_exit() {
        int s=my_global_ipl;
        my_global_ipl=0;
        splx(s);
    }

The crashes I get are always due to the assertion failing in the
'ipintr' ISR. This *seems* to indicate that 'my_enter' is called at the
network SWI level after another execution flow has itself called
'my_enter' and has *NOT* yet called 'my_exit'! This seems strange given
the 'splnet', and the only explanation I have found is that the first
execution flow has fallen asleep somewhere in the kernel (which is not
expected, of course!).

Now, if you've read my first mail, I was actually asking for help on how
to dump the stack of an interrupted process with GDB when the kernel
crash occurs in the context of an ISR. More generally, I would like to
know how I could dump the stack of *any* process at the time of the
crash. This way, I would be able to see where my user-land daemon was in
the kernel when the interrupt occurred.

Anyway, without this information, I am reduced to adding some traps
along the execution path of my process within my kernel code. This led
me to surround calls to MALLOC with counters as follows:

    somewhere_else() {
        ...
        my_enter(); /* handle competition with network isr (especially ipintr) */
        ...
        some_counter++;
        MALLOC(buf,cast,size,M_DEVBUF,M_NOWAIT);
        some_other_counter++;
        ...
        my_exit();
        ...
    }

All the crashes I get then show the following relation at the time of
the crash:

    ( some_counter - some_other_counter == 1 )

which *seems* to indicate that my process has somehow been preempted
during the call to MALLOC.

My belief is that the FreeBSD kernel is (currently) a monolithic,
non-preemptive, non-threaded UNIX kernel, which implies that:

  * system-scope scheduling is still done at process level (no kernel
    threads yet)
  * a process executing in the kernel cannot be preempted in favour of
    another process unless it either returns to user code or explicitly
    goes to sleep.
  * the only interlocking that must be done is with interrupts (when
    relevant), using the 'spl' family of routines.

Is that correct?

Well, I am obviously tracking a bug in my own code, but I would greatly
appreciate help, either with my GDB usage question or in the form of
technical hints on where I should look to make progress in my
investigation.

Thank you very much for your attention,

Rgds,

Xavier

Alfred Perlstein wrote:

> * Xavier Galleri <xgalleri@enition.com> [010111 11:27] wrote:
>
>> Hi everybody,
>>
>> I have reached a point where I am wondering if a call to 'malloc' with
>> the M_NOWAIT flag is not falling asleep !
>
> M_NOWAIT shouldn't sleep.
>
>> In fact, I suspect that the interrupted context is somewhere during a
>> call to 'malloc' (I increment a counter just before calling malloc and
>> increment another just after and the difference is one !) while I have
>> called 'splnet' beforehand, thus normally preventing competing with any
>> network isr. I assume that this should never occur unless the code is
>> somewhere calling 'sleep' and provoke a context switch.
>
> if you add 1 to a variable the difference is expected to be one.
>
>> Is there anybody who can help on this ?
>
> I'm not sure, you need to be more specific/clear.
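
The counter argument in the message (a difference of exactly one means control was between the two increments when the interrupt fired) can be illustrated with a small user-space sketch. This is only an illustration under stated assumptions, not the author's kernel code: `fake_malloc`, `fake_interrupt` and `observed_diff` are hypothetical names, and the "interrupt" is simply a function call made from inside the allocator stand-in.

```c
#include <assert.h>
#include <stdlib.h>

static int some_counter;        /* bumped just before the allocation */
static int some_other_counter;  /* bumped just after the allocation */
static int observed_diff = -1;  /* difference sampled by the fake interrupt */

/* Hypothetical stand-in for the interrupt handler: it only samples
 * the two counters, as one would do from a crash dump. */
static void fake_interrupt(void)
{
    observed_diff = some_counter - some_other_counter;
}

/* Hypothetical stand-in for MALLOC: optionally lets the fake
 * interrupt fire while the allocation is still in progress. */
static void *fake_malloc(size_t size, int interrupt_inside)
{
    if (interrupt_inside)
        fake_interrupt();
    return malloc(size);
}

/* Mirrors the instrumented call site from the message. */
static void somewhere_else(int interrupt_inside)
{
    void *buf;

    some_counter++;
    buf = fake_malloc(64, interrupt_inside);
    some_other_counter++;

    assert(buf != NULL);
    free(buf);
}
```

Calling `somewhere_else(1)` leaves `observed_diff` at 1, because the sample is taken between the two increments. Calling `somewhere_else(0)` and then `fake_interrupt()` leaves it at 0, because both counters have advanced by the time the sample is taken.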

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message