From: Xavier Galleri
Date: Mon, 15 Jan 2001 17:06:33 +0100
To: freebsd-hackers@FreeBSD.ORG
Subject: Re: Need help for kernel crash dump analysis
Message-ID: <3A632009.1030604@enition.com>

Ok, let's start again (in plain text this time, thanx again, Daniel ;-)

I use a private scheme to interact with the 'ipintr' isr. The two following
routines are expected to be called either by our modified version of
'ip_input' at network SWI level or at user level.

    int my_global_ipl=0;

    void my_enter()
    {
        int s=splnet();
        /* We do not expect this routine to be reentrant,
           thus the following sanity check. */
        ASSERT(my_global_ipl==0);
        my_global_ipl=s;
    }

    void my_exit()
    {
        int s=my_global_ipl;
        my_global_ipl=0;
        splx(s);
    }

The crashes I got are always due to the assertion failing in the 'ipintr'
isr. This *seems* to indicate that 'my_enter' is called at the network SWI
level after another execution flow has called 'my_enter' itself and has
*NOT* called 'my_exit' yet ! This seems strange given the 'splnet', and the
only explanation I have found is that the first execution flow has fallen
asleep somewhere in the kernel (while this is not expected, of course !).
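The hypothesis above (a holder of the guard goes to sleep, and another context then calls 'my_enter') can be reproduced in a small userland sketch. Everything prefixed with 'sim_' is a hypothetical stand-in invented for illustration, not the kernel's spl(9) implementation. The 'my_held' flag is also an assumption: in this simulation a saved level of 0 would be indistinguishable from "not held", so the sketch tracks the held state separately. Whether that matters in the real kernel depends on what 'splnet' returns there. 'my_enter' returns -1 at the point where the original ASSERT would fire.

```c
#include <assert.h>

/* Hypothetical userland stand-ins for the kernel's spl routines. */
#define SIM_IPL_NONE 0
#define SIM_IPL_NET  1

static int sim_cur_ipl = SIM_IPL_NONE;   /* simulated current priority level */

/* Raise the simulated level to "net" and return the previous level. */
static int sim_splnet(void)
{
    int old = sim_cur_ipl;
    if (sim_cur_ipl < SIM_IPL_NET)
        sim_cur_ipl = SIM_IPL_NET;
    return old;
}

/* Restore a previously saved simulated level. */
static void sim_splx(int s)
{
    sim_cur_ipl = s;
}

static int my_global_ipl = 0;   /* level saved by the current holder */
static int my_held = 0;         /* assumption: explicit "guard is held" flag */

/* Returns 0 on success, -1 where the original ASSERT would have fired. */
static int my_enter(void)
{
    int s = sim_splnet();
    if (my_held) {
        sim_splx(s);            /* undo the raise before reporting reentry */
        return -1;
    }
    my_held = 1;
    my_global_ipl = s;
    return 0;
}

static void my_exit(void)
{
    int s = my_global_ipl;
    assert(my_held);            /* exit without a matching enter is a bug */
    my_global_ipl = 0;
    my_held = 0;
    sim_splx(s);
}

/* Models the suspected scenario: the holder sleeps, and the context that
   runs next starts at the base level while the guard is still held. */
static void sim_sleep_switch(void)
{
    sim_cur_ipl = SIM_IPL_NONE;
}
```

Calling 'my_enter', then 'sim_sleep_switch', then 'my_enter' again makes the second call return -1. This matches the suspected sequence, but it only shows that the scenario is consistent with the symptom, not that it is what happens in the kernel.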
Now, if you've read my first mail, I was actually asking for help on how to
dump the stack of an interrupted process with GDB when the kernel crash
occurs in the context of an isr. Actually, I would like to know how I could
dump the stack of *any* process at the time of the crash. This way, I would
be able to see where my user-land daemon was in the kernel when the
interrupt occurred.

Anyway, without this information, I am reduced to adding some traps along
the execution path of my process within my kernel code. This brought me to
surround calls to MALLOC with counters as follows:

    somewhere_else()
    {
        ...
        my_enter();  /* handle competition with network isr (especially ipintr) */
        ...
        some_counter++;
        MALLOC(buf,cast,size,M_DEVBUF,M_NOWAIT);
        some_other_counter++;
        ...
        my_exit();
        ...
    }

Then, all crashes I got show the following equation at the time of crash:

    ( some_counter - some_other_counter == 1 )

which *seems* to indicate that my process has been somehow preempted
during the call to MALLOC.

My belief is that the FreeBSD kernel is (currently) a monolithic
non-preemptive non-threaded UNIX kernel, thus implying that :

  * system-scope scheduling is still done at process level (no kernel
    thread yet)
  * any process executing in the kernel cannot be preempted for execution
    by another process unless it either returns to user code or falls
    explicitly asleep.
  * the only interlocking that must be done is with interrupts (when
    relevant), using the 'spl' management routine set.

Is that correct ?

Well, I am obviously tracking a bug in my own code, but I would greatly
appreciate help either on my GDB usage question or through technical hints
on where I should look to make progress in my investigation.

Thank you very much for your attention,

Rgds,
Xavier

Alfred Perlstein wrote:

> * Xavier Galleri [010111 11:27] wrote:
>
>> Hi everybody,
>>
>> I have reached a point where I am wondering if a call to 'malloc' with
>> the M_NOWAIT flag is not falling asleep !
>
> M_NOWAIT shouldn't sleep.
>
>> In fact, I suspect that the interrupted context is somewhere during a
>> call to 'malloc' (I increment a counter just before calling malloc and
>> increment another just after and the difference is one !) while I have
>> called 'splnet' beforehand, thus normally preventing competing with any
>> network isr. I assume that this should never occur unless the code is
>> somewhere calling 'sleep' and provoke a context switch.
>
> if you add 1 to a variable the difference is expected to be one.
>
>> Is there anybody who can help on this ?
>
> I'm not sure, you need to be more specific/clear.