From owner-freebsd-current@FreeBSD.ORG Wed Sep 29 19:47:07 2004
Message-ID: <415B113A.1060508@elischer.org>
Date: Wed, 29 Sep 2004 12:47:06 -0700
From: Julian Elischer
To: Damian Gerow
cc: current@freebsd.org
Subject: Re: Random processes hanging in unkillable state in -BETA6
References: <20040927211341.GB30059@afflictions.org>
	<20040929040222.GI5115@afflictions.org>
	<415A3E1B.5010404@elischer.org>
	<20040929050154.GJ5115@afflictions.org>
	<415A6AE3.4050309@elischer.org>
	<415A6B8A.1040902@elischer.org>
	<20040929093723.GB29565@afflictions.org>
In-Reply-To: <20040929093723.GB29565@afflictions.org>
List-Id: Discussions about the use of FreeBSD-current

EEEKKK!

Damian Gerow wrote:

>Thus spake Julian Elischer (julian@elischer.org) [29/09/04 04:01]:
>: oh yeah, the output of the ps for that process would be good to see too.
>: (the ps in ddb)
>:
>: there is an option to make ddb use printf()
>
>sysctl debug.ddb_use_print, for archival purposes.
>
>: which will make its output show up in dmesg after you 'c'
>: (continue) back running again.. otherwise you'd need a serial console to
>: record it all.
>
>It's up at .  I added in the
>commands I typed, just for clarity (not needed, I know).
>
>  - Damian
>

struct kg_sched {
        struct thread   *skg_last_assigned;     == NULL
        int     skg_avail_opennings;            == 0x7d000  <----------!!!!!!!!!!!!!!!!
        int     skg_concurrency;                = 1
        int     skg_runq_kses;                  = 0
};

in the 6 ksegrp scheduler private structures we have, we see:

skg_last_assigned  skg_avail_opennings  skg_concurrency  skg_runq_kses
0                  7d000                1                0
0                  ce02                 8c5              0
0                  7d000                1                0
0                  7d000                1                0
0                  7d000                1                0
0                  1ecb0                408              0

all the values of 7d000 are impossible.. in fact all the values in
that column are "impossible". the values of 8c5 and 408 are also
impossible for concurrency..

either we have corruption of the structures, or we have a failure to
initialise the contents.. or we have a "leak" of opennings.

looking at the values and the fact that 7d000 appears in several of them,
I am suspicious that we didn't clear it properly at init.

(goes to look at code..)

hmmm yep, it looks like that might be it.

try the following diff.
warning: cut'n'paste.. apply by hand.

diff -u -r1.199 kern_thread.c
--- kern/kern_thread.c  25 Sep 2004 00:53:46 -0000      1.199
+++ kern/kern_thread.c  29 Sep 2004 19:45:56 -0000
@@ -282,13 +282,13 @@
  * Initialize type-stable parts of a ksegrp (when newly created).
  */
 static int
-ksegrp_init(void *mem, int size, int flags)
+ksegrp_ctor(void *mem, int size, int flags)
 {
        struct ksegrp   *kg;
 
        kg = (struct ksegrp *)mem;
+       bzero(mem, size);
        kg->kg_sched = (struct kg_sched *)&kg[1];
-       /* sched_newksegrp(kg); */
        return (0);
 }
 
@@ -369,7 +369,7 @@
        tid_zone = uma_zcreate("TID", sizeof(struct tid_bitmap_part),
            NULL, NULL, NULL, NULL, UMA_ALIGN_CACHE, 0);
        ksegrp_zone = uma_zcreate("KSEGRP", sched_sizeof_ksegrp(),
-           NULL, NULL, ksegrp_init, NULL,
+           ksegrp_ctor, NULL, NULL, NULL,
            UMA_ALIGN_CACHE, 0);
        kseinit();      /* set up kse specific stuff  e.g. upcall zone*/
 }
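
The diff above moves the zeroing from the zone's init hook to its ctor hook. As a rough user-space illustration of why that could matter: in UMA, init runs only when an item's memory is first set up, while ctor runs on every allocation, so a recycled item keeps whatever its previous owner left in it unless the ctor clears it. Everything in this sketch (toy_zone, toy_sched, toy_zalloc, toy_zfree, zero_item, stale_value_after_reuse) is hypothetical and is not the UMA API or the kernel's ksegrp code; it is a one-slot cache meant only to show the effect.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scheduler-private part of a ksegrp. */
struct toy_sched {
	int avail_openings;
	int concurrency;
};

/* Toy one-slot "zone": keeps at most one freed item around for reuse. */
struct toy_zone {
	void	*cached;			/* last freed item, or NULL */
	size_t	 size;				/* item size in bytes */
	void	(*ctor)(void *, size_t);	/* runs on every allocation */
	void	(*init)(void *, size_t);	/* runs only on fresh memory */
};

void *
toy_zalloc(struct toy_zone *z)
{
	void *item = z->cached;

	if (item != NULL) {
		z->cached = NULL;	/* recycled item: init is NOT rerun */
	} else {
		item = malloc(z->size);
		if (item == NULL)
			return (NULL);
		memset(item, 0xa5, z->size);	/* pretend fresh memory is garbage */
		if (z->init != NULL)
			z->init(item, z->size);
	}
	if (z->ctor != NULL)
		z->ctor(item, z->size);
	return (item);
}

void
toy_zfree(struct toy_zone *z, void *item)
{
	if (z->cached == NULL)
		z->cached = item;	/* keep it, contents untouched */
	else
		free(item);
}

/* Usable as either hook: clears the whole item. */
void
zero_item(void *mem, size_t size)
{
	memset(mem, 0, size);
}

/* Allocate, dirty, free, allocate again; return what the second user sees. */
int
stale_value_after_reuse(struct toy_zone *z)
{
	struct toy_sched *s;
	int seen;

	s = toy_zalloc(z);
	assert(s != NULL);
	s->avail_openings = 0x7d000;	/* first owner leaves a value behind */
	toy_zfree(z, s);

	s = toy_zalloc(z);		/* comes back out of the one-slot cache */
	seen = s->avail_openings;
	free(s);
	return (seen);
}
```

With zero_item installed only as init, stale_value_after_reuse() returns 0x7d000, because the second allocation reuses the cached item without clearing it. With zero_item installed as ctor it returns 0. Whether this is the whole story behind the ddb values above is not established here; the sketch only shows the init/ctor difference the diff relies on.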