Date: Wed, 2 Mar 2005 18:56:46 -0800 (PST)
From: Doug White
To: David Xu
cc: freebsd-current@freebsd.org
Subject: Re: FreeBSD 5.3 crash (core with debug symbols available)
Message-ID: <20050302184617.K82821@carver.gumbysoft.com>
In-Reply-To: <42259DCA.6060308@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Wed, 2 Mar 2005, David Xu wrote:

> I believe this is caused by swapped out of kernel thread stack.
> in /sys/vm/vm_glue.c, there is some code swapping out a sleeping process,
> this means any kernel code can not use thread local variable to communicate
> with other threads, this is a rather unsafe assumptions, the vm code really
> should be disabled.

I don't quite understand what you mean by "vm code really should be
disabled"; is virtual memory really that bad? :)

The consensus on IRC is that threads should not use their stacks for
anything but storage of their own variables.  Anything used for
synchronization or state should be placed in malloc()d memory or some
other shared structure.

I'll start working on a patch to change these references in the sigwait()
family, and queue up a pointy hat to jeff@.

Pointers to other badly behaved code gladly accepted :)

> David Xu
>
> Doug White wrote:
>
> >On Tue, 1 Mar 2005, Dariusz Kulinski wrote:
> >
> >>Hello Doug,
> >>
> >>Tuesday, March 1, 2005, 10:50:19 AM, you wrote:
> >>
> >>>>>Looks like it ran over a spammed thread, but I'll want to see the fault
> >>>>>address.  Bets on whether it's 0xdeadc0de+offset?
> >>>>>
> >>>>0xdeadc0de, huh? :)
> >>>>
> >>>free()d memory regions get filled with 0xdeadc0de to hunt down
> >>>use-after-free conditions.
> >>>
> >>Ok, I thought it was one of developer jokes =)
> >>
> >>>That's what I want :-)
> >>>
> >>>OK, it wasn't deadc0de, so can you load the crashdump up, go down to the
> >>>sigtd() frame, and "print *td"?  It'll be a huge spew.
> >>>
> >>(kgdb) frame 20
> >>#20 0xc04e9d3f in sigtd (p=0xc16948d4, sig=14, prop=129) at /usr/src/sys/kern/kern_sig.c:1581
> >>1581            if (td->td_waitset != NULL &&
> >>(kgdb) print *td
> >>$1 = {td_proc = 0xc16948d4, td_ksegrp = 0xc26b9310, td_plist = {tqe_next = 0xc1b48190, tqe_prev = 0xc1b95198}, td_kglist = {
> >>    tqe_next = 0x0, tqe_prev = 0xc26b931c}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xc1794b80}, td_lockq = {tqe_next = 0x0,
> >>    tqe_prev = 0x0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xc26b9324}, td_selq = {tqh_first = 0x0, tqh_last = 0xc17c31c0},
> >>  td_sleepqueue = 0x0, td_turnstile = 0xc15d5dc0, td_tid = 100081, td_flags = 8, td_inhibitors = 6, td_pflags = 8,
> >>  td_dupfd = 0, td_wchan = 0xd12bfc20, td_wmesg = 0xc06cef0b "sigwait", td_lastcpu = 0 '\0', td_oncpu = 255 '\377',
> >>  td_locks = 0, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
> >>  td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x9903010, td_ucred = 0xc2b41b00, td_standin = 0x0, td_prticks = 0,
> >>  td_upcall = 0xc17c0510, td_sticks = 2210, td_uuticks = 0, td_usticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0,
> >>      0, 0}}, td_sigmask = {__bits = {159751, 0, 0, 0}}, td_siglist = {__bits = {0, 0, 0, 0}}, td_waitset = 0xd12bfc64,
> >>  td_umtx = {tqe_next = 0x0, tqe_prev = 0x0}, td_generation = 376536, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0},
> >>  td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_base_pri = 104 'h', td_priority = 104 'h',
> >>  td_pcb = 0xd12bfda0, td_state = TDS_INHIBITED, td_retval = {0, 137620480}, td_slpcallout = {c_links = {sle = {
> >>        sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xc1cd68e4}}, c_time = 216540257, c_arg = 0xc17c3190, c_func = 0,
> >>    c_flags = 8}, td_frame = 0xd12bfd48, td_kstack_obj = 0xc1796318, td_kstack = 3509313536, td_kstack_pages = 2,
> >>  td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0, td_critnest = 1, td_md = {md_savecrit = 582},
> >>  td_sched = 0xc17c32e4}
> >>
> >
> >This is quite helpful, thanks!  It appears the thread had called
> >sigtimedwait() and the timeout fired. The clock ithread goes to whack the
> >process with SIGALRM and checks if it's waiting in sigtimedwait()
> >specifically.  That info is coded into the td_waitset member of struct
> >thread, which gets set from the user.  All of the frontends provide the set
> >from a stack variable.
> >
> >later, in kern_sigtimedwait()...
> >
> >926         td->td_waitset = &waitset;
> >927         error = msleep(&ps, &p->p_mtx, PPAUSE|PCATCH, "sigwait", hz);
> >
> >So now a pointer to stack variable is in the thread. Later on sigtd()
> >comes along and wants to dereference it and that stack page isn't
> >available according to the VM system and that trips the panic.
> >
> >Some more exploration is necessary.  Can you make the crashdump and debug
> >kernel available?  Also, what was running when this panic tripped? ("info
> >threads" in kgdb may be useful.)
> >

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
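
For illustration only, here is a minimal userland sketch of the idea
discussed above: publish a heap copy of the wait set instead of a pointer
to the caller's stack variable.  This is not the kernel patch.  The names
sigset_sketch_t, thread_sketch, waitset_publish(), waitset_retract() and
waitset_has() are all hypothetical, and plain malloc() stands in for
whatever kernel allocator a real fix would use.  It only shows the
ownership pattern, under the assumption that signal N maps to bit N-1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a signal set; not the FreeBSD definition. */
typedef struct { unsigned int bits[4]; } sigset_sketch_t;

/* Hypothetical stand-in for the one field of struct thread that matters here. */
struct thread_sketch {
    sigset_sketch_t *td_waitset;   /* may be read by other threads */
};

/*
 * Publish a heap copy of the caller's set, so the pointer stored in the
 * thread never refers to the caller's stack.  Returns 0 on success and
 * -1 if the allocation fails.
 */
static int waitset_publish(struct thread_sketch *td, const sigset_sketch_t *set)
{
    assert(td != NULL && set != NULL);
    sigset_sketch_t *copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return -1;
    *copy = *set;
    td->td_waitset = copy;
    return 0;
}

/* Drop the published set once the wait is over. */
static void waitset_retract(struct thread_sketch *td)
{
    assert(td != NULL);
    free(td->td_waitset);
    td->td_waitset = NULL;
}

/* What a signalling thread would do: test whether sig is being waited for. */
static int waitset_has(const struct thread_sketch *td, int sig)
{
    assert(td != NULL);
    if (td->td_waitset == NULL || sig < 1 || sig > 128)
        return 0;
    unsigned int idx = (unsigned int)(sig - 1);
    return (int)((td->td_waitset->bits[idx / 32] >> (idx % 32)) & 1u);
}
```

A caller would fill in a local set, call waitset_publish() before going to
sleep, and call waitset_retract() after waking.  The local set can then go
out of scope without leaving a dangling pointer in the thread structure.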