Date: Wed, 2 Mar 2005 09:43:23 +0100
From: Peter Holm
To: Doug White
cc: freebsd-current@freebsd.org
Subject: Re: FreeBSD 5.3 crash (core with debug symbols available)
Message-ID: <20050302084323.GA10394@peter.osted.lan>
In-Reply-To: <20050301184735.O72408@carver.gumbysoft.com>

On Tue, Mar 01, 2005 at 06:59:42PM -0800, Doug White wrote:
> On Tue, 1 Mar 2005, Dariusz Kulinski wrote:
>
> > Hello Doug,
> >
> > Tuesday, March 1, 2005, 10:50:19 AM, you wrote:
> >
> > >> > Looks like it ran over a spammed thread, but I'll want to see the fault
> > >> > address. Bets on whether it's 0xdeadc0de+offset?
> > >> 0xdeadc0de, huh? :)
> > > free()d memory regions get filled with 0xdeadc0de to hunt down
> > > use-after-free conditions.
> >
> > Ok, I thought it was one of the developer jokes =)
> >
> > > That's what I want :-)
> >
> > > OK, it wasn't deadc0de, so can you load the crashdump up, go down to the
> > > sigtd() frame, and "print *td"? It'll be a huge spew.
> >
> > (kgdb) frame 20
> > #20 0xc04e9d3f in sigtd (p=0xc16948d4, sig=14, prop=129) at /usr/src/sys/kern/kern_sig.c:1581
> > 1581            if (td->td_waitset != NULL &&
> > (kgdb) print *td
> > $1 = {td_proc = 0xc16948d4, td_ksegrp = 0xc26b9310, td_plist = {tqe_next = 0xc1b48190, tqe_prev = 0xc1b95198}, td_kglist = {
> >     tqe_next = 0x0, tqe_prev = 0xc26b931c}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xc1794b80}, td_lockq = {tqe_next = 0x0,
> >     tqe_prev = 0x0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xc26b9324}, td_selq = {tqh_first = 0x0, tqh_last = 0xc17c31c0},
> >   td_sleepqueue = 0x0, td_turnstile = 0xc15d5dc0, td_tid = 100081, td_flags = 8, td_inhibitors = 6, td_pflags = 8,
> >   td_dupfd = 0, td_wchan = 0xd12bfc20, td_wmesg = 0xc06cef0b "sigwait", td_lastcpu = 0 '\0', td_oncpu = 255 '?',
> >   td_locks = 0, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
> >   td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x9903010, td_ucred = 0xc2b41b00, td_standin = 0x0, td_prticks = 0,
> >   td_upcall = 0xc17c0510, td_sticks = 2210, td_uuticks = 0, td_usticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0,
> >       0, 0}}, td_sigmask = {__bits = {159751, 0, 0, 0}}, td_siglist = {__bits = {0, 0, 0, 0}}, td_waitset = 0xd12bfc64,
> >   td_umtx = {tqe_next = 0x0, tqe_prev = 0x0}, td_generation = 376536, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0},
> >   td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_base_pri = 104 'h', td_priority = 104 'h',
> >   td_pcb = 0xd12bfda0, td_state = TDS_INHIBITED, td_retval = {0, 137620480}, td_slpcallout = {c_links = {sle = {
> >         sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xc1cd68e4}}, c_time = 216540257, c_arg = 0xc17c3190, c_func = 0,
> >     c_flags = 8}, td_frame = 0xd12bfd48, td_kstack_obj = 0xc1796318, td_kstack = 3509313536, td_kstack_pages = 2,
> >   td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0, td_critnest = 1, td_md = {md_savecrit = 582},
> >   td_sched = 0xc17c32e4}
> >
>
> This is quite helpful, thanks! It appears the thread had called
> sigtimedwait() and the timeout fired. The clock ithread goes to whack the
> process with SIGALRM and checks if it's waiting in sigtimedwait()
> specifically. That info is coded into the td_waitset member of struct
> thread, which gets set from the user. All of the frontends provide the set
> from a stack variable.
>
> later, in kern_sigtimedwait()...
>
>         td->td_waitset = &waitset;
>         error = msleep(&ps, &p->p_mtx, PPAUSE|PCATCH, "sigwait", hz);
>
> So now a pointer to a stack variable is in the thread. Later on sigtd()
> comes along and wants to dereference it and that stack page isn't
> available according to the VM system and that trips the panic.
>

Doesn't this problem look a lot like
http://www.holm.cc/stress/log/cons111.html and
http://www.holm.cc/stress/log/cons117.html?

It seems that sigwait() + swapping causes this panic. I have a small
test program that provokes it.

- Peter

> Some more exploration is necessary. Can you make the crashdump and debug
> kernel available? Also, what was running when this panic tripped? ("info
> threads" in kgdb may be useful.)
>
> --
> Doug White                    |  FreeBSD: The Power to Serve
> dwhite@gumbysoft.com          |  www.FreeBSD.org

-- 
Peter Holm
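
For readers following the thread, here is a rough user-space sketch of the call pattern Doug describes: a thread blocks a signal and then waits for it with sigtimedwait(), so the timeout can expire while the thread sleeps in "sigwait". This is not the test program mentioned above (that program is not shown in this thread), and the helper name wait_for_signal() and its millisecond parameter are made up for illustration. On its own it should not be expected to provoke the panic, which per the discussion also seems to need swapping and a signal arriving from another context.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Hypothetical helper (not from the thread): block `sig`, then wait for it
 * with sigtimedwait() for at most `timeout_ms` milliseconds.
 * Returns the signal number if it arrives in time, or -1 with
 * errno == EAGAIN if the timeout expires first.
 */
static int
wait_for_signal(int sig, long timeout_ms)
{
	sigset_t set;
	struct timespec ts;

	sigemptyset(&set);
	sigaddset(&set, sig);

	/* The signal must be blocked, or it may be delivered normally instead. */
	if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
		return (-1);

	ts.tv_sec = timeout_ms / 1000;
	ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

	/* `set` is a stack variable that stays live for the whole wait. */
	return (sigtimedwait(&set, NULL, &ts));
}
```

Calling wait_for_signal(SIGUSR1, 50) with nothing pending should time out with EAGAIN; if SIGUSR1 is already pending (for example after raise() while it is blocked), the call returns SIGUSR1 right away.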