From owner-freebsd-current@FreeBSD.ORG Mon Dec 19 19:46:31 2005
From: John Baldwin
To: Anish Mistry
Cc: threads@freebsd.org, freebsd-current@freebsd.org, davidxu@freebsd.org
Date: Mon, 19 Dec 2005 14:46:47 -0500
Subject: Re: Reproducable Panic on CURRENT and 6.0-RELEASE
Message-Id: <200512191446.50344.jhb@freebsd.org>
In-Reply-To: <200512161904.04913.mistry.7@osu.edu>
References: <200512161237.15148.mistry.7@osu.edu> <200512161638.58917.jhb@freebsd.org> <200512161904.04913.mistry.7@osu.edu>
List-Id: Discussions about the use of FreeBSD-current

On Friday 16 December 2005 07:03 pm, Anish Mistry wrote:
> On Friday 16 December 2005 04:38 pm, you wrote:
> > On Friday 16 December 2005 03:27 pm, Anish Mistry wrote:
> > > On Friday 16 December 2005 03:11 pm, you wrote:
> > > > On Friday 16 December 2005 12:37 pm, Anish Mistry wrote:
> > > > > Here is the offending program/code. The interesting program
> > > > > is avidemux_2.1_branch_anish/avidemux/avidemux2.
> > > > > (It is compiled for CURRENT, and I left all the object code
> > > > > stuff in so it's a bit large 21MB)
> > > > > http://am-productions.biz/docs/avidemux_2.1_branch_anish.tgz
> > > > >
> > > > > First you'll need to compile spidermonkey to be threadsafe so
> > > > > add the following to your lang/spidermonkey/Makefile before
> > > > > installing it:
> > > > > LIB_DEPENDS= nspr4.1:${PORTSDIR}/devel/nspr
> > > > > MAKE_ARGS+= JS_THREADSAFE=YES LDFLAGS="-L${LOCALBASE}/lib -lpthread -lm"
> > > > > CFLAGS+= -I${LOCALBASE}/include/nspr
> > > > >
> > > > > Once a threadsafe spidermonkey is installed to kill the
> > > > > machine you'll need to:
> > > > > cd avidemux_2.1_branch_anish/avidemux
> > > > > ./avidemux2 --run new-features-test.js
> > > > >
> > > > > On CURRENT:
> > > > > kernel trap 12 with interrupts disabled
> > > > >
> > > > > Fatal trap 12: page fault while in kernel mode
> > > > > fault virtual address = 0x68
> > > > > fault code = supervisor read, page not present
> > > > > instruction pointer = 0x20:0xc04e6f36
> > > > > stack pointer = 0x28:0xcc9edb3c
> > > > > frame pointer = 0x28:0xcc9edbb0
> > > > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > > > = DPL 0, pres 1, def32 1, gran 1
> > > > > processor eflags = resume, IOPL = 0
> > > > > current process = 798 (gdb)
> > > > > trap number = 12
> > > > > panic: page fault
> > > > >
> > > > > #0 doadump () at pcpu.h:165
> > > > > #1 0xc04bb7eb in boot (howto=260)
> > > > >     at /usr/src/sys/kern/kern_shutdown.c:399
> > > > > #2 0xc04bb353 in panic (fmt=0xc06069a7 "%s")
> > > > >     at /usr/src/sys/kern/kern_shutdown.c:555
> > > > > #3 0xc05e91ba in trap_fatal (frame=0xcc9edafc, eva=104)
> > > > >     at /usr/src/sys/i386/i386/trap.c:862
> > > > > #4 0xc05e96d9 in trap (frame=
> > > > >     {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi =
> > > > >     -1032878460, tf_esi = 1, tf_ebp = -862004304, tf_isp =
> > > > >     -862004440, tf_ebx = -1033297504, tf_edx = -1033987232,
> > > > >     tf_ecx = 4, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip =
> > > > >     -1068601546, tf_cs = 32, tf_eflags = 65687, tf_esp =
> > > > >     -1032878356, tf_ss = -1067380424}) at
> > > > >     /usr/src/sys/i386/i386/trap.c:273
> > > > > #5 0xc05db6fa in calltrap ()
> > > > >     at /usr/src/sys/i386/i386/exception.s:137
> > > > > #6 0xc04e6f36 in kern_ptrace (td=0xc25e9b60, req=10, pid=1,
> > > > >     addr=0x0, data=17)
> > > > >     at /usr/src/sys/kern/sys_process.c:802
> > > >
> > > > On HEAD this is:
> > > > p->p_xthread->td_flags &= ~TDF_XSIG;
> > > >
> > > > If two threads called kern_ptrace() with the same PID, this
> > > > could happen. Hmm, I have no idea how p_xthread is supposed to
> > > > not be racy here in fact. It would be helpful to know what
> > > > PTRACE action it is trying to do and maybe a KTR trace of
> > > > the various ptrace events leading up to this condition. I have
> > > > no idea what thread you are supposed to act on if p_xthread is
> > > > NULL either.
> > >
> > > How would I do this? My kdb/ddb skills are pretty much limited to
> > > getting a backtrace.
> >
> > You could add some new KTR tracepoints to log each request into
> > kern_ptrace() and then do a 'show ktr' at the ddb prompt.
>
> I put a KTR_GEN tracepoint in kern_ptrace and only got 1 entry in the
> log:
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x68
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0xc04ed896
> stack pointer = 0x28:0xcc9a9b3c
> frame pointer = 0x28:0xcc9a9bb0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = resume, IOPL = 0
> current process = 697 (gdb)
> [thread pid 697 tid 100073 ]
> Stopped at kern_ptrace+0xef6: movl 0x68(%eax),%ebx
> db> show ktr
> 0 (0xc2354b60): kern_ptrace: td=0xc2354b60 req=0xa pid=695 addr==0x0
> data==0x0

Ok, so it's doing a PT_ATTACH on pid 695.

> --- End of trace buffer ---
> db>
>
> The full alltrace:
> http://am-productions.biz/docs/ktr-trace.txt.gz
> From alltrace results for pid 695 is:
> db> bt
> Tracing pid 697 tid 100073 td 0xc2354b60
> kern_ptrace(c2354b60,a,2b7,0,11) at kern_ptrace+0xef6
> ptrace(c2354b60,cc9a9d04,4,0,23) at ptrace+0x40
> syscall(3b,3b,3b,81e9438,2b7) at syscall+0x19a
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (26, FreeBSD ELF32, ptrace), eip = 0x282c1a6b, esp = 0xbfbfe468, ebp = 0xbfbfe480 ---
> db> alltrace
>
> Tracing command gdb pid 697 tid 100073 td 0xc2354b60
> kern_ptrace(c2354b60,a,2b7,0,11) at kern_ptrace+0xef6
> ptrace(c2354b60,cc9a9d04,4,0,23) at ptrace+0x40
> syscall(3b,3b,3b,81e9438,2b7) at syscall+0x19a
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (26, FreeBSD ELF32, ptrace), eip = 0x282c1a6b, esp = 0xbfbfe468, ebp = 0xbfbfe480 ---
>
> Tracing command avidemux2 pid 695 tid 100080 td 0xc2635680
> sched_switch(c2635680,0,2,2ffd8312,ea5ba7fb) at sched_switch+0xb5
> mi_switch(2,0,c2635680,ac,c0619941) at mi_switch+0x259
> uio_yield(0,0,47000,0,c25f0074) at uio_yield+0x72
> vn_rdwr_inchunks(1,c2642840,89b1000,b37000,47000,0,0,101,c2640c00,0,0,c2635680) at vn_rdwr_inchunks+0xb4
> elf32_coredump(c2635680,c2642840,ffffffff,7fffffff) at elf32_coredump+0x132
> sigexit(c2635680,6,c2634294,8,c0618e65) at sigexit+0x8df
> kse_thr_interrupt(c2635680,cca0dd04,3,0,0) at kse_thr_interrupt+0x10c
> syscall(3b,3b,3b,20,0) at syscall+0x19a
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (382, FreeBSD ELF32, kse_thr_interrupt), eip = 0x28fe5603, esp = 0xbf8fdaec, ebp = 0xbf8fdb60 ---
>
> Tracing command avidemux2 pid 695 tid 100078 td 0xc26359c0
> sched_switch(c26359c0,c2635820,1,82b08812,3b03415f) at sched_switch+0xb5
> mi_switch(1,c2635820,0,c26359c0,cca13ba0) at mi_switch+0x259
> sleepq_switch(0,cca13bd0,c04c5896,c263422c,0) at sleepq_switch+0xc2
> sleepq_wait_sig(c263422c,0,100,c0618588,31f) at sleepq_wait_sig+0xc
> msleep(c263422c,c2634294,15c,c0620da6,0) at msleep+0x356
> kern_wait(c26359c0,2b8,cca13c28,0,0) at kern_wait+0x350
> wait4(c26359c0,cca13d04,4,0,0) at wait4+0x2d
> syscall(3b,3b,3b,94f1000,bfbfde90) at syscall+0x19a
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (7, FreeBSD ELF32, wait4), eip = 0x2903a067, esp = 0xbfbfdc04, ebp = 0xbfbfdc1c ---
>
> Tracing command avidemux2 pid 695 tid 100077 td 0xc2635b60
> sched_switch(c2635b60,0,1,cbf0f192,13141da1) at sched_switch+0xb5
> mi_switch(1,0,0,c2635b60,cca16c04) at mi_switch+0x259
> sleepq_switch(0,c2635b60,cca16c38,c04c595a,c26342b4) at sleepq_switch+0xc2
> sleepq_timedwait_sig(c26342b4) at sleepq_timedwait_sig+0xd
> msleep(c26342b4,c2634294,168,c0618e91,bb9) at msleep+0x41a
> kse_release(c2635b60,cca16d04,1,0,1) at kse_release+0xb8
> syscall(3b,3b,3b,81,97c3200) at syscall+0x19a
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x28fe55c3, esp = 0xbf9fef78, ebp = 0xbf9fefa8 ---

Given that one thread is doing a coredump, I bet someone tried to enter
single threading mode, and single threading mode sets P_STOPPED_SINGLE
_without_ setting p_xthread, thus P_SHOULDSTOP() is true, but p_xthread
is NULL.
I guess thread_single() should set both p_singlethread and p_xthread?

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
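
The hypothesis above can be sketched as a small userland model. This is not
kernel code: the structures are cut-down stand-ins, the flag values are
made up for illustration, and the function names model_thread_single(),
model_thread_single_fixed() and model_clear_xsig() are hypothetical. It
only shows how P_SHOULDSTOP() can be true while p_xthread is still NULL,
which would fit a fault at address 0x68 with %eax holding zero, and what
the tentative "set both fields" change would look like.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the real kernel constants differ. */
#define P_STOPPED_SIG     0x1
#define P_STOPPED_TRACE   0x2
#define P_STOPPED_SINGLE  0x4
#define P_STOPPED  (P_STOPPED_SIG | P_STOPPED_TRACE | P_STOPPED_SINGLE)
#define P_SHOULDSTOP(p)  ((p)->p_flag & P_STOPPED)
#define TDF_XSIG  0x10

/* Cut-down stand-ins for the kernel structures (assumption). */
struct thread {
    int td_flags;
};

struct proc {
    int p_flag;
    struct thread *p_singlethread;
    struct thread *p_xthread;
};

/* Hypothesised behaviour: enter single threading, leave p_xthread alone. */
void model_thread_single(struct proc *p, struct thread *td)
{
    assert(p != NULL && td != NULL);
    p->p_flag |= P_STOPPED_SINGLE;
    p->p_singlethread = td;
}

/* Variant floated in the message: record the thread in both fields. */
void model_thread_single_fixed(struct proc *p, struct thread *td)
{
    model_thread_single(p, td);
    p->p_xthread = td;
}

/*
 * Guarded form of the faulting statement.
 * Returns 0 if TDF_XSIG was cleared, -1 if p_xthread is NULL.
 */
int model_clear_xsig(struct proc *p)
{
    if (p->p_xthread == NULL)
        return -1;
    p->p_xthread->td_flags &= ~TDF_XSIG;
    return 0;
}
```

With the first variant a process satisfies P_SHOULDSTOP() while
model_clear_xsig() reports a NULL p_xthread. With the second the flag is
cleared normally. Whether setting p_xthread in thread_single() is the right
fix for the real kernel is still the open question in this thread.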