From owner-freebsd-threads@FreeBSD.ORG Mon Jun 28 11:03:03 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id E49BF16A4CE
        for ; Mon, 28 Jun 2004 11:03:03 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
        by mx1.FreeBSD.org (Postfix) with ESMTP id C803B43D49
        for ; Mon, 28 Jun 2004 11:03:03 +0000 (GMT)
        (envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1])
        by freefall.freebsd.org (8.12.11/8.12.11) with ESMTP id i5SB2Irh003976
        for ; Mon, 28 Jun 2004 11:02:18 GMT
        (envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost)
        by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i5SB2HZS003970
        for freebsd-threads@freebsd.org; Mon, 28 Jun 2004 11:02:17 GMT
        (envelope-from owner-bugmaster@freebsd.org)
Date: Mon, 28 Jun 2004 11:02:17 GMT
Message-Id: <200406281102.i5SB2HZS003970@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster
To: freebsd-threads@FreeBSD.org
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 28 Jun 2004 11:03:04 -0000

Current FreeBSD problem reports

Critical problems

S Submitted    Tracker       Resp.    Description
-------------------------------------------------------------------------------
o [2000/06/13] kern/19247    threads  uthread_sigaction.c does not do anything
s [2004/03/15] kern/64313    threads  FreeBSD (OpenBSD) pthread implicit set/un
o [2004/04/22] threads/65883 threads  libkse's sigwait does not work after fork

3 problems total.

Serious problems

S Submitted    Tracker       Resp.    Description
-------------------------------------------------------------------------------
o [2000/07/18] kern/20016    threads  pthreads: Cannot set scheduling timer/Can
o [2000/08/26] misc/20861    threads  libc_r does not honor socket timeouts
o [2001/01/20] bin/24472     threads  libc_r does not honor SO_SNDTIMEO/SO_RCVT
o [2001/01/25] bin/24632     threads  libc_r delicate deviation from libc in ha
o [2001/01/25] misc/24641    threads  pthread_rwlock_rdlock can deadlock
o [2001/11/26] bin/32295     threads  pthread dont dequeue signals
o [2002/02/01] i386/34536    threads  accept() blocks other threads
o [2002/05/25] kern/38549    threads  the procces compiled whith pthread stoppe
o [2002/06/27] bin/39922     threads  [PATCH?] Threaded applications executed w
o [2002/08/04] misc/41331    threads  Pthread library open sets O_NONBLOCK flag
o [2003/03/02] bin/48856     threads  Setting SIGCHLD to SIG_IGN still leaves z
o [2003/03/10] bin/49087     threads  Signals lost in programs linked with libc
o [2003/05/08] bin/51949     threads  thread in accept cannot be cancelled

13 problems total.

Non-critical problems

S Submitted    Tracker       Resp.    Description
-------------------------------------------------------------------------------
o [2000/05/26] misc/18824    threads  gethostbyname is not thread safe
o [2000/10/21] misc/22190    threads  A threaded read(2) from a socketpair(2) f
o [2001/09/09] bin/30464     threads  pthread mutex attributes -- pshared
o [2002/05/02] bin/37676     threads  libc_r: msgsnd(), msgrcv(), pread(), pwri
s [2002/07/16] misc/40671    threads  pthread_cancel doesn't remove thread from

5 problems total.

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 19:05:03 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 4FCA116A4CE
        for ; Fri, 2 Jul 2004 19:05:03 +0000 (GMT)
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
        by mx1.FreeBSD.org (Postfix) with ESMTP id E71CF43D31
        for ; Fri, 2 Jul 2004 19:05:02 +0000 (GMT)
        (envelope-from gallatin@cs.duke.edu)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
        by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i62J3cqM007536
        (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
        for ; Fri, 2 Jul 2004 15:03:38 -0400 (EDT)
Received: (from gallatin@localhost)
        by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i62J3Wjk009234;
        Fri, 2 Jul 2004 15:03:32 -0400 (EDT)
        (envelope-from gallatin)
From: Andrew Gallatin
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16613.45444.528419.643022@grasshopper.cs.duke.edu>
Date: Fri, 2 Jul 2004 15:03:32 -0400 (EDT)
To: freebsd-threads@freebsd.org
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Subject: odd KSE panic
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 02 Jul 2004 19:05:03 -0000

I've got a character device which is used for OS-bypass NIC, and I've
got a problem..

We just started using a second thread in our userland library. The
idea is this worker thread ioctls into the driver, where he sleeps
waiting for an interrupt from the NIC. When he gets the interrupt,
he wakes up and returns from the ioctl, where he will process some
recently completed events.

The problem happens when exiting. When the main application thread
decides to exit, it does an ioctl into the driver to wakeup the
sleeping worker thread. The worker thread wakes up, and then
exits, then the main thread closes his file descriptor and exits.

The problem I'm seeing is that I get a panic like the following when
using KSE. (A linux binary works fine, ioctls are translated..)

The interesting thing is that there is no stack.. Just one function
from my driver (mx_free()) sitting out there by itself. Is the kernel
somehow ripping the kernel stacks of all threads out from under them
when one thread calls exit()? How do I take a reference so I
don't risk getting marooned without a stack?

Thanks,

Drew

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc1d69150
stack pointer           = 0x10:0x0
frame pointer           = 0x10:0x0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 843 (mx_loopback_test)
trap number             = 12
panic: page fault
cpuid = 0;
Stack backtrace:
backtrace(c068b9ae,0,c068f727,ffffff28,100) at backtrace+0x17
panic(c068f727,c06b21bf,c1cc0300,1,1) at panic+0x134
trap_fatal(ffffffc0,0,1,0,c1cc19a0) at trap_fatal+0x313
trap_pfault(ffffffc0,0,0,0,0) at trap_pfault+0x22d
trap(18,10,10,0,c16e30e0) at trap+0x2dd
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc1d69150, esp = 0, ebp = 0 ---
mx_free() at mx_free+0x1b
boot() called on cpu#0
Uptime: 2m45s

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 19:36:25 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 0A70116A4CE
        for ; Fri, 2 Jul 2004 19:36:25 +0000 (GMT)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
        by mx1.FreeBSD.org (Postfix) with ESMTP id 8DB6C43D1F
        for ; Fri, 2 Jul 2004 19:36:24
        +0000 (GMT) (envelope-from eischen@vigrid.com)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
        by mail.pcnet.com (8.12.10/8.12.1) with ESMTP id i62JYu3u004748;
        Fri, 2 Jul 2004 15:34:56 -0400 (EDT)
Date: Fri, 2 Jul 2004 15:34:56 -0400 (EDT)
From: Daniel Eischen
X-Sender: eischen@pcnet5.pcnet.com
To: Andrew Gallatin
In-Reply-To: <16613.45444.528419.643022@grasshopper.cs.duke.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-threads@freebsd.org
Subject: Re: odd KSE panic
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 02 Jul 2004 19:36:25 -0000

On Fri, 2 Jul 2004, Andrew Gallatin wrote:

>
> I've got a character device which is used for OS-bypass NIC, and I've
> got a problem..
>
> We just started using a second thread in our userland library. The
> idea is this worker thread ioctls into the driver, where he sleeps
> waiting for an interrupt from the NIC. When he gets the interrupt,
> he wakes up and returns from the ioctl, where he will process some
> recently completed events.
>
> The problem happens when exiting. When the main application thread
> decides to exit, it does an ioctl into the driver to wakeup the
> sleeping worker thread. The worker thread wakes up, and then
> exits, then the main thread closes his file descriptor and exits.
>
> The problem I'm seeing is that I get a panic like the following when
> using KSE. (A linux binary works fine, ioctls are translated..)
>
> The interesting thing is that there is no stack.. Just one function
> from my driver (mx_free()) sitting out there by itself. Is the kernel
> somehow ripping the kernel stacks of all threads out from under them
> when one thread calls exit()? How do I take a reference so I
> don't risk getting marooned without a stack?

exit() exits the process, including reaping all kernel threads.
I'm not sure why one thread (worker) doing an exit() will
still allow other threads to continue running. You should
be using pthread_exit() to exit from the worker thread,
but that still doesn't explain why you're having the problem.

I think just calling exit() in the application is sufficient
also. There's no need to GC the worker thread since the
kernel should take care of all the other threads.

-- 
Dan Eischen

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 21:17:08 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 7E1B916A4CE
        for ; Fri, 2 Jul 2004 21:17:08 +0000 (GMT)
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
        by mx1.FreeBSD.org (Postfix) with ESMTP id 140B543D1D
        for ; Fri, 2 Jul 2004 21:17:06 +0000 (GMT)
        (envelope-from gallatin@cs.duke.edu)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
        by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i62LBJqM021492
        (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
        Fri, 2 Jul 2004 17:11:19 -0400 (EDT)
Received: (from gallatin@localhost)
        by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i62LBE65009337;
        Fri, 2 Jul 2004 17:11:14 -0400 (EDT)
        (envelope-from gallatin)
From: Andrew Gallatin
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16613.53106.413179.808734@grasshopper.cs.duke.edu>
Date: Fri, 2 Jul 2004 17:11:14 -0400 (EDT)
To: Daniel Eischen
In-Reply-To:
References: <16613.45444.528419.643022@grasshopper.cs.duke.edu>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
cc: freebsd-threads@freebsd.org
Subject: Re: odd KSE panic
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: , X-List-Received-Date: Fri, 02 Jul 2004 21:17:08 -0000 Daniel Eischen writes: > On Fri, 2 Jul 2004, Andrew Gallatin wrote: > > The interesting thing is that there is no stack.. Just one function > > from my driver (mx_free()) sitting out there by itself. Is the kernel > > somehow ripping the kernel stacks of all threads out from under them > > when one thread calls exit()? How do I take a reference so I > > don't risk getting marooned without a stack? > > exit() exits the process, including reaping all kernel threads. > I'm not sure why one thread (worker) doing an exit() will > still allow other threads to continue running. You should > be using pthread_exit() to exit from the worker thread, > but that still doesn't explain why you're having the problem. > Thanks.. I'm calling pthread_exit() now. Still having a problem. What can you tell about the state of threads from this ddb info: Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x0 fault code = supervisor read, page not present instruction pointer = 0x8:0xc1d69193 stack pointer = 0x10:0x0 frame pointer = 0x10:0x0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 1937 (mx_loopback_test) kernel: type 12 trap, code=0 kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x0 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0651e11 stack pointer = 0x10:0xfffffefc frame pointer = 0x10:0xffffff1c code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = resume, IOPL = 0 current process = 1937 (mx_loopback_test) kernel: type 12 trap, code=0 Stopped at kdb_trap+0x151: movl 0x40(%edx),%eax db> ps pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd 1937 c1c5a898 e6319000 1387 643 1937 000c002 (threaded) 
mx_loopback_test thread 0xc21cec60 ksegrp 0xc182c580 [SLPQ kserel 0xc182c5dc][SLP] thread 0xc21cedc0 ksegrp 0xc1cf1c00 [SLPQ ksesigwait 0xc1c5a998][SLP] thread 0xc1b962c0 ksegrp 0xc182c580 [CPU 1][kse 0xc2161360] db> sho thread 0xc21cec60 Proc 0xc1c5a898 thread 0xc21cec60 ksegrp 0xc182c580 [SLPQ kserel 0xc182c5dc][SLP] sched_switch(c21cec60,df262f7,22c29cb3,ffc03014,c21cec60) at sched_switch+0xbc mi_switch(1,c052c35e,c182c5dc,c1c5a898,0) at mi_switch+0x1a2 sleepq_switch(c182c5dc,0,0,e8474c98,c0512cef) at sleepq_switch+0x169 sleepq_timedwait_sig(c182c5dc,0,c1c5a904,c069e850,0) at sleepq_timedwait_sig+0x17 msleep(c182c5dc,c1c5a904,168,c069e850,ea61) at msleep+0x490 kse_release(c21cec60,e8474d14,4,c04f102e,1) at kse_release+0x288 syscall(2f,2f,2f,8052200,0) at syscall+0x2f0 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280941a7, esp = 0x8193f90, ebp = 0x8193fcc --- db> sho thread 0xc21cedc0 Proc 0xc1c5a898 thread 0xc21cedc0 ksegrp 0xc1cf1c00 [SLPQ ksesigwait 0xc1c5a998][SLP] sched_switch(c21cedc0,2717cc87,22c51a72,ffc00014,c21cedc0) at sched_switch+0xbc mi_switch(1,c052c35e,c1c5a998,c1c5a898,0) at mi_switch+0x1a2 sleepq_switch(c1c5a998,0,0,e8477c98,c0512cef) at sleepq_switch+0x169 sleepq_timedwait_sig(c1c5a998,0,c1c5a904,c069e845,0) at sleepq_timedwait_sig+0x17 msleep(c1c5a998,c1c5a904,168,c069e845,7531) at msleep+0x490 kse_release(c21cedc0,e8477d14,4,c04f102e,1) at kse_release+0x195 syscall(2f,2f,2f,8052100,81) at syscall+0x2f0 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280941a7, esp = 0xbfafef40, ebp = 0xbfafef8c --- db> sho thread 0xc1b962c0 Proc 0xc1c5a898 thread 0xc1b962c0 ksegrp 0xc182c580 [CPU 1][kse 0xc2161360] kdb_trap(c,0,ffffffc0,1,1) at kdb_trap+0x151 trap_fatal(ffffffc0,0,1,0,c1b962c0) at trap_fatal+0x2e3 trap_pfault(ffffffc0,0,0,0,0) at trap_pfault+0x22d trap(18,10,10,0,c16c8ce0) at trap+0x2dd calltrap() at calltrap+0x5 --- trap 0xc, eip = 
0xc1d69193, esp = 0, ebp = 0 --- mx_free() at mx_free+0x1b db> (gdb) l * kse_release+0x288 0xc04f5145 is in kse_release (../../../kern/kern_kse.c:357). 352 kg->kg_upsleeps++; 353 td->td_kflags |= TDK_KSEREL; 354 error = msleep(&kg->kg_completed, &p->p_mtx, 355 PPAUSE|PCATCH, "kserel", 356 (uap->timeout ? tvtohz(&tv) : 0)); 357 td->td_kflags &= ~(TDK_KSEREL | TDK_WAKEUP); 358 kg->kg_upsleeps--; 359 } 360 PROC_UNLOCK(p); 361 } (gdb) l * kse_release+0x195 0xc04f5052 is in kse_release (../../../kern/kern_kse.c:343). 338 /* UTS wants to wait for signal event */ 339 if (!(p->p_flag & P_SIGEVENT) && !(ku->ku_flags & KUF_DOUPCALL)) { 340 td->td_kflags |= TDK_KSERELSIG; 341 error = msleep(&p->p_siglist, &p->p_mtx, PPAUSE|PCATCH, 342 "ksesigwait", (uap->timeout ? tvtohz(&tv) : 0)); 343 td->td_kflags &= ~(TDK_KSERELSIG | TDK_WAKEUP); 344 } 345 p->p_flag &= ~P_SIGEVENT; 346 sigset = p->p_siglist; 347 PROC_UNLOCK(p); (from objdump -D -S, since gdb -k seems to no longer work..) 00008178 : void mx_free(void *ptr) { 8178: 55 push %ebp 8179: 89 e5 mov %esp,%ebp 817b: 83 ec 08 sub $0x8,%esp free(ptr, M_MXBUF); 817e: c7 44 24 04 20 71 02 movl $0x27120,0x4(%esp) 8185: 00 8186: 8b 45 08 mov 0x8(%ebp),%eax 8189: 89 04 24 mov %eax,(%esp) 818c: e8 fc ff ff ff call 818d } 8191: 89 ec mov %ebp,%esp 8193: 5d pop %ebp 8194: c3 ret Thanks, Drew From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 21:34:45 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8F81316A4CE for ; Fri, 2 Jul 2004 21:34:45 +0000 (GMT) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 14CB643D39 for ; Fri, 2 Jul 2004 21:34:45 +0000 (GMT) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.10/8.12.10) with ESMTP id i62LXoqM024054 (version=TLSv1/SSLv3 
        cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
        Fri, 2 Jul 2004 17:33:50 -0400 (EDT)
Received: (from gallatin@localhost)
        by grasshopper.cs.duke.edu (8.12.9p2/8.12.9/Submit) id i62LXiTQ009357;
        Fri, 2 Jul 2004 17:33:44 -0400 (EDT)
        (envelope-from gallatin)
From: Andrew Gallatin
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16613.54456.926009.472934@grasshopper.cs.duke.edu>
Date: Fri, 2 Jul 2004 17:33:44 -0400 (EDT)
To: Daniel Eischen
In-Reply-To:
References: <16613.45444.528419.643022@grasshopper.cs.duke.edu>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
cc: freebsd-threads@freebsd.org
Subject: Re: odd KSE panic
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 02 Jul 2004 21:34:45 -0000

I'm not saying it's a KSE bug, but I just thought to try libthr via
libmap.conf. That works just fine..

Just for the heck of it, is there any easy global way to force all KSE
threads to be system scope? I'm just looking for something that's
hard to mess up ;

Drew

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 22:11:23 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 0C97D16A4CE
        for ; Fri, 2 Jul 2004 22:11:23 +0000 (GMT)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
        by mx1.FreeBSD.org (Postfix) with ESMTP id B63D743EC0
        for ; Fri, 2 Jul 2004 22:11:22 +0000 (GMT)
        (envelope-from eischen@vigrid.com)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
        by mail.pcnet.com (8.12.10/8.12.1) with ESMTP id i62MBJ3u006834;
        Fri, 2 Jul 2004 18:11:19 -0400 (EDT)
Date: Fri, 2 Jul 2004 18:11:19 -0400 (EDT)
From: Daniel Eischen
X-Sender: eischen@pcnet5.pcnet.com
To: Andrew Gallatin
In-Reply-To: <16613.53106.413179.808734@grasshopper.cs.duke.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-threads@freebsd.org
Subject: Re: odd KSE panic
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 02 Jul 2004 22:11:23 -0000

On Fri, 2 Jul 2004, Andrew Gallatin wrote:

>
> Daniel Eischen writes:
> > On Fri, 2 Jul 2004, Andrew Gallatin wrote:
>
> > > The interesting thing is that there is no stack.. Just one function
> > > from my driver (mx_free()) sitting out there by itself. Is the kernel
> > > somehow ripping the kernel stacks of all threads out from under them
> > > when one thread calls exit()? How do I take a reference so I
> > > don't risk getting marooned without a stack?
> >
> > exit() exits the process, including reaping all kernel threads.
> > I'm not sure why one thread (worker) doing an exit() will
> > still allow other threads to continue running.
You should > > be using pthread_exit() to exit from the worker thread, > > but that still doesn't explain why you're having the problem. > > > > Thanks.. I'm calling pthread_exit() now. Still having a problem. > > What can you tell about the state of threads from this ddb info: > > Fatal trap 12: page fault while in kernel mode > cpuid = 1; apic id = 01 > fault virtual address = 0x0 > fault code = supervisor read, page not present > instruction pointer = 0x8:0xc1d69193 > stack pointer = 0x10:0x0 > frame pointer = 0x10:0x0 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 1937 (mx_loopback_test) > kernel: type 12 trap, code=0 > kernel trap 12 with interrupts disabled > > > Fatal trap 12: page fault while in kernel mode > cpuid = 1; apic id = 01 > fault virtual address = 0x0 > fault code = supervisor read, page not present > instruction pointer = 0x8:0xc0651e11 > stack pointer = 0x10:0xfffffefc > frame pointer = 0x10:0xffffff1c > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, def32 1, gran 1 > processor eflags = resume, IOPL = 0 > current process = 1937 (mx_loopback_test) > kernel: type 12 trap, code=0 > Stopped at kdb_trap+0x151: movl 0x40(%edx),%eax > > db> ps > pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd > 1937 c1c5a898 e6319000 1387 643 1937 000c002 (threaded) mx_loopback_test > thread 0xc21cec60 ksegrp 0xc182c580 [SLPQ kserel 0xc182c5dc][SLP] > thread 0xc21cedc0 ksegrp 0xc1cf1c00 [SLPQ ksesigwait 0xc1c5a998][SLP] > thread 0xc1b962c0 ksegrp 0xc182c580 [CPU 1][kse 0xc2161360] The second thread (0xc21cedc0) is the special signal handler thread. It's of no real interest here. 
> db> sho thread 0xc21cec60 > Proc 0xc1c5a898 thread 0xc21cec60 ksegrp 0xc182c580 [SLPQ kserel 0xc182c5dc][SLP] > sched_switch(c21cec60,df262f7,22c29cb3,ffc03014,c21cec60) at sched_switch+0xbc > mi_switch(1,c052c35e,c182c5dc,c1c5a898,0) at mi_switch+0x1a2 > sleepq_switch(c182c5dc,0,0,e8474c98,c0512cef) at sleepq_switch+0x169 > sleepq_timedwait_sig(c182c5dc,0,c1c5a904,c069e850,0) at sleepq_timedwait_sig+0x17 > msleep(c182c5dc,c1c5a904,168,c069e850,ea61) at msleep+0x490 > kse_release(c21cec60,e8474d14,4,c04f102e,1) at kse_release+0x288 > syscall(2f,2f,2f,8052200,0) at syscall+0x2f0 > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280941a7, esp = 0x8193f90, ebp = 0x8193fcc --- > > > db> sho thread 0xc1b962c0 > Proc 0xc1c5a898 thread 0xc1b962c0 ksegrp 0xc182c580 [CPU 1][kse 0xc2161360] > kdb_trap(c,0,ffffffc0,1,1) at kdb_trap+0x151 > trap_fatal(ffffffc0,0,1,0,c1b962c0) at trap_fatal+0x2e3 > trap_pfault(ffffffc0,0,0,0,0) at trap_pfault+0x22d > trap(18,10,10,0,c16c8ce0) at trap+0x2dd > calltrap() at calltrap+0x5 > --- trap 0xc, eip = 0xc1d69193, esp = 0, ebp = 0 --- > mx_free() at mx_free+0x1b > db> Do you know which thread is doing the mx_free()? Is it the worker thread? If you are calling in to the kernel to wakeup the worker thread, it may be possible for the worker thread to return and run before the calling thread returns from the kernel and runs. 
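
One way to rule out the ordering problem described above can be sketched in userland C: the main thread joins the worker before it touches the descriptor, so the worker is known to be out of the driver first. This is only a sketch under stated assumptions. The driver ioctl is simulated with a mutex and condition variable, and `struct shutdown_state`, `worker_main`, `request_shutdown` and `run_shutdown_sequence` are hypothetical names, not part of the mx library or driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; the real worker sleeps in an ioctl. */
struct shutdown_state {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    int             stop;         /* set by main to request shutdown */
    int             worker_done;  /* set by the worker just before it returns */
};

static void *worker_main(void *arg)
{
    struct shutdown_state *st = arg;

    assert(st != NULL);
    pthread_mutex_lock(&st->lock);
    while (!st->stop)                        /* simulates sleeping in the driver */
        pthread_cond_wait(&st->wake, &st->lock);
    st->worker_done = 1;
    pthread_mutex_unlock(&st->lock);
    return NULL;                             /* equivalent to pthread_exit(NULL) */
}

/* Simulates the "wake the worker" ioctl issued by the main thread. */
static void request_shutdown(struct shutdown_state *st)
{
    pthread_mutex_lock(&st->lock);
    st->stop = 1;
    pthread_cond_signal(&st->wake);
    pthread_mutex_unlock(&st->lock);
}

/* Returns 1 if the worker finished before the point where close() would run. */
static int run_shutdown_sequence(void)
{
    struct shutdown_state st;
    pthread_t tid;
    int done;

    st.stop = 0;
    st.worker_done = 0;
    if (pthread_mutex_init(&st.lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&st.wake, NULL) != 0) {
        pthread_mutex_destroy(&st.lock);
        return -1;
    }
    if (pthread_create(&tid, NULL, worker_main, &st) != 0) {
        pthread_cond_destroy(&st.wake);
        pthread_mutex_destroy(&st.lock);
        return -1;
    }

    request_shutdown(&st);
    if (pthread_join(tid, NULL) != 0)        /* wait until the worker is gone */
        return -1;

    /* Only at this point would the real code close() the device descriptor. */
    done = st.worker_done;
    pthread_cond_destroy(&st.wake);
    pthread_mutex_destroy(&st.lock);
    return done;
}
```

Calling `run_shutdown_sequence()` should return 1. Whether joining the worker avoids the panic reported in this thread is not established here; it only removes the userland ordering ambiguity from the picture.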
-- Dan Eischen From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 22:11:40 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1E62916A4CE for ; Fri, 2 Jul 2004 22:11:40 +0000 (GMT) Received: from rwcrmhc11.comcast.net (rwcrmhc11.comcast.net [204.127.198.35]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6B40C43EDF for ; Fri, 2 Jul 2004 22:11:39 +0000 (GMT) (envelope-from julian@elischer.org) Received: from interjet.elischer.org ([24.7.73.28]) by comcast.net (rwcrmhc11) with ESMTP id <2004070222113701300jgu33e>; Fri, 2 Jul 2004 22:11:37 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id PAA09531; Fri, 2 Jul 2004 15:11:36 -0700 (PDT) Date: Fri, 2 Jul 2004 15:11:33 -0700 (PDT) From: Julian Elischer To: Andrew Gallatin In-Reply-To: <16613.53106.413179.808734@grasshopper.cs.duke.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-threads@freebsd.org Subject: Re: odd KSE panic X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jul 2004 22:11:40 -0000 On Fri, 2 Jul 2004, Andrew Gallatin wrote: > > Daniel Eischen writes: > > On Fri, 2 Jul 2004, Andrew Gallatin wrote: > > > > The interesting thing is that there is no stack.. Just one function > > > from my driver (mx_free()) sitting out there by itself. Is the kernel > > > somehow ripping the kernel stacks of all threads out from under them > > > when one thread calls exit()? How do I take a reference so I > > > don't risk getting marooned without a stack? > > > > exit() exits the process, including reaping all kernel threads. > > I'm not sure why one thread (worker) doing an exit() will > > still allow other threads to continue running. 
You should
> > be using pthread_exit() to exit from the worker thread,
> > but that still doesn't explain why you're having the problem.
> >
>
> Thanks.. I'm calling pthread_exit() now. Still having a problem.
>
> What can you tell about the state of threads from this ddb info:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x0
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc1d69193
> stack pointer           = 0x10:0x0
> frame pointer           = 0x10:0x0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1937 (mx_loopback_test)
> kernel: type 12 trap, code=0
> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x0
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc0651e11
> stack pointer           = 0x10:0xfffffefc
> frame pointer           = 0x10:0xffffff1c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 1937 (mx_loopback_test)
> kernel: type 12 trap, code=0
> Stopped at kdb_trap+0x151: movl 0x40(%edx),%eax
>
> db> ps
>   pid   proc     uarea    uid  ppid  pgrp  flag    stat  wmesg  wchan  cmd
>  1937 c1c5a898 e6319000  1387   643  1937 000c002 (threaded) mx_loopback_test
>   thread 0xc21cec60 ksegrp 0xc182c580 [SLPQ kserel 0xc182c5dc][SLP]
>   thread 0xc21cedc0 ksegrp 0xc1cf1c00 [SLPQ ksesigwait 0xc1c5a998][SLP]
>   thread 0xc1b962c0 ksegrp 0xc182c580 [CPU 1][kse 0xc2161360]

When one thread calls exit() it marks the fact that the process is
exiting, then tries to wake up all the other threads, and then
suspends itself. The other threads, when awoken, are supposed to
notice what's going on, abort whatever they are doing and, once they
have released all their resources (by unrolling back to the user
boundary), call thread_exit(). The last one out is supposed to wake up
the original thread that called exit(), which can then proceed on the
basis that it is now the only remaining thread.

If there are threads waiting in uninterruptible sleeps then the process
as a whole cannot exit until they have finished sleeping, come back
to the user boundary and called thread_exit().

None of the three threads you show is in exit, or even anything related
to exit.

>
> db> sho thread 0xc21cec60
> Proc 0xc1c5a898 thread 0xc21cec60 ksegrp 0xc182c580 [SLPQ kserel 0xc182c5dc][SLP]
> sched_switch(c21cec60,df262f7,22c29cb3,ffc03014,c21cec60) at sched_switch+0xbc
> mi_switch(1,c052c35e,c182c5dc,c1c5a898,0) at mi_switch+0x1a2
> sleepq_switch(c182c5dc,0,0,e8474c98,c0512cef) at sleepq_switch+0x169
> sleepq_timedwait_sig(c182c5dc,0,c1c5a904,c069e850,0) at sleepq_timedwait_sig+0x17
> msleep(c182c5dc,c1c5a904,168,c069e850,ea61) at msleep+0x490
> kse_release(c21cec60,e8474d14,4,c04f102e,1) at kse_release+0x288
> syscall(2f,2f,2f,8052200,0) at syscall+0x2f0
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280941a7, esp = 0x8193f90, ebp = 0x8193fcc ---
>
> db> sho thread 0xc21cedc0
> Proc 0xc1c5a898 thread 0xc21cedc0 ksegrp 0xc1cf1c00 [SLPQ ksesigwait 0xc1c5a998][SLP]
> sched_switch(c21cedc0,2717cc87,22c51a72,ffc00014,c21cedc0) at sched_switch+0xbc
> mi_switch(1,c052c35e,c1c5a998,c1c5a898,0) at mi_switch+0x1a2
> sleepq_switch(c1c5a998,0,0,e8477c98,c0512cef) at sleepq_switch+0x169
> sleepq_timedwait_sig(c1c5a998,0,c1c5a904,c069e845,0) at sleepq_timedwait_sig+0x17
> msleep(c1c5a998,c1c5a904,168,c069e845,7531) at msleep+0x490
> kse_release(c21cedc0,e8477d14,4,c04f102e,1) at kse_release+0x195
> syscall(2f,2f,2f,8052100,81) at syscall+0x2f0
>
Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (383, FreeBSD ELF32, kse_release), eip = 0x280941a7, esp = 0xbfafef40, ebp = 0xbfafef8c --- > > db> sho thread 0xc1b962c0 > Proc 0xc1c5a898 thread 0xc1b962c0 ksegrp 0xc182c580 [CPU 1][kse 0xc2161360] > kdb_trap(c,0,ffffffc0,1,1) at kdb_trap+0x151 > trap_fatal(ffffffc0,0,1,0,c1b962c0) at trap_fatal+0x2e3 > trap_pfault(ffffffc0,0,0,0,0) at trap_pfault+0x22d > trap(18,10,10,0,c16c8ce0) at trap+0x2dd > calltrap() at calltrap+0x5 > --- trap 0xc, eip = 0xc1d69193, esp = 0, ebp = 0 --- > mx_free() at mx_free+0x1b > db> > > > > (gdb) l * kse_release+0x288 > 0xc04f5145 is in kse_release (../../../kern/kern_kse.c:357). > 352 kg->kg_upsleeps++; > 353 td->td_kflags |= TDK_KSEREL; > 354 error = msleep(&kg->kg_completed, &p->p_mtx, > 355 PPAUSE|PCATCH, "kserel", > 356 (uap->timeout ? tvtohz(&tv) : 0)); > 357 td->td_kflags &= ~(TDK_KSEREL | TDK_WAKEUP); > 358 kg->kg_upsleeps--; > 359 } > 360 PROC_UNLOCK(p); > 361 } > > > (gdb) l * kse_release+0x195 > 0xc04f5052 is in kse_release (../../../kern/kern_kse.c:343). > 338 /* UTS wants to wait for signal event */ > 339 if (!(p->p_flag & P_SIGEVENT) && !(ku->ku_flags & KUF_DOUPCALL)) { > 340 td->td_kflags |= TDK_KSERELSIG; > 341 error = msleep(&p->p_siglist, &p->p_mtx, PPAUSE|PCATCH, > 342 "ksesigwait", (uap->timeout ? tvtohz(&tv) : 0)); > 343 td->td_kflags &= ~(TDK_KSERELSIG | TDK_WAKEUP); > 344 } > 345 p->p_flag &= ~P_SIGEVENT; > 346 sigset = p->p_siglist; > 347 PROC_UNLOCK(p); > > > (from objdump -D -S, since gdb -k seems to no longer work..) > 00008178 : > > void > mx_free(void *ptr) > { > 8178: 55 push %ebp > 8179: 89 e5 mov %esp,%ebp > 817b: 83 ec 08 sub $0x8,%esp > free(ptr, M_MXBUF); > 817e: c7 44 24 04 20 71 02 movl $0x27120,0x4(%esp) > 8185: 00 > 8186: 8b 45 08 mov 0x8(%ebp),%eax > 8189: 89 04 24 mov %eax,(%esp) > 818c: e8 fc ff ff ff call 818d > } > 8191: 89 ec mov %ebp,%esp > 8193: 5d pop %ebp > 8194: c3 ret > I can't even find mx_free in my sources.. 
I'll cvs update and see if it's new.. if so then that's kinda
suspicious right there..

ummmm nope.. where is mx_free?

>
> Thanks,
>
> Drew
> _______________________________________________
> freebsd-threads@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-threads
> To unsubscribe, send any mail to "freebsd-threads-unsubscribe@freebsd.org"
>

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 22:17:59 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 6B71916A4E3
        for ; Fri, 2 Jul 2004 22:17:59 +0000 (GMT)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
        by mx1.FreeBSD.org (Postfix) with ESMTP id 326E443ECD
        for ; Fri, 2 Jul 2004 21:58:08 +0000 (GMT)
        (envelope-from eischen@vigrid.com)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
        by mail.pcnet.com (8.12.10/8.12.1) with ESMTP id i62Lux3u004143;
        Fri, 2 Jul 2004 17:56:59 -0400 (EDT)
Date: Fri, 2 Jul 2004 17:56:59 -0400 (EDT)
From: Daniel Eischen
X-Sender: eischen@pcnet5.pcnet.com
To: Andrew Gallatin
In-Reply-To: <16613.54456.926009.472934@grasshopper.cs.duke.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-threads@freebsd.org
Subject: Re: odd KSE panic
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 02 Jul 2004 22:18:00 -0000

On Fri, 2 Jul 2004, Andrew Gallatin wrote:

>
>
> I'm not saying it's a KSE bug, but I just thought to try libthr via
> libmap.conf. That works just fine..
>
> Just for the heck of it, is there any easy global way to force all KSE
> threads to be system scope? I'm just looking for something that's
> hard to mess up ;

Add CFLAGS+=-DSYSTEM_SCOPE_ONLY to /etc/make.conf or uncomment
it in src/lib/libpthread/Makefile.
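
The build-time switch above is applied when the thread library is compiled. For a single thread, POSIX also defines pthread_attr_setscope(), which requests system contention scope at thread creation. A minimal sketch, assuming the worker is created by code you control; `worker`, `start_system_scope_thread` and `system_scope_demo` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Trivial worker: records that it ran. */
static void *worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/*
 * Create a thread with PTHREAD_SCOPE_SYSTEM.
 * Returns 0 on success, otherwise the error number from the failing call.
 */
static int start_system_scope_thread(pthread_t *tid, void *(*fn)(void *),
                                     void *arg)
{
    pthread_attr_t attr;
    int error;

    assert(tid != NULL && fn != NULL);
    error = pthread_attr_init(&attr);
    if (error != 0)
        return error;
    error = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (error == 0)
        error = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return error;
}

/* Usage: start one system-scope worker and wait for it; returns 1 if it ran. */
static int system_scope_demo(void)
{
    pthread_t tid;
    int ran = 0;

    if (start_system_scope_thread(&tid, worker, &ran) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return ran;
}
```

This only affects threads created through the helper; threads created elsewhere in a library keep whatever scope that code asks for, which is why a library-wide build option may be the harder-to-mess-up choice asked about above.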

-- 
Dan Eischen

From owner-freebsd-threads@FreeBSD.ORG Fri Jul 2 22:41:50 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 6A81416A4CE
        for ; Fri, 2 Jul 2004 22:41:50 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
        by mx1.FreeBSD.org (Postfix) with ESMTP id 5FB1043D45;
        Fri, 2 Jul 2004 22:41:50 +0000 (GMT)
        (envelope-from davidxu@freebsd.org)
Received: from freebsd.org (davidxu@localhost [127.0.0.1])
        i62MfjWv040939; Fri, 2 Jul 2004 22:41:46 GMT
        (envelope-from davidxu@freebsd.org)
Message-ID: <40E5E432.2020803@freebsd.org>
Date: Sat, 03 Jul 2004 06:39:46 +0800
From: David Xu
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040624
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Andrew Gallatin
References: <16613.45444.528419.643022@grasshopper.cs.duke.edu>
In-Reply-To: <16613.45444.528419.643022@grasshopper.cs.duke.edu>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-threads@freebsd.org
Subject: Re: odd KSE panic
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 02 Jul 2004 22:41:50 -0000

What scheduler are you using? Can you switch to another scheduler
to see if the problem is still there?

David Xu

Andrew Gallatin wrote:

>I've got a character device which is used for OS-bypass NIC, and I've
>got a problem..
>
>We just started using a second thread in our userland library. The
>idea is this worker thread ioctls into the driver, where he sleeps
>waiting for an interrupt from the NIC. When he gets the interrupt,
>he wakes up and returns from the ioctl, where he will process some
>recently completed events.
>
>The problem happens when exiting. When the main application thread
>decides to exit, it does an ioctl into the driver to wakeup the
>sleeping worker thread. The worker thread wakes up, and then
>exits, then the main thread closes his file descriptor and exits.
>
>