From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 07:58:04 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 51A7916A4CE for ; Thu, 30 Sep 2004 07:58:04 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id DBB3743D4C for ; Thu, 30 Sep 2004 07:58:03 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 84475 invoked from network); 30 Sep 2004 07:58:01 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 30 Sep 2004 07:58:01 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8U7w0Cs052399; Thu, 30 Sep 2004 09:58:00 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i8U7w0Va052398; Thu, 30 Sep 2004 09:58:00 +0200 (CEST) (envelope-from pho) Date: Thu, 30 Sep 2004 09:57:59 +0200 From: Peter Holm To: Stephan Uphoff Message-ID: <20040930075759.GA52233@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> <1096383103.3733.312.camel@palm.tree.com> <20040929085748.GA19695@peter.osted.lan> <1096467843.3733.1145.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Kj7319i9nmIyA2yE" Content-Disposition: inline In-Reply-To: <1096477932.3733.1471.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 07:58:04 -0000 --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Sep 29, 2004 at 01:12:13PM -0400, Stephan Uphoff wrote: > On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote: > > On Wed, 2004-09-29 at 04:57, Peter Holm wrote: > > > It's hard for me to tell if your patch has made any difference. > > > The freeze is still there. I'll try to make the same test once more > > > without your patches to see if I get the same pattern in freezes. > > > > I found some problems yesterday with mutex priority inheritance that > > could potentially cause your freeze patterns. > > > > I will try to roll a preliminary patch as soon as the caffeine does its > > magic. > > OK - here is a crude patch to fix some problems with mutex priority > inheritance. My theory is that the clock thread gets stuck waiting on > GIANT. > > During release/acquisition of a contested sleep mutex there are a few > windows where a task can be preempted when actions (waking up blocked > threads, ownership of the mutex, ..) need to be atomic as far as > scheduling is concerned. Otherwise priority inheritance may fail. The > patch uses critical_enter/critical_exit to protect these regions against > preemption. > > It would be great if could run this in addition to the other patches. > > Stephan OK, did so. Doesn't seem to make any difference. In order to spot a freeze I have instrumented hardclock() to report if Giant is being held more than 60 seconds. I don't know if this is any help, but here are examples of two freezes, both unfrozen by ping: Mounted root from ufs:/dev/ad0s1a. Giant held for more than 60 sec by td 0xc1ad5180, pid 1100 ~KDB: enter: Line break on console [thread 100105] Stopped at kdb_enter+0x2b: nop db> where 1100 sched_switch(c1ad5180,0,1) at sched_switch+0x14f mi_switch(1,0) at mi_switch+0x264 turnstile_wait(c17ec700,c10429cc,c1ad9900,c10429cc,2,c07f9e66,219) at turnstile_wait+0x2ec _mtx_lock_sleep(c10429cc,c1ad5180,0,c0812243,88d) at _mtx_lock_sleep+0x167 _mtx_lock_flags(c10429cc,0,c0812243,88d) at _mtx_lock_flags+0x85 vm_map_entry_delete(c1a63708,c2071374,cf289a10,c074a75b,c1a63708) at vm_map_entry_delete+0x7e vm_map_delete(c1a63708,0,bfc00000,c1a63708,c1a63708) at vm_map_delete+0x18f vm_map_remove(c1a63708,0,bfc00000) at vm_map_remove+0x42 exec_new_vmspace(cf289b94,c0897da0,c07f81d2,31e,c17d6318) at exec_new_vmspace+0x175 exec_elf32_imgact(cf289b94,c08c4e58,c08c4ef8,0,0) at exec_elf32_imgact+0x1b3 kern_execve(c1ad5180,8067470,806739c,8067404,0) at kern_execve+0x30e execve(c1ad5180,cf289d14,3,0,286) at execve+0x18 syscall(2f,2f,2f,8067470,806739c) at syscall+0x213 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (59, FreeBSD ELF32, execve), eip = 0x2812b01f, esp = 0xbfbfe68c, ebp = 0xbfbfe6b8 --- db> show locks 1100 exclusive sx user map r = 0 (0xc1a6374c) locked @ vm/vm_map.c:2313 exclusive sleep mutex Giant r = 0 (0xc08bc2c0) locked @ vm/vm_object.c:453 db> c pid 1353: corrected slot count (2->1) Giant held for more than 60 sec by td 0xc1cba900, pid 1453 ~KDB: enter: Line break on console [thread 100134] Stopped at kdb_enter+0x2b: nop db> where 1453 sched_switch(c1cba900,0,1) at sched_switch+0x14f mi_switch(1,0) at mi_switch+0x264 turnstile_wait(c1af1e00,c08f5ea0,c1caa000,c08f5ea0,2,c07f9e66,219) at turnstile_wait+0x2ec _mtx_lock_sleep(c08f5ea0,c1cba900,0,c080174c,d99) at _mtx_lock_sleep+0x167 _mtx_lock_flags(c08f5ea0,0,c080174c,d99,c1a755ac) at _mtx_lock_flags+0x85 vfs_clean_pages(c66638e8,c66638e8,8048cd9,c7f16e82,1) at vfs_clean_pages+0x7c bdwrite(c66638e8) at bdwrite+0x2d0 ffs_write(cf407c14) at ffs_write+0x558 vn_write(c1d78ae4,cf407c88,c1a59880,0,c1cba900) at vn_write+0x1f8 dofilewrite(c1cba900,c1d78ae4,1,8048cd9,1) at dofilewrite+0xa8 write(c1cba900,cf407d14,3,45,292) at write+0x39 syscall(804002f,bfbf002f,bfbf002f,8049568,bfbfebc0) at syscall+0x213 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (4, FreeBSD ELF32, write), eip = 0x280c151f, esp = 0xbfbfeb3c, ebp = 0xbfbfeb78 --- db> show locks 1453 exclusive sleep mutex vm object (standard object) r = 0 (0xc1a755ac) locked @ kern/vfs_bio.c:3480 exclusive sleep mutex Giant r = 0 (0xc08bc2c0) locked @ kern/vfs_vnops.c:582 db> c -- Peter Holm --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="kern_clock.diff" cvs diff: Diffing . Index: kern_clock.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_clock.c,v retrieving revision 1.172 diff -u -r1.172 kern_clock.c --- kern_clock.c 10 Jul 2004 21:36:01 -0000 1.172 +++ kern_clock.c 30 Sep 2004 07:19:44 -0000 @@ -193,6 +193,14 @@ mtx_unlock_spin_flags(&sched_lock, MTX_QUIET); } +int pho = 0; +int pho_giant = 0; +struct thread *pho_giant_td = NULL; +#define mtx_unowned(m) ((m)->mtx_lock == MTX_UNOWNED) +#define mtx_owner(m) (mtx_unowned((m)) ? NULL \ + : (struct thread *)((m)->mtx_lock & MTX_FLAGMASK)) + + /* * The real-time timer, interrupting hz times per second. */ @@ -239,6 +247,23 @@ if (need_softclock) swi_sched(softclock_ih, 0); + if (pho > 0) + if (--pho == 0) panic("testing ..."); + + if (!(mtx_unowned(&Giant))) { + if (pho_giant_td != mtx_owner(&Giant)) { + pho_giant_td = mtx_owner(&Giant); + pho_giant = 0; + } + if (++pho_giant == 60*hz) { + printf("Giant held for more than %d sec by td %p, pid %d\n", + pho_giant / hz, pho_giant_td, + pho_giant_td->td_proc->p_pid); + pho_giant = 0; + } + } else + pho_giant = 0; + #ifdef SW_WATCHDOG if (watchdog_enabled > 0 && --watchdog_ticks <= 0) watchdog_fire(); --Kj7319i9nmIyA2yE--