From: John Baldwin <jhb@FreeBSD.org>
To: current@FreeBSD.org
Cc: jeffr@FreeBSD.org, Alan Cox, Julian Elischer
Date: Fri, 11 Feb 2005 11:48:45 -0500
Message-Id: <200502111148.45976.jhb@FreeBSD.org>
Subject: HEADSUP: Turn off cpu_idle_hlt on SMP for now on x86

I figured out a deadlock I hit on a 4-CPU test box last night, and I think
I have tracked it down to a bug in the 4BSD scheduler that can be worked
around by turning off the idle halt via the machdep.cpu_idle_hlt sysctl
(i.e., sysctl machdep.cpu_idle_hlt=0).  Specifically, it involves waking up
halted CPUs when a thread that is pinned or bound to a specific CPU is made
runnable.  The details of what I saw today are below:

FreeBSD/i386 (deimos.baldwin.cx) (ttyd0)

login: KDB: enter: Line break on console
[thread pid 13 tid 100006 ]
Stopped at      kdb_enter+0x30: leave
db> show pcpu
cpuid        = 3
curthread    = 0xc1dfd5c0: pid 11 "idle: cpu3"
curpcb       = 0xdc608da0
fpcurthread  = none
idlethread   = 0xc1dfd5c0: pid 11 "idle: cpu3"
APIC ID      = 3
currentldt   = 0x28
db> ps
  pid   proc     uid  ppid  pgrp  flag    stat  wmesg   wchan   cmd
79079 c2ebe1fc  1001 79067   610 0000012  [LOCK vm page queue mutex c1f90c40] make
79078 c2f2ede4  1001 78894   610 0000002  [LOCK vm page queue mutex c1f90c40] sh
79077 c3498000  1001 79043   610 0004002  [LOCK vm page queue mutex c1f90c40] make
79075 c2f317f0  1001 78898   610 0004002  [LOCK vm page queue mutex c1f90c40] make
79074 c2184de4  1001 78902   610 0004002  [LOCK vm page queue mutex c1f90c40] make
79067 c23c71fc  1001 78955   610 0004002  [SLPQ ppwait 0xc23c71fc][SLP] make
79060 c23e35f4  1001 78915   610 0004002  [LOCK Giant c1e41800] make
79043 c2a50000  1001 79042   610 0004002  [SLPQ wait 0xc2a50000][SLP] sh
79042 c2f493f8  1001 78884   610 0004002  [SLPQ select 0xc075e024][SLP] make
78978 c28de7f0  1001 78975   610 0004002  [RUNQ] sh
78975 c29bf1fc  1001 78970   610 0004002  [SLPQ select 0xc075e024][SLP] make
...
   40 c1e67000     0     0     0 0000204  [IWAIT] swi1: net
   39 c1e671fc     0     0     0 0000204  [IWAIT] swi3: vm
   38 c1e4b5f4     0     0     0 000020c  [LOCK Giant c1e41800] swi4: clock sio
   37 c1e4b7f0     0     0     0 0000204  [IWAIT] irq13:
   36 c1e4b9ec     0     0     0 0000204  [IWAIT] irq0: clk
   35 c1e4bbe8     0     0     0 0000204  [IWAIT] irq23: ahc0 ahc1
   34 c1e4bde4     0     0     0 0000204  [IWAIT] irq22:
   33 c1e5b000     0     0     0 0000204  [IWAIT] irq21:
   32 c1e5b1fc     0     0     0 0000204  [IWAIT] irq20:
   31 c1e5b3f8     0     0     0 0000204  [LOCK Giant c1e41800] irq19: fxp0 uhci0
...
   14 c1dfc5f4     0     0     0 000020c  [Can run] idle: cpu0
   13 c1dfc7f0     0     0     0 000020c  [CPU 1] idle: cpu1
   12 c1dfc9ec     0     0     0 000020c  [CPU 2] idle: cpu2
   11 c1dfcbe8     0     0     0 000020c  [CPU 3] idle: cpu3

Thus, CPU 0 is running a make process while CPUs 1, 2, and 3 are all idle.
Note that softclock() and irq19 (me trying to ssh in this morning over fxp0)
are both blocked on Giant.  Also note that there are 5 threads blocked on
the vm page queue mutex.

First, let's see who owns Giant and what that thread is doing.  I was doing
a benchmark, so I didn't have witness compiled in or turned on, so
'show alllocks' wasn't an option.

db> x/x Giant,10
Giant:          c0730028        c06ffb2c        c06ffb2c        b0000   0
Giant+0x14:     0               0               c2ebd8a2        0
Giant+0x24:     0               c07300a0        0               0
time_monitor:   92c86           7               420c903b

The owner is 0xc2ebd8a2, minus the 0x2 since that is a flag:

db> show threads
...
  100286 (0xc2ebd8a0)  sched_switch(c2ebd8a0,0,1,77,fcb6cb7e) at sched_switch+0x170
...
  100189 (0xc28dfcf0)  sched_switch(c28dfcf0,0,2,dff58a7c,95a67b30) at sched_switch+0x170
...
db> tr 100286
Tracing pid 79075 tid 100286 td 0xc2ebd8a0
sched_switch(c2ebd8a0,0,1,77,fcb6cb7e) at sched_switch+0x170
mi_switch(1,0,c07026d2,254,c075d568) at mi_switch+0x1d9
turnstile_wait(c076bda0,c28dfcf0,c2ebd8a0,d17bc930,c21a79e8) at turnstile_wait+0x269
_mtx_lock_sleep(c076bda0,c2ebd8a0,0,c0704083,dc5) at _mtx_lock_sleep+0x101
_mtx_lock_flags(c076bda0,0,c0704083,dc5,c0704083) at _mtx_lock_flags+0x45
vfs_clean_pages(d17bc930,0,c0704083,423,c21a79b4) at vfs_clean_pages+0x70
bdwrite(d17bc930,c3bdfa00,80,2000,0) at bdwrite+0x2f0
ffs_update(c46bf564,0,1,50,e00efb2c) at ffs_update+0x396
ufs_inactive(e00efb44,e00efb5c,c05978ea,e00efb44,0) at ufs_inactive+0x225
VOP_INACTIVE_AP(e00efb44,0,c0705028,768,c074e7a0) at VOP_INACTIVE_AP+0x4e
vrele(c46bf564,c074e2a0,c46bf564,1,c255a700) at vrele+0x11a
vn_close(c46bf564,1,c255a700,c2ebd8a0,0) at vn_close+0x50
vn_closefile(c30e6318,c2ebd8a0,c06fddb9,846,c30e6318) at vn_closefile+0xf4
fdrop_locked(c30e6318,c2ebd8a0,c06fddb9,831) at fdrop_locked+0x94
fdrop(c30e6318,c2ebd8a0,c06fddb9,77c,c0515cfd,c0737ce0,1000,c2ebd8a0,e00efcac,c056009c,1,e00efc80,c255a700,0,e00efc90,e00efc90,c0513783,3f,c31b93c8,0,6,c31b9300,e00efcb0,c051389d,c31b9300,0,6) at fdrop+0x3c
closef(c30e6318,c2ebd8a0,c06fddb9,3e8,c2ebd8a0) at closef+0x3f2
close(c2ebd8a0,e00efd14,4,e00efd00,1) at close+0x1f2
syscall(2f,2f,2f,809c200,0) at syscall+0x2a0
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (6, FreeBSD ELF32, close), eip = 0x805c743, esp = 0xbfbfd18c, ebp = 0xbfbfd1a8 ---

Looking up the pid in the ps output above, I see that this thread is one of
the threads blocked on the vm page queue mutex.  Ok, so let's look at that
lock's owner:

db> x/x 0xc076bda0,10
vm_page_queue_mtx:      c0730028        c07100af        c07100af        b0000
vm_page_queue_mtx+0x10: 0               0               0               c28dfcf2
vm_page_queue_mtx+0x20: 0               0               c1e673f8        f540
vm_pages_needed:        0               0               0               1552

Lock owner is 0xc28dfcf2.
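
To spell out the masking I am doing by hand here and above for Giant: the
low bits of the lock word are flag bits, 0x2 being the contested flag if I
remember the mutex code right, and clearing them yields the owning thread
pointer.  A throwaway userland illustration follows; the macro name is made
up for this sketch rather than taken from sys/mutex.h:

/*
 * Illustration only: recover the owning thread pointer from a mutex's
 * lock word by masking off the low flag bits.  LOCK_FLAG_BITS is a name
 * invented for this sketch, not the kernel's macro; the 0x3 assumes the
 * two low bits (recursed and contested) are flags.
 */
#include <stdint.h>
#include <stdio.h>

#define LOCK_FLAG_BITS  0x3

static uintptr_t
lock_owner(uintptr_t lock_word)
{

	return (lock_word & ~(uintptr_t)LOCK_FLAG_BITS);
}

int
main(void)
{

	/* The two lock words from the ddb output above. */
	printf("Giant owner:             %#lx\n",
	    (unsigned long)lock_owner(0xc2ebd8a2));	/* 0xc2ebd8a0 */
	printf("vm_page_queue_mtx owner: %#lx\n",
	    (unsigned long)lock_owner(0xc28dfcf2));	/* 0xc28dfcf0 */
	return (0);
}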
I looked the owner up via show threads (tid 100189 above) and traced it:

db> tr 100189
Tracing pid 78978 tid 100189 td 0xc28dfcf0
sched_switch(c28dfcf0,0,2,dff58a7c,95a67b30) at sched_switch+0x170
mi_switch(2,0,c0701343,252,0) at mi_switch+0x1d9
critical_exit(c074bcc0,c052ddd5,c0755f18,0,c28dfcf0) at critical_exit+0x7e
intr_execute_handlers(c074bcc0,dff58b10,c076bc00,dff58b1c,c28dfcf0) at intr_execute_handlers+0xa4
atpic_handle_intr(0) at atpic_handle_intr+0xaf
Xatpic_intr0() at Xatpic_intr0+0x20
--- interrupt, eip = 0xc054dd3c, esp = 0xdff58b54, ebp = 0xdff58b68 ---
critical_exit(dff58b94,c06bbcbd,c075cae0,0,c0715535) at critical_exit+0xcc
_mtx_unlock_spin_flags(c075cae0,0,c0715535,252,e) at _mtx_unlock_spin_flags+0x23
pmap_invalidate_page(c29c869c,8064000,2,766,c19d6d10) at pmap_invalidate_page+0xdd
pmap_enter(c29c869c,8064000,c19d6d10,3,0) at pmap_enter+0x33d
vm_fault(c29c85dc,8064000,2,8,c28dfcf0) at vm_fault+0x13e7
trap_pfault(dff58d48,1,80643e8,bfbfe060,80643e8) at trap_pfault+0xf1
trap(2f,2f,2f,0,8069000) at trap+0x21a
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x80563cf, esp = 0xbfbfe0d0, ebp = 0xbfbfe0d8 ---

So this is the thread that is in state RUNQ even though I have three idle
CPUs.  Very curious!  I then wondered if my current CPU (CPU 3) thought
that there were any runnable threads:

db> call sched_runnable()
0

0 means no.  That's when I figured that the runnable thread must be pinned,
especially since it was interrupted while messing with the pmap, which
usually requires pinning.  I then tried an experiment to test my theory:
turn off the idle halt and manually IPI all the other CPUs so that they
would recheck their run queues, and whichever CPU needed to run the process
owning the vm page queue mutex would run it.  Unfortunately, I think I got
the vector number wrong, as all 3 CPUs panicked at once (which is why their
console output below is interleaved).  :-P

db> x cpu_idle_hlt
cpu_idle_hlt:   1
db> w cpu_idle_hlt 0
cpu_idle_hlt            0x1     =       0
db> call lapic_ipi_raw(0x000c00e7, 0)
0xd1622000
db> c
cccpppuuuiiiddd === 102;;; aaapppiiiccc iiiddd === 000102 iiinnnssstttrrruuuccctttiiiooonnn pppoooiiinnnttteerer r = = = 0 x008xx:80:8x0:cx00cx60ca50536bae055eb9e a5st as ctkas ctpkao cipkno tipenorti en rt e r = 0 x= 1 0 :0=0x x10d0xc:16000xed:cf0bex40dfc afb6r40abm ... 857 (make)
trap number             = 30
panic: unknown/reserved trap
cpuid = 0
Uptime: 1d16h58m16s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Stopping other CPUs

Thus, my theory is that when the pinned thread was preempted and put back
on the run queue, the scheduler didn't IPI the CPU it was pinned to in
order to wake it up in case it was idle.  The IPI is only needed if the
CPUs are halted, which is why I think turning the idle halt off might work
as a workaround.  I don't know if ULE has this same issue, but I've cc'd
Jeff and hopefully he can look into it.

(For anyone who would rather not poke the variable from ddb, a small
userland snippet for flipping the sysctl follows below my sig.)

-- 
John Baldwin  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
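
The shell one-liner for the workaround is just "sysctl
machdep.cpu_idle_hlt=0".  If you would rather do the same thing from a
program, here is a trivial sketch using sysctlbyname(3); it assumes the
OID is a plain int, which is what the ddb output above suggests, and it
needs root to set the new value:

/*
 * Minimal sketch: turn off the idle halt from userland.  Equivalent to
 * "sysctl machdep.cpu_idle_hlt=0" from the shell.  Assumes the OID is a
 * plain int; must be run as root to change it.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
	int newval = 0, oldval;
	size_t oldlen = sizeof(oldval);

	if (sysctlbyname("machdep.cpu_idle_hlt", &oldval, &oldlen,
	    &newval, sizeof(newval)) == -1)
		err(1, "sysctlbyname(machdep.cpu_idle_hlt)");
	printf("machdep.cpu_idle_hlt: %d -> %d\n", oldval, newval);
	return (0);
}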