From owner-freebsd-hackers@FreeBSD.ORG Fri Aug 21 17:10:12 2009 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6A2E4106568F for ; Fri, 21 Aug 2009 17:10:12 +0000 (UTC) (envelope-from brampton@gmail.com) Received: from mail-ew0-f209.google.com (mail-ew0-f209.google.com [209.85.219.209]) by mx1.freebsd.org (Postfix) with ESMTP id E15288FC13 for ; Fri, 21 Aug 2009 17:10:11 +0000 (UTC) Received: by ewy5 with SMTP id 5so1396044ewy.36 for ; Fri, 21 Aug 2009 10:10:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding; bh=Pi2jGAmQaoO2vgPpRidBPt7EKM8k/7Qo8RsVyahOksg=; b=PbicYdsP+57JnBle7JN3CLfX1h9uAWF8UNaWX4wvm2InbrZHStinZnDw8xfq2X8ASu XFqmNL+hDN0ki5EGAuWnGSD48h+3aGNGRLQhAGxdkzFSeKSjBXaHBL++Rmd2a8LodEFG 5oJ1XK3XUUjgepUHBtEIp61BbErQ7CsVHkVy0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type:content-transfer-encoding; b=d/S37BiAAMsx5H58KB+UK+j4RqhWB0rZXhCexnvTPlykwZZjDTcSwVJZOQCl+nGJRl Vw4zK+yOCTq0ENn9Rpj0u8tsgmahMn70b9BHA4YrHvxQgYS5Cpj+cn9Z0KdFWOMf/nRI UcRPsVGOFppDew8XypXAGz7qCDKRX6bhYsYrk= MIME-Version: 1.0 Sender: brampton@gmail.com Received: by 10.216.47.201 with SMTP id t51mr252433web.198.1250874610423; Fri, 21 Aug 2009 10:10:10 -0700 (PDT) Date: Fri, 21 Aug 2009 18:10:10 +0100 X-Google-Sender-Auth: 0c2b677ac477fa79 Message-ID: From: Andrew Brampton To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: DTrace lockup on a dual processor VMWare X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Aug 2009 17:10:12 -0000 Hi, I am running a amd64 FreeBSD 7.2 inside a VMWare, and DTrace has been working great. However, I have just changed my VMWare to use two processors instead of one, and things have started to break. I can load the dtraceall module but when I run hotkernel the machine hangs. I have DDB compiled into the kernel, but I was unable to break into it, instead I used a remote GDB (on the host machine) to give me the follow backtraces: CPU 0 #0 rdtsc (n=) at ./machine/cpufunc.h:379 #1 DELAY (n=) at /usr/7.2/sys/amd64/isa/clock.c:290 #2 0xffffffff80512d2a in _mtx_lock_spin (m=0xffffff00018b4128, tid=18446742974215591648, opts=, file=, line=) at /usr/7.2/sys/kern/kern_mutex.c:480 #3 0xffffffff80513304 in _mtx_lock_spin_flags (m=0xffffff00018b4128, opts=0, file=0xffffffff80f31600 "/usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c", line=574) at /usr/7.2/sys/kern/kern_mutex.c:229 #4 0xffffffff80f30ca0 in cyclic_fire (frame=0xfffffffe89ce7ae0) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:574 #5 cyclic_clock (frame=0xfffffffe89ce7ae0) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/amd64/cyclic_machdep.c:99 #6 0xffffffff807cf7bb in lapic_handle_timer (frame=0xfffffffe89ce7ae0) at /usr/7.2/sys/amd64/amd64/local_apic.c:695 #7 0xffffffff807c9db7 in Xtimerint () at /usr/7.2/sys/amd64/amd64/apic_vector.S:103 CPU 1 #0 smp_rendezvous_cpus (map=2, setup_func=0, action_func=0xffffffff80f301a0 , teardown_func=0xffffffff80558b00 , arg=0xfffffffe8a8e94e0) at /usr/7.2/sys/kern/subr_smp.c:388 #1 0xffffffff80f3055e in xcall (arg=, c=, func=0, param=0x46) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/amd64/cyclic_machdep.c:131 #2 0xffffffff80f301ed in cyclic_reprogram (cpu=, exp=) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:540 #3 0xffffffff80f31194 in cyclic_add_here (cpu=0xffffff00018b4100, hdlr=0xfffffffe8a8e95a0, when=0xfffffffe8a8e9590, flags=0) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:765 #4 0xffffffff80f314ea in cyclic_omni_start (omni=) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:956 #5 cyclic_add_omni (omni=) at /usr/7.2/sys/modules/cyclic/../../cddl/dev/cyclic/cyclic.c:1226 #6 0xffffffff80f21665 in profile_enable (arg=, id=2149531648, parg=0x0) at /usr/7.2/sys/modules/dtrace/profile/../../../cddl/dev/profile/profile.c:450 #7 0xffffffff80f3c6cf in dtrace_ecb_enable (probe=0x1, arg=) at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:9656 #8 dtrace_ecb_create_enable (probe=0x1, arg=) at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:10471 #9 0xffffffff80f40142 in dtrace_match (pkp=0xfffffffe8a8e9800, priv=31, uid=27929688, zoneid=0, matched=0xffffffff80f3c130 , arg=0xffffff000db1f840) at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7212 #10 0xffffffff80f40346 in dtrace_probe_enable (desc=0xffffff0001a9d218, enab=0xffffff000db1f840) at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7992 #11 0xffffffff80f40434 in dtrace_enabling_match (enab=0xffffff000db1f840, nmatched=0xffffff000db83528) at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:11366 #12 0xffffffff80f4c19a in dtrace_ioctl (dev=, cmd=, addr=0xffffff000db83520 "", flags=, td=0xffffff0001ab7370) at /usr/7.2/sys/modules/dtrace/dtrace/../../../cddl/dev/dtrace/dtrace_ioctl.c:380 #13 0xffffffff804b14e6 in devfs_ioctl_f (fp=0xffffff00014db780, com=3222304774, data=0xffffff000db83520, cred=, td=0xffffff0001ab7370) at /usr/7.2/sys/fs/devfs/devfs_vnops.c:602 #14 0xffffffff80562063 in fo_ioctl (td=0xffffff0001ab7370, fd=7, com=3222304774, data=0xffffff000db83520 "") at /usr/7.2/sys/sys/file.h:269 #15 kern_ioctl (td=0xffffff0001ab7370, fd=7, com=3222304774, data=0xffffff000db83520 "") at /usr/7.2/sys/kern/sys_generic.c:627 #16 0xffffffff805622c9 in ioctl (td=0xffffff0001ab7370, uap=0xfffffffe8a8e9bf0) at /usr/7.2/sys/kern/sys_generic.c:571 #17 0xffffffff807e54fc in syscall (frame=0xfffffffe8a8e9c80) at /usr/7.2/sys/amd64/amd64/trap.c:900 #18 0xffffffff807c971b in Xfast_syscall () at /usr/7.2/sys/amd64/amd64/exception.S:330 It seems that CPU 0 is doing nothing, permanently stuck on line 379 of cpufunc.h, whereas CPU 1 is spinning inside the loop on line 388 of subr_smp.c. On thing I notice that when I run hotkernel on a single processor VMWare it prints out: > hotkernel Sampling... Hit Ctrl+C to end dtrace: buffer size lowered to 1m dtrace: aggregation size lowered to 1m When I run it on the dual-processor VMWare it only prints > hotkernel Sampling... Hit Ctrl+C to end So I suspect it is locking up before it gets to the code that prints "buffer size lowered". Additionally, I have built my own kernel, using all the standard options and sources, with just the following extra options: options KDTRACE_FRAME # Ensure frames are compiled in options KDTRACE_HOOKS # Kernel DTrace hooks options DDB_CTF options DEVICE_POLLING options KDB options KDB_UNATTENDED options DDB options GDB options BREAK_TO_DEBUGGER options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_SKIPSPIN options DEBUG_MEMGUARD options HWPMC_HOOKS device hwpmc Can anyone suggest anything to try and debug/fix this problem. I'm happy to hack the kernel sources if need be. thanks Andrew