From: Roger Pau Monné <roger.pau@citrix.com>
To: freebsd-current@freebsd.org
Subject: FreeBSD-HEAD gets stuck on vnode operations
Date: Mon, 13 May 2013 13:18:04 +0200
Message-ID: <5190CBEC.5000704@citrix.com>

Hello,

I've set up a FreeBSD-HEAD VM on Xen and compiled the XENHVM kernel. The last commit in the repository is:

Date: Tue, 7 May 2013 12:39:14 +0000
Subject: [PATCH] By request, add an arrow from NetBSD-0.8 to FreeBSD-1.0. While here, add a few more NetBSD versions to the tree itself.
Submitted by: Alan Barrett
Submitted by: Thomas Klausner

I've started stressing the VM using the following tight loop:

while [ 1 ]; do make -j12 buildkernel; done

From time to time I see the VM getting stuck, and when breaking into KDB this is the output of ps:

pid ppid pgrp uid state wmesg wchan cmd
32343 670 670 25 L *Name Cac 0xfffffe0017183780 sendmail
32342 32256 5273 0 L+ *Name Cac 0xfffffe0017183780 sh
32341 32335 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32340 32284 5273 0 D+ ufs 0xfffffe000dc76068 ctfconvert
32339 32293 5273 0 L+ *Name Cac 0xfffffe0017183780 ctfconvert
32337 32332 5273 0 L+ *vnode_fr 0xfffffe00872c23c0 cc
32335 32334 5273 0 S+ wait 0xfffffe010537d000 cc
32334 30655 5273 0 S+ wait 0xfffffe010ac974b8 sh
32333 32329 5273 0 L+ *vnode in 0xfffffe0017183180 cc
32332 32331 5273 0 S+ wait 0xfffffe00416a14b8 cc
32331 30655 5273 0 S+ wait 0xfffffe00b9846970 sh
32329 32328 5273 0 S+ wait 0xfffffe0049ace000 cc
32328 30655 5273 0 S+ wait 0xfffffe0041670000 sh
32324 32320 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32320 32318 5273 0 S+ wait 0xfffffe00416184b8 cc
32318 30655 5273 0 S+ wait 0xfffffe00879fd970 sh
32314 32313 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32313 32311 5273 0 S+ wait 0xfffffe00b9846000 cc
32312 32309 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32311 30655 5273 0 S+ wait 0xfffffe00415844b8 sh
32310 32305 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32309 32307 5273 0 S+ wait 0xfffffe00499624b8 cc
32307 30655 5273 0 S+ wait 0xfffffe010537c970 sh
32305 32304 5273 0 S+ wait 0xfffffe0041669000 cc
32304 30655 5273 0 S+ wait 0xfffffe00173d9970 sh
32303 32298 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32298 32295 5273 0 S+ wait 0xfffffe0049924000 cc
32295 30655 5273 0 S+ wait 0xfffffe0041631970 sh
32293 30655 5273 0 S+ wait 0xfffffe000d15b000 sh
32284 30655 5273 0 S+ wait 0xfffffe00416684b8 sh
32256 31391 5273 0 S+ select 0xfffffe000d965840 make
32022 30655 5273 0 L+ *Name Cac 0xfffffe0017183780 sh
31391 31386 5273 0 S+ wait 0xfffffe0041680970 sh
31386 30664 5273 0 S+ select 0xfffffe0017942cc0 make
30664 30663 5273 0 S+ wait 0xfffffe004169f4b8 sh
30663 30662 5273 0 S+ select 0xfffffe000d1c0d40 make
30662 30655 5273 0 S+ wait 0xfffffe00b9ddb4b8 sh
30655 5287 5273 0 S+ select 0xfffffe000d1c0040 make
5287 5280 5273 0 S+ wait 0xfffffe004148f970 sh
5280 5278 5273 0 S+ select 0xfffffe000d964540 make
5278 5273 5273 0 S+ wait 0xfffffe000de3a970 sh
5273 736 5273 0 S+ select 0xfffffe000bc81740 make
91658 735 91658 0 Ss+ ttyin 0xfffffe000d6caca8 bash
736 735 736 0 Ss+ wait 0xfffffe000d5084b8 bash
735 734 735 0 Ss select 0xfffffe000bc817c0 screen
734 730 734 0 S+ pause 0xfffffe000d6330a8 screen
730 729 730 0 S+ wait 0xfffffe000d58b000 bash
729 728 729 1001 S+ wait 0xfffffe000d7ab970 su
728 727 728 1001 Ss+ wait 0xfffffe000de3a4b8 sh
727 724 724 1001 S select 0xfffffe000d98f340 sshd
724 664 724 0 Ss select 0xfffffe000d4180c0 sshd
722 721 722 0 L+ *Name Cac 0xfffffe0017183780 bash
721 1 721 0 Ss+ wait 0xfffffe000d7ab000 login
720 1 720 0 Ss+ ttyin 0xfffffe000d0470a8 getty
719 1 719 0 Ss+ ttyin 0xfffffe000d0474a8 getty
718 1 718 0 Ss+ ttyin 0xfffffe000bd914a8 getty
717 1 717 0 Ss+ ttyin 0xfffffe000bd918a8 getty
716 1 716 0 Ss+ ttyin 0xfffffe000bd91ca8 getty
715 1 715 0 Ss+ ttyin 0xfffffe000d0440a8 getty
714 1 714 0 Ss+ ttyin 0xfffffe000d0444a8 getty
713 1 713 0 Ss+ ttyin 0xfffffe000d0448a8 getty
674 1 674 0 Ls *Name Cac 0xfffffe0017183780 cron
670 1 670 25 Ss pause 0xfffffe000d5090a8 sendmail
667 1 667 0 Ls *Name Cac 0xfffffe0017183780 sendmail
664 1 664 0 Ss select 0xfffffe000d964840 sshd
566 1 566 0 Ss select 0xfffffe000d98f4c0 syslogd
472 1 472 0 Ss select 0xfffffe000bc81840 devd
471 1 471 65 Ls *Name Cac 0xfffffe0017183780 dhclient
431 1 431 0 Ss select 0xfffffe000d98fc40 dhclient
18 0 0 0 LL *Name Cac 0xfffffe0017183780 [softdepflush]
17 0 0 0 RL CPU 1 [syncer]
9 0 0 0 DL vlruwt 0xfffffe000d15c000 [vnlru]
8 0 0 0 DL psleep 0xffffffff815305ac [bufdaemon]
7 0 0 0 DL pgzero 0xffffffff81538c94 [pagezero]
6 0 0 0 DL psleep 0xffffffff81537ed0 [vmdaemon]
5 0 0 0 DL psleep 0xffffffff815a4084 [pagedaemon]
4 0 0 0 DL ccb_scan 0xffffffff81300f30 [xpt_thrd]
3 0 0 0 DL waiting_ 0xffffffff81598160 [sctp_iterator]
2 0 0 0 DL balloon 0xffffffff8079b9f0 [balloon]
16 0 0 0 SL xbread 0xfffffe00feffc000 [xenstore_rcv]
15 0 0 0 SL waitev 0xffffffff81538e98 [xenwatch]
14 0 0 0 DL - 0xffffffff8131d008 [yarrow]
13 0 0 0 DL (threaded) [geom]
100023 D - 0xffffffff8158ca50 [g_down]
100022 D - 0xffffffff8158ca48 [g_up]
100021 D - 0xffffffff8158ca40 [g_event]
12 0 0 0 WL (threaded) [intr]
100044 I [irq7: ppc0]
100043 I [swi0: uart]
100042 I [irq12: psm0]
100041 I [irq1: atkbd0]
100038 I [irq28: xenpci0]
100037 I [irq15: ata1]
100036 I [irq14: ata0]
100032 I [swi2: cambio]
100030 I [swi5: fast taskq]
100028 I [swi6: Giant taskq]
100027 I [swi6: task queue]
100020 I [swi3: vm]
100019 I [swi1: netisr 0]
100018 I [swi4: clock]
100017 I [swi4: clock]
100016 I [swi4: clock]
100015 I [swi4: clock]
100014 I [swi4: clock]
100013 I [swi4: clock]
100012 I [swi4: clock]
100011 I [swi4: clock]
11 0 0 0 RL (threaded) [idle]
100010 Run CPU 7 [idle: cpu7]
100009 Run CPU 6 [idle: cpu6]
100008 Run CPU 5 [idle: cpu5]
100007 Run CPU 4 [idle: cpu4]
100006 Run CPU 3 [idle: cpu3]
100005 Run CPU 2 [idle: cpu2]
100004 CanRun [idle: cpu1]
100003 Run CPU 0 [idle: cpu0]
1 0 1 0 SLs wait 0xfffffe000bc86970 [init]
10 0 0 0 DL audit_wo 0xffffffff8159b860 [audit]
0 0 0 0 DLs (threaded) [kernel]
100046 D - 0xffffffff81343440 [deadlkres]
100045 D - 0xfffffe000bcbd600 [mca taskq]
100035 D - 0xfffffe000bcbd700 [acpi_task_2]
100034 D - 0xfffffe000bcbd700 [acpi_task_1]
100033 D - 0xfffffe000bcbd700 [acpi_task_0]
100031 D - 0xfffffe000bcbd900 [kqueue taskq]
100029 D - 0xfffffe000bcbdb00 [thread taskq]
100026 D - 0xfffffe000bcbde00 [ffs_trim taskq]
100024 D - 0xfffffe000bcbe000 [firmware taskq]
100000 D sched 0xffffffff8158cb48 [swapper]

db> tr 17
Tracing pid 17 tid 100055 td 0xfffffe000d14d000
cpustop_handler() at cpustop_handler+0x28/frame 0xffffff8000232cf0
ipi_nmi_handler() at ipi_nmi_handler+0x3f/frame 0xffffff8000232d10
trap() at trap+0x42/frame 0xffffff8000232f20
nmi_calltrap() at nmi_calltrap+0x8/frame 0xffffff8000232f20
--- trap 0x13, rip = 0xffffffff808b4bd6, rsp = 0xffffff8000232fe0, rbp = 0xffffff8120c569a0 ---
_mtx_trylock_flags_() at _mtx_trylock_flags_+0x106/frame 0xffffff8120c569a0
mnt_vnode_next_active() at mnt_vnode_next_active+0x141/frame 0xffffff8120c569e0
ffs_sync() at ffs_sync+0xb5/frame 0xffffff8120c56ab0
sync_fsync() at sync_fsync+0x12d/frame 0xffffff8120c56ae0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0xf0/frame 0xffffff8120c56b10
sched_sync() at sched_sync+0x34c/frame 0xffffff8120c56bb0
fork_exit() at fork_exit+0x84/frame 0xffffff8120c56bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffff8120c56bf0
--- trap 0, rip = 0, rsp = 0xffffff8120c56cb0, rbp = 0 ---

The VM can be stuck in this state for quite some time; it generally varies between a couple of minutes (5-10 min) and an hour or two, after which the VM recovers itself and resumes normal operation.
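In case anyone wants to try to reproduce this, there is nothing special about the setup; the kernel is built the usual way, roughly as follows (a sketch assuming a standard /usr/src checkout of HEAD and the stock XENHVM config, not my exact shell history):

# build and install the Xen HVM kernel config shipped in the tree
cd /usr/src
make -j12 buildkernel KERNCONF=XENHVM
make installkernel KERNCONF=XENHVM
shutdown -r now

The load is then just the buildkernel loop shown above.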
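The debugger output above was gathered at the db> prompt; one way to get there and collect the same information is sketched below (it assumes the kernel has KDB/DDB compiled in, and WITNESS for the lock listing):

# enter the kernel debugger from a root shell on the console
sysctl debug.kdb.enter=1

# at the db> prompt: process list, the syncer's stack (pid 17 above),
# and lock state; "show alllocks" needs options WITNESS, "c" resumes
db> ps
db> tr 17
db> show lockedvnods
db> show alllocks
db> c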
I still have to test this on a bare-metal FreeBSD install, but I would like to ask whether someone has seen similar behaviour, or is suspicious of a change that could cause this.

Thanks, Roger.