Date: Sun, 31 Jul 2011 15:03:21 -0400 (EDT)
From: Rick Macklem
To: Andriy Gapon
Cc: FreeBSD-Current
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds

Andriy Gapon wrote:
> on 31/07/2011 01:55 Rick Macklem said the following:
> > Andriy Gapon wrote:
> >> on 26/07/2011 00:44 Rick Macklem said the following:
> >>> hrs sent me this panic. I'm wondering if it might be relevant to
> >>> this?
> >>
> >> To 'this' what?
> >
> > Well, my thinking (and it is quite likely completely incorrect) was
> > that, since this panic is a result of holding the spin lock too long,
> > it might be another symptom of the same problem as when the scheduler
> > seems to "lock up" for several seconds.
>
> I don't think that there is any issue with the scheduler "locking up"
> or taking too long to make scheduling decisions, at least when we talk
> about performance problems. There can be issues with the scheduler
> making sub-optimal scheduling decisions.
>
> > --> The bug might be some case where the spin lock isn't being
> > released when it should be, and that "could" result in "Heavy I/O
> > blocks FreeBSD box for several seconds". Note that the server was
> > under heavy I/O load when this panic occurred.
> >
> >
> >> A panic is a panic.
> >>
> >>> spin lock 0xffffffff80cb52c0 (sched lock 1) held by
> >>> 0xffffff0012c7f8c0 (tid 100317) too long
> >>> panic: spin lock held too long
> >>
> >> I guess it's more related to the thread on stable@ which has the
> >> panic message as its subject.
> >>
> >>> cpuid = 0
> >>> KDB: stack backtrace:
> >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> >>> kdb_backtrace() at kdb_backtrace+0x37
> >>> panic() at panic+0x187
> >>> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> >>> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> >>> sched_add() at sched_add+0x117
> >>> setrunnable() at setrunnable+0x78
> >>> sleepq_signal() at sleepq_signal+0x7a
> >>> cv_signal() at cv_signal+0x3b
> >>> xprt_active() at xprt_active+0xe3
> >>> svc_vc_soupcall() at svc_vc_soupcall+0xc
> >>> sowakeup() at sowakeup+0x69
> >>> tcp_do_segment() at tcp_do_segment+0x2cbd
> >>> tcp_input() at tcp_input+0xcdd
> >>> ip_input() at ip_input+0xac
> >>> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> >>> ether_demux() at ether_demux+0x14d
> >>> ether_input() at ether_input+0x17d
> >>> em_rxeof() at em_rxeof+0x1ca
> >>> em_handle_que() at em_handle_que+0x5b
> >>> taskqueue_run_locked() at taskqueue_run_locked+0x85
> >>> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> >>> fork_exit() at fork_exit+0x11f
> >>> fork_trampoline() at fork_trampoline+0xe
> >>> --- trap 0, rip = 0, rsp = 0xffffff8000160d00, rbp = 0 ---
> >>> KDB: enter: panic
> >>> [thread pid 0 tid 100033 ]
> >>> Stopped at kdb_enter+0x3b: movq $0,0x6b4e62(%rip)
> >>> db> ps
> >>>
> >>
> >> --
> >> Andriy Gapon
> >>

Ok, so if the scheduler spin lock is being held too long, what could
cause this, if it isn't a scheduler bug? I can't think of how NFS would
affect this beyond causing a heavy I/O load, but if there is some way
it does, I suppose I need to know what that is?

rick
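
As background for the question above: this class of panic fires when a
thread spins on a contested spin mutex past a hard limit and gives up,
so the backtrace shows the blocked waiter (here the em taskqueue thread
doing the NFS socket upcall), not the thread that actually held the
sched lock (tid 100317). The sketch below is a minimal illustration of
that detection idiom only; it is not the FreeBSD kern_mutex.c code, just
a self-contained userland C11 example, and the names spin_lock_or_panic,
spin_unlock, SPIN_LIMIT, and sched_lock are made up for the example.

/*
 * Simplified illustration of a "spin lock held too long" check.
 * NOT the FreeBSD implementation; a userland sketch using C11 atomics.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define SPIN_LIMIT 10000000UL   /* give up after this many spins */

static atomic_flag sched_lock = ATOMIC_FLAG_INIT;

static void
spin_lock_or_panic(atomic_flag *lock, const char *name)
{
        unsigned long spins = 0;

        /* Spin until the current owner releases the lock. */
        while (atomic_flag_test_and_set_explicit(lock,
            memory_order_acquire)) {
                if (++spins > SPIN_LIMIT) {
                        /*
                         * The owner has held the lock far longer than
                         * any legitimate critical section should take,
                         * so give up rather than hang silently.
                         */
                        fprintf(stderr, "spin lock %s held too long\n",
                            name);
                        abort();        /* stands in for panic(9) */
                }
        }
}

static void
spin_unlock(atomic_flag *lock)
{
        atomic_flag_clear_explicit(lock, memory_order_release);
}

int
main(void)
{
        spin_lock_or_panic(&sched_lock, "sched lock 1");
        /* ... short critical section, e.g. queueing a thread ... */
        spin_unlock(&sched_lock);
        printf("lock acquired and released without tripping the limit\n");
        return (0);
}

The point of the sketch is that the panic is a diagnostic of whoever
was holding the lock being stuck, or of very heavy contention, which is
why the discussion above focuses on what the lock holder was doing
rather than on the thread that panicked.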