Date: Sun, 31 Jul 2011 15:03:21 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: FreeBSD-Current <freebsd-current@FreeBSD.org>
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds
Message-ID: <1368975528.1226755.1312139001048.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4E3505FA.6000403@FreeBSD.org>
Andriy Gapon wrote:
> on 31/07/2011 01:55 Rick Macklem said the following:
> > Andriy Gapon wrote:
> >> on 26/07/2011 00:44 Rick Macklem said the following:
> >>> hrs sent me this panic. I'm wondering if it might be relevant
> >>> to this?
> >>
> >> To 'this' what?
> >
> > Well, my thinking was (and it is quite likely completely
> > incorrect) that, since this panic is a result of holding the
> > spin lock too long, it might be another symptom of the same
> > problem as when the scheduler seems to "lock up" for several
> > seconds.
>
> I don't think that there is any issue with the scheduler "locking
> up" or taking too long to make scheduling decisions, at least when
> we talk about performance problems. There can be issues with the
> scheduler making sub-optimal scheduling decisions.
>
> > --> The bug might be some case where the spin lock isn't being
> > released when it should be, and that "could" result in "Heavy
> > I/O blocks FreeBSD box for several seconds". Note that the
> > server was under heavy I/O load when this panic occurred.
> >
> >> A panic is a panic.
> >>
> >>> spin lock 0xffffffff80cb52c0 (sched lock 1) held by
> >>> 0xffffff0012c7f8c0 (tid 100317) too long
> >>> panic: spin lock held too long
> >>
> >> I guess it's more related to the thread on stable@ which has
> >> the panic message as its subject.
> >>
> >>> cpuid = 0
> >>> KDB: stack backtrace:
> >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> >>> kdb_backtrace() at kdb_backtrace+0x37
> >>> panic() at panic+0x187
> >>> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> >>> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> >>> sched_add() at sched_add+0x117
> >>> setrunnable() at setrunnable+0x78
> >>> sleepq_signal() at sleepq_signal+0x7a
> >>> cv_signal() at cv_signal+0x3b
> >>> xprt_active() at xprt_active+0xe3
> >>> svc_vc_soupcall() at svc_vc_soupcall+0xc
> >>> sowakeup() at sowakeup+0x69
> >>> tcp_do_segment() at tcp_do_segment+0x2cbd
> >>> tcp_input() at tcp_input+0xcdd
> >>> ip_input() at ip_input+0xac
> >>> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> >>> ether_demux() at ether_demux+0x14d
> >>> ether_input() at ether_input+0x17d
> >>> em_rxeof() at em_rxeof+0x1ca
> >>> em_handle_que() at em_handle_que+0x5b
> >>> taskqueue_run_locked() at taskqueue_run_locked+0x85
> >>> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> >>> fork_exit() at fork_exit+0x11f
> >>> fork_trampoline() at fork_trampoline+0xe
> >>> --- trap 0, rip = 0, rsp = 0xffffff8000160d00, rbp = 0 ---
> >>> KDB: enter: panic
> >>> [thread pid 0 tid 100033 ]
> >>> Stopped at kdb_enter+0x3b: movq $0,0x6b4e62(%rip)
> >>> db> ps
> >>>
> >>
> >> --
> >> Andriy Gapon

Ok, so if the scheduler spin lock is being held too long, what could
cause this, if it isn't a scheduler bug? I can't think of how NFS
would affect this beyond causing a heavy I/O load, but if there is
some way it does, I suppose I need to know what that is?

rick
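ps: To be concrete about what the panic means mechanically: the
panicking thread in the trace above is the victim, not the culprit.
It spun on the scheduler lock for a bounded number of iterations and
then panicked rather than hang silently; the real question is what
tid 100317 was doing while it held "sched lock 1". Below is a
minimal, self-contained sketch of that watchdog pattern (plain C11
written for this message; SPIN_LIMIT and all the names are
illustrative assumptions, not the actual _mtx_lock_spin()/
_mtx_lock_spin_failed() code in sys/kern/kern_mutex.c):

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative bound only; the kernel's limit is different. */
#define SPIN_LIMIT 100000000UL

struct spinlock {
        atomic_flag locked;
        const char *name;
};

/*
 * Spin until the lock is free; give up and "panic" if the current
 * owner holds it for too many iterations, mirroring the
 * "spin lock ... held ... too long" message in the trace above.
 */
static void
spin_lock_or_panic(struct spinlock *sl)
{
        unsigned long spins = 0;

        while (atomic_flag_test_and_set_explicit(&sl->locked,
            memory_order_acquire)) {
                if (++spins >= SPIN_LIMIT) {
                        fprintf(stderr,
                            "spin lock %p (%s) held too long\n",
                            (void *)sl, sl->name);
                        abort();        /* kernel calls panic(9) */
                }
        }
}

static void
spin_unlock(struct spinlock *sl)
{
        atomic_flag_clear_explicit(&sl->locked, memory_order_release);
}

int
main(void)
{
        struct spinlock sl = { ATOMIC_FLAG_INIT, "sched lock 1" };

        spin_lock_or_panic(&sl);  /* uncontended: acquires at once */
        spin_unlock(&sl);
        return (0);
}

So a lock leak of the sort I was guessing at (an acquire with no
matching release) would show up exactly this way: every other CPU
that needs the sched lock spins until the watchdog fires.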