Date:      Thu, 11 Aug 2016 11:18:02 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Hooman Fazaeli <hoomanfazaeli@gmail.com>
Cc:        Ryan Stone <rysto32@gmail.com>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: 9.3-RELEASE panic: spin lock held too long
Message-ID:  <20160811081802.GF83214@kib.kiev.ua>
In-Reply-To: <57ABB512.4030503@gmail.com>
References:  <57AB349B.2010805@gmail.com> <20160810141948.GP83214@kib.kiev.ua> <57AB462A.2080608@gmail.com> <CAFMmRNw3hFWy0dqwvnQn4wdYdWvU=-73N4gYffvj2HGrvefk7Q@mail.gmail.com> <57AB632D.4000501@gmail.com> <CAFMmRNwKWkuJJ%2BU_xVgmrUweFbJkN7UN_U0HUR1aJWoNHx0WgQ@mail.gmail.com> <57ABB512.4030503@gmail.com>

On Thu, Aug 11, 2016 at 03:43:22AM +0430, Hooman Fazaeli wrote:
> On 2016-08-10 22:10, Ryan Stone wrote:
> > On Wed, Aug 10, 2016 at 1:23 PM, Hooman Fazaeli <hoomanfazaeli@gmail.com> wrote:
> >
> >     No. I have panics involving 'turnstile lock' (see the original post) and 'sched lock 2' too.
> >
> >
> > That doesn't necessarily mean that the root cause isn't due to sched lock 0 being leaked.  You'd have to dig into the cores and look at the chain of dependent locks to be sure.  Give the patch a 
> > try; it should panic quite quickly if it's the issue I am thinking of.
> 
> Sure, I will.
> BTW, what do you exactly mean by lock leaking?
> 
> Is there a list for the possible causes of 'spin lock held too long' panics?
> I mean, what sorts of coding bugs may cause a thread to hold a spin lock for
> a long time? Such a list would provide me a starting point for diagnostics.
It is impossible to provide a complete list.

Possible causes are:
- the already mentioned lock leak;
- lock recursion (sometimes);
- something which delays execution of the protected region, which takes the
  spinlock for otherwise legitimate reasons and period, e.g.:
	infinite or too aggressive looping, for instance due to a deadlock
	with spinlocks;
	an NMI with a run-away handler;
	a core that failed and stopped executing;
	an SMI or hypervisor taking control away from the OS on the given
	CPU, while allowing threads on other CPUs to run and notice that;
and so on.
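
To illustrate the first item, here is a minimal user-space sketch of a
lock leak, assuming the usual meaning of the term: a code path that
returns while still holding the lock. Everything in it (struct toy_spin,
toy_spin_try(), toy_spin_unlock(), leaky_update()) is a hypothetical name
made up for the example; it is built on C11 atomics and is not the
kernel's mutex code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock for illustration; not the kernel's struct mtx. */
struct toy_spin {
	atomic_bool held;
};

/* Make one acquisition attempt; return true if the lock was taken. */
static bool
toy_spin_try(struct toy_spin *l)
{
	bool expected = false;

	return (atomic_compare_exchange_strong(&l->held, &expected, true));
}

static void
toy_spin_unlock(struct toy_spin *l)
{
	/* Releasing a lock that is not held would be a bug as well. */
	assert(atomic_load(&l->held));
	atomic_store(&l->held, false);
}

/*
 * Deliberately buggy: the error path returns without unlocking, so the
 * lock stays held forever.  That forgotten unlock is the leak.
 */
static int
leaky_update(struct toy_spin *l, int *counter, int delta)
{
	if (!toy_spin_try(l))
		return (-2);		/* lock is busy */
	if (delta < 0)
		return (-1);		/* leaked: no toy_spin_unlock() here */
	*counter += delta;
	toy_spin_unlock(l);
	return (0);
}
```

After one call with a negative delta, every later caller sees the lock as
busy, although no thread is inside the protected region any more. With a
real spin mutex the later callers would spin instead of returning, which
is how a leak can end up reported as a lock held too long.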

> 
> And, how long is 'too long'? What is the justification behind
> the few million for() loop iterations that _mtx_lock_spin waits
> to grab a spin lock?
This is purely based on real-life experience with the hardware. If faster
CPUs with slower inter-core communication facilities ever appear, the
constant might need an adjustment. It is fine for the currently fastest
hardware, and by design is OK for anything slower.
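
The general shape of such an iteration-bounded wait can be sketched in
user space as below. This is only an approximation under stated
assumptions: SKETCH_SPIN_LIMIT, struct sketch_lock, sketch_lock_bounded()
and sketch_unlock() are hypothetical names, the limit value is picked for
the example and is not the kernel's constant, and the function returns
false where the kernel would panic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed iteration budget for the example; not the kernel's constant. */
#define	SKETCH_SPIN_LIMIT	10000000UL

/* Hypothetical lock type built on a C11 atomic_flag. */
struct sketch_lock {
	atomic_flag busy;
};

/*
 * Spin until the lock is acquired or the iteration budget runs out.
 * Returns true on success.  A false return means "held too long"; a
 * kernel would report the failure and panic at that point instead.
 */
static bool
sketch_lock_bounded(struct sketch_lock *l, unsigned long limit)
{
	unsigned long i;

	assert(limit > 0);
	for (i = 0; i < limit; i++) {
		/* test_and_set returns the previous value: false means free. */
		if (!atomic_flag_test_and_set_explicit(&l->busy,
		    memory_order_acquire))
			return (true);
	}
	return (false);
}

static void
sketch_unlock(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

The budget is counted in loop iterations, not in wall-clock time, so how
long it really lasts depends on how fast the CPU runs the loop and how
quickly the lock word travels between cores. That is why the value can
only be tuned against real hardware.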



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160811081802.GF83214>