Date:      Tue, 11 Dec 2012 06:52:25 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Tim Kientzle <kientzle@freebsd.org>, freebsd-current Current <freebsd-current@freebsd.org>
Subject:   Re: r244036 kernel hangs under load.
Message-ID:  <20121211045225.GY3013@kib.kiev.ua>
In-Reply-To: <703324201.1302912.1355184719580.JavaMail.root@erie.cs.uoguelph.ca>
References:  <20121210184545.GS3013@kib.kiev.ua> <703324201.1302912.1355184719580.JavaMail.root@erie.cs.uoguelph.ca>


On Mon, Dec 10, 2012 at 07:11:59PM -0500, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Mon, Dec 10, 2012 at 01:38:21PM -0500, Rick Macklem wrote:
> > > Adrian Chadd wrote:
> > > > .. what was the previous kernel version?
> > > >
> > > Hopefully Tim has it narrowed down more, but I don't see
> > > the hangs on a Sept. 7 kernel from head and I do see them
> > > on a Dec. 3 kernel from head. (Don't know the exact rNNNNNN.)
> > >
> > > It seems to predate my commit (r244008), which was my first
> > > concern.
> > >
> > > I use old single core i386 hardware and can fairly reliably
> > > reproduce it by doing a kernel build and a "svn checkout"
> > > concurrently. No NFS activity. These are running on a local
> > > disk (UFS/FFS). (The kernel I reproduce it on is built via
> > > GENERIC for i386. If you want me to start a "binary search"
> > > for which rNNNNNN, I can do that, but it will take a while.:-)
> > >
> > > I can get out into DDB, but I'll admit I don't know enough
> > > about it to know where to look;-)
> > > Here's some lines from "db> ps", in case they give someone
> > > useful information. (I can leave this box sitting in DB for
> > > the rest of to-day, in case someone can suggest what I should
> > > look for on it.)
> > >
> > > Just snippets...
> > >    Ss pause adjkerntz
> > >    DL sdflush [softdepflush]
> > >    RL [syncer]
> > >    DL vlruwt [vnlru]
> > >    DL psleep [bufdaemon]
> > >    RL [pagezero]
> > >    DL psleep [vmdaemon]
> > >    DL psleep [pagedaemon]
> > >    DL ccb_scan [xpt_thrd]
> > >    DL waiting_ [sctp_iterator]
> > >    DL ctl_work [ctl_thrd]
> > >    DL cooling [acpi_cooling0]
> > >    DL tzpoll [acpi_thermal]
> > >    DL (threaded) [usb]
> > >    ...
> > >    DL - [yarrow]
> > >    DL (threaded) [geom]
> > >    D - [g_down]
> > >    D - [g_up]
> > >    D - [g_event]
> > >    RL (threaded) [intr]
> > >    I [irq15: ata1]
> > >    ...
> > >    Run CPU0 [swi6: Giant taskq]
> > > --> does this one indicate the CPU is actually running this?
> > >    (after a db> cont, wait a while <ctrl><alt><esc> db> ps
> > >     it is still the same)
> > >    I [swi4: clock]
> > >    I [swi1: netisr 0]
> > >    I [swi3: vm]
> > >    RL [idle: cpu0]
> > >    SLs wait [init]
> > >    DL audit_wo [audit]
> > >    DLs (threaded) [kernel]
> > >    D - [deadlkres]
> > >    ...
> > >    D sched [swapper]
> > >
> > > I have no idea if this "ps" output helps, unless it indicates
> > > that it is looping on the Giant taskq?
> > Might be. You could do 'bt <pid>' for the process to see where it
> > loops.
> > Another good set of hints is at
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> Kostik, you must be clairvoyant;-)
>=20
> When I did "show alllocks", I found that the syncer process held
> - exclusive sleep mutex mount mtx locked @ kern/vfs_subr.c:4720
> - exclusive lockmgr syncer locked @ kern/vfs_subr.c:1780
> The trace for this process goes like:
>  spinlock_exit
>  mtx_unlock_spin_flags
>  kern_yield
>  _mnt_vnode_next_active
>  vnode_next_active
>  vfs_msync()
>
> So, it seems like your r244095 commit might have fixed this?
> (I'm not good at this stuff, but from your description, it looks
>  like it did the kern_yield() with the mutex held and "maybe"
>  got into trouble trying to acquire Giant?)
>
> Anyhow, I'm going to test a kernel with r244095 in it and see
> if I can still reproduce the hang.
> (There wasn't much else in the "show alllocks", except a
>  process that held the exclusive vnode interlock mutex plus
>  a ufs vnode lock, but it's just doing a witness_unlock.)
For the loop in mnt_vnode_next_active to cause a livelock, there must
be a thread blocked on the mount interlock.
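
A minimal userland sketch of the hazard discussed above, using pthreads
as an analogy rather than the kernel code in vfs_subr.c. The names
list_lock, item_lock, scan_step and scan_done are hypothetical. The point
it shows: when a try-lock fails inside a scan, release the outer mutex
before yielding, so that a thread blocked on that mutex can make progress
instead of the scanner spinning with it held.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins (not kernel objects): list_lock plays the role
 * of the mutex held across the scan, item_lock the per-item lock that
 * may be busy when the scanner reaches it.
 */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t item_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * One step of the scan.
 * Returns true with both locks held when the item lock was free.
 * Returns false with NO locks held when the item lock was busy:
 * list_lock is dropped before sched_yield(), never held across it.
 */
static bool
scan_step(void)
{
	pthread_mutex_lock(&list_lock);
	if (pthread_mutex_trylock(&item_lock) == 0)
		return (true);
	pthread_mutex_unlock(&list_lock);	/* drop first ... */
	sched_yield();				/* ... then yield */
	return (false);
}

/* Release both locks after a successful scan_step(). */
static void
scan_done(void)
{
	int rc;

	rc = pthread_mutex_unlock(&item_lock);
	assert(rc == 0);
	rc = pthread_mutex_unlock(&list_lock);
	assert(rc == 0);
	(void)rc;
}
```

A caller would retry scan_step() until it returns true, do its work on
the item, and then call scan_done().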

>
> I'll email if/when I know more, rick
> ps: Fingers/toes crossed that you've already fixed it.
>=20
> >
> > >
> > > As I said, I can leave it in "db" for to-day, if anyone wants
> > > me to do anything in the debugger and I can probably reproduce
> > > it, if someone wants stuff tried later.
> > >
> > > rick
> > >
> > >
> > > >
> > > >
> > > > adrian
> > > >
> > > >
> > > > On 9 December 2012 22:08, Tim Kientzle <kientzle@freebsd.org>
> > > > wrote:
> > > > > I haven't found any useful clues yet, but thought I'd ask if
> > > > > anyone
> > > > > else
> > > > > was seeing hangs in a recent kernel.
> > > > >
> > > > > I just upgraded to r244036 using a straight GENERIC i386 kernel.
> > > > > (Straight buildworld/buildkernel, no local changes,
> > > > > /etc/src.conf
> > > > > doesn't
> > > > > exist, /etc/make.conf just has PERL_VERSION defined.)
> > > > >
> > > > > When I try to cross build an ARM world on the resulting system,
> > > > > the entire system hangs hard after about 30 minutes: No network,
> > > > > no keyboard response, no nothing.
> > > > >
> > > > > Don't know if it's relevant, but the system is using NFS pretty
> > > > > heavily (Parallels VM mounting NFS from Mac OS 10.7 host.)
> > > > >
> > > > > I'll try to get some more details ...
> > > > >
> > > > > Tim
> > > > >
> > > > > _______________________________________________
> > > > > freebsd-current@freebsd.org mailing list
> > > > > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > > > To unsubscribe, send any mail to
> > > > > "freebsd-current-unsubscribe@freebsd.org"
