Date: Tue, 11 Dec 2012 06:52:25 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Tim Kientzle <kientzle@freebsd.org>, freebsd-current Current <freebsd-current@freebsd.org>
Subject: Re: r244036 kernel hangs under load.
Message-ID: <20121211045225.GY3013@kib.kiev.ua>
In-Reply-To: <703324201.1302912.1355184719580.JavaMail.root@erie.cs.uoguelph.ca>
References: <20121210184545.GS3013@kib.kiev.ua> <703324201.1302912.1355184719580.JavaMail.root@erie.cs.uoguelph.ca>
On Mon, Dec 10, 2012 at 07:11:59PM -0500, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Mon, Dec 10, 2012 at 01:38:21PM -0500, Rick Macklem wrote:
> > > Adrian Chadd wrote:
> > > > .. what was the previous kernel version?
> > > >
> > > Hopefully Tim has it narrowed down more, but I don't see
> > > the hangs on a Sept. 7 kernel from head and I do see them
> > > on a Dec. 3 kernel from head. (Don't know the exact rNNNNNN.)
> > >
> > > It seems to predate my commit (r244008), which was my first
> > > concern.
> > >
> > > I use old single core i386 hardware and can fairly reliably
> > > reproduce it by doing a kernel build and a "svn checkout"
> > > concurrently. No NFS activity. These are running on a local
> > > disk (UFS/FFS). (The kernel I reproduce it on is built via
> > > GENERIC for i386. If you want me to start a "binary search"
> > > for which rNNNNNN, I can do that, but it will take a while.:-)
> > >
> > > I can get out into DDB, but I'll admit I don't know enough
> > > about it to know where to look;-)
> > > Here's some lines from "db> ps", in case they give someone
> > > useful information. (I can leave this box sitting in DB for
> > > the rest of to-day, in case someone can suggest what I should
> > > look for on it.)
> > >
> > > Just snippets...
> > > Ss pause adjkerntz
> > > DL sdflush [softdepflush]
> > > RL [syncer]
> > > DL vlruwt [vnlru]
> > > DL psleep [bufdaemon]
> > > RL [pagezero]
> > > DL psleep [vmdaemon]
> > > DL psleep [pagedaemon]
> > > DL ccb_scan [xpt_thrd]
> > > DL waiting_ [sctp_iterator]
> > > DL ctl_work [ctl_thrd]
> > > DL cooling [acpi_cooling0]
> > > DL tzpoll [acpi_thermal]
> > > DL (threaded) [usb]
> > > ...
> > > DL - [yarrow]
> > > DL (threaded) [geom]
> > > D - [g_down]
> > > D - [g_up]
> > > D - [g_event]
> > > RL (threaded) [intr]
> > > I [irq15: ata1]
> > > ...
> > > Run CPU0 [swi6: Giant taskq]
> > > --> does this one indicate the CPU is actually running this?
> > > (after a db> cont, wait a while <ctrl><alt><esc> db> ps
> > > it is still the same)
> > > I [swi4: clock]
> > > I [swi1: netisr 0]
> > > I [swi3: vm]
> > > RL [idle: cpu0]
> > > SLs wait [init]
> > > DL audit_wo [audit]
> > > DLs (threaded) [kernel]
> > > D - [deadlkres]
> > > ...
> > > D sched [swapper]
> > >
> > > I have no idea if this "ps" output helps, unless it indicates
> > > that it is looping on the Giant taskq?
> > Might be. You could do 'bt <pid>' for the process to see where it
> > loops.
> > Another good set of hints is at
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> Kostik, you must be clairvoyant;-)
>
> When I did "show alllocks", I found that the syncer process held
> - exclusive sleep mutex mount mtx locked @ kern/vfs_subr.c:4720
> - exclusive lockmgr syncer locked @ kern/vfs_subr.c:1780
> The trace for this process goes like:
> spinlock_exit
> mtx_unlock_spin_flags
> kern_yield
> _mnt_vnode_next_active
> vnode_next_active
> vfs_msync()
>
> So, it seems like your r244095 commit might have fixed this?
> (I'm not good at this stuff, but from your description, it looks
> like it did the kern_yield() with the mutex held and "maybe"
> got into trouble trying to acquire Giant?)
>
> Anyhow, I'm going to test a kernel with r244095 in it and see
> if I can still reproduce the hang.
> (There wasn't much else in the "show alllocks", except a
> process that held the exclusive vnode interlock mutex plus
> a ufs vnode lock, but it's just doing a witness_unlock.)
For the loop in mnt_vnode_next_active to cause a livelock, there must
be a thread blocked on the mount interlock.

>
> I'll email if/when I know more, rick
> ps: Fingers/toes crossed that you've already fixed it.
>
> > >
> > > As I said, I can leave it in "db" for to-day, if anyone wants
> > > me to do anything in the debugger and I can probably reproduce
> > > it, if someone wants stuff tried later.
> > >
> > > rick
> > >
> > > >
> > > > adrian
> > > >
> > > >
> > > > On 9 December 2012 22:08, Tim Kientzle <kientzle@freebsd.org>
> > > > wrote:
> > > > > I haven't found any useful clues yet, but thought I'd ask if
> > > > > anyone else was seeing hangs in a recent kernel.
> > > > >
> > > > > I just upgraded to r244036 using a straight GENERIC i386 kernel.
> > > > > (Straight buildworld/buildkernel, no local changes,
> > > > > /etc/src.conf doesn't exist, /etc/make.conf just has
> > > > > PERL_VERSION defined.)
> > > > >
> > > > > When I try to cross build an ARM world on the resulting system,
> > > > > the entire system hangs hard after about 30 minutes: No network,
> > > > > no keyboard response, no nothing.
> > > > >
> > > > > Don't know if it's relevant, but the system is using NFS pretty
> > > > > heavily (Parallels VM mounting NFS from Mac OS 10.7 host.)
> > > > >
> > > > > I'll try to get some more details ...
> > > > >
> > > > > Tim
> > > > >
> > > > > _______________________________________________
> > > > > freebsd-current@freebsd.org mailing list
> > > > > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > > > To unsubscribe, send any mail to
> > > > > "freebsd-current-unsubscribe@freebsd.org"
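
The livelock condition discussed above (one thread retrying a trylock
and yielding while it still owns a lock that the other thread is
blocked on) can be sketched in userspace. This is only an analogy with
POSIX threads, not the kernel code and not the r244095 change itself:
list_lock, item_lock, item_owner(), visit_item() and run_demo() are all
hypothetical names, with list_lock standing in for the mount interlock
and item_lock for a vnode interlock. It shows the safe variant, which
drops the outer lock before yielding.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins: list_lock ~ mount interlock, item_lock ~ vnode interlock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t item_lock = PTHREAD_MUTEX_INITIALIZER;
static int item_visits;

/* Takes the locks in the "natural" order: item first, then list.
 * While it waits for list_lock it keeps item_lock held. */
static void *item_owner(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&item_lock);
	pthread_mutex_lock(&list_lock);
	pthread_mutex_unlock(&list_lock);
	pthread_mutex_unlock(&item_lock);
	return NULL;
}

/* Iterator: owns list_lock and wants item_lock, i.e. the reverse order,
 * so it may only trylock.  Returns the number of retries it needed. */
static int visit_item(void)
{
	int retries = 0;

	pthread_mutex_lock(&list_lock);
	while (pthread_mutex_trylock(&item_lock) != 0) {
		/*
		 * Drop list_lock BEFORE yielding.  Yielding with list_lock
		 * still held would keep item_owner() blocked forever, so
		 * the trylock above could never succeed: a livelock.
		 */
		pthread_mutex_unlock(&list_lock);
		sched_yield();
		retries++;
		pthread_mutex_lock(&list_lock);
	}
	item_visits++;
	pthread_mutex_unlock(&item_lock);
	pthread_mutex_unlock(&list_lock);
	return retries;
}

/* Runs both threads once; returns the visit count, or -1 on error. */
int run_demo(void)
{
	pthread_t t;

	item_visits = 0;
	if (pthread_create(&t, NULL, item_owner, NULL) != 0)
		return -1;
	visit_item();
	pthread_join(t, NULL);
	return item_visits;
}
```

Calling run_demo() from a small main() should finish with one visit
recorded, however the two threads interleave. Moving the
pthread_mutex_unlock(&list_lock) call to after sched_yield() gives the
pattern that can spin forever once item_owner() is blocked on
list_lock.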