Date: Tue, 27 Mar 2012 17:50:17 GMT
From: Konstantin Belousov <kostikbel@gmail.com>
To: freebsd-bugs@FreeBSD.org
Subject: Re: misc/166340: Process under FreeBSD 9.0 hangs in uninterruptable sleep with apparently no syscall (empty wchan)
Message-ID: <201203271750.q2RHoHZB064657@freefall.freebsd.org>
The following reply was made to PR kern/166340; it has been noted by GNATS.

From: Konstantin Belousov <kostikbel@gmail.com>
To: Christian Esken <Christian.Esken@trivago.com>
Cc: bug-followup@freebsd.org, avg@freebsd.org
Subject: Re: misc/166340: Process under FreeBSD 9.0 hangs in uninterruptable sleep with apparently no syscall (empty wchan)
Date: Tue, 27 Mar 2012 20:46:26 +0300

On Tue, Mar 27, 2012 at 05:30:48PM +0200, Christian Esken wrote:
> Konstantin Belousov wrote:
> > Thank you for the data. Semi-obviously, the callout_stop() call in
> > sleepq_check_timeout() has to return 0, otherwise we would not call
> > mi_switch() there. But I do not see how this can happen, because
> > the callout state, printed from kgdb, still indicates that the callout
> > is pending. A callout cannot be reset while in the sleepq code.
> >
> > So there are two possible routes forward: the preferable one is for
> > you to extract a self-contained C program that illustrates
> > the issue and send this sample to me. The second is to recompile your
> > kernel with INVARIANTS/WITNESS and possibly KTR and see what happens.
>
> I repeated the test with INVARIANTS/WITNESS and KTR compiled in
> (actually WITNESS was already included during the last test).
>
> I ran KTR with nothing filtered out, and formatted the dump with
> "ktrdump -cftH -i ktr.out". The whole log is excessive (1GB), so
> I have extracted two short sections (see attachment).
>
> The first section shows the last action of the application, namely a
> successful sendto() to a TCP socket, and then waiting for an answer via
> recvfrom().
> The second section illustrates the lock/unlock sequence of the sleep
> mutex for the recvfrom(). It goes like LOCK, LOCK, UNLOCK.
>
> This time the signal status is different. We have a pending signal:
> USER    PID  PPID PENDING   CAUGHT  IGNORED BLOCKED STAT WCHAN
> nobody 9163     1    4000 80005006 79f88010       0 D    -
>
> Looks like SIGPROF (27). Just wondering where it comes from.
>

This is irrelevant, and probably a red herring. The issue here is
callout_stop() failing while the callout still appears to be pending.
Also, mask 0x4000 of the pending signals indicates that SIGTERM is
pending, not SIGPROF.

I would like the data from your ktr dump: either all entries for
the stuck process plus all entries for facility CALLOUT, or just the
whole dump. The last entries of your log excerpt do not make much sense,
since the process must enter the _sleep() function, which logs this fact
right after locking the sleepq. But the log ends on the so_rcv mutex lock.

Please, when collecting the data, collect the whole set, i.e. include
procstat -kk <pid> output together with the ktr, as well as kgdb output,
so that I can be sure that we are chasing one, and not N, bugs.
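The signal-mask correction in the reply can be checked mechanically. The sketch below assumes the PENDING column of ps is a hexadecimal bitmask in which bit (n - 1) stands for signal number n, and it uses the FreeBSD numbers 15 for SIGTERM and 27 for SIGPROF. The helper name pending_signals and the small name table are illustrative only and are not part of any FreeBSD tool.

```python
# Assumed numbering (FreeBSD): SIGTERM is 15, SIGPROF is 27.
SIGNAL_NAMES = {15: "SIGTERM", 27: "SIGPROF"}


def pending_signals(mask_hex):
    """Return the signal numbers set in a hexadecimal mask string.

    Hypothetical helper: bit (n - 1) of the mask corresponds to signal n.
    """
    mask = int(mask_hex, 16)
    return [bit + 1 for bit in range(mask.bit_length()) if mask & (1 << bit)]


if __name__ == "__main__":
    # "4000" is the PENDING value from the ps output quoted in the message.
    for signo in pending_signals("4000"):
        print(signo, SIGNAL_NAMES.get(signo, "unknown"))
```

Under the same assumption, a pending SIGPROF would appear as 4000000 (bit 26) rather than 4000 (bit 14), which is the likely source of the misreading.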
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201203271750.q2RHoHZB064657>