Date:      Fri, 15 Feb 2013 08:44:43 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Marc Fournier <scrappy@hub.org>, Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org
Subject:   Re: 9-STABLE -> NFS -> NetAPP:
Message-ID:  <201302150844.43188.jhb@freebsd.org>
In-Reply-To: <1964289267.3041689.1360897556427.JavaMail.root@erie.cs.uoguelph.ca>
References:  <1964289267.3041689.1360897556427.JavaMail.root@erie.cs.uoguelph.ca>

On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote:
> Marc Fournier wrote:
> > On 2013-02-13, at 3:54 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >
> > >>
> > > The pid that is in "T" state for the "ps auxlH".
> >
> > Different server, last kernel update on Jan 22nd, https process this
> > time instead of du last time.
> >
> > I've attached:
> >
> > ps auxlH
> > ps auxlH of just the processes that are in TJ state (6 httpd servers)
> > procstat output for each of the 6 processes
> >
> > They are included as attachments … if these don't make it through, let
> > me know, just figured I'd try and keep it compact ...
> Well, I've looked at this call path a little closer:
> 16693 104135 httpd            -                mi_switch+0x186
>   thread_suspend_check+0x19f sleepq_catch_signals+0x1c5
>   sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763
>   clnt_reconnect_call+0xfb newnfs_request+0xadb
>   nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56
>   nfs_access+0x306 vn_open_cred+0x5a8
>   kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
>
> I am probably way off, since I am not familiar with this stuff, but it
> seems to me that thread_suspend_check() should just return 0 for the
> case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag set)
> instead of sitting in the loop and doing a mi_switch(). I'm not even
> sure if it should call thread_suspend_check() for this case, but there
> are cases in thread_suspend_check() that I don't understand.
>
> Although I don't really understand thread_suspend_check(), I've attached
> a simple patch that might be a starting point for fixing this?
>
> I wouldn't recommend trying the patch until kib and/or jhb weigh in
> on whether it makes any sense.

I think this is the right idea, but in HEAD with the sigdeferstop() changes it
should just check for TDF_SBDRY instead of adding a new parameter.  I think
checking for TDF_SBDRY will work even in 9 (and will make the patch smaller).
Also, I think this is only needed for stop signals.  Other suspend requests
will eventually resume the thread; it is only stop signals that can cause the
thread to get stuck indefinitely (since it depends on the user sending
SIGCONT).

Marc, are you using SIGSTOP?

Index: kern_thread.c
===================================================================
--- kern_thread.c	(revision 246122)
+++ kern_thread.c	(working copy)
@@ -795,6 +795,17 @@ thread_suspend_check(int return_instead)
 			return (ERESTART);
 
 		/*
+		 * Ignore suspend requests for stop signals if they
+		 * are deferred.
+		 */
+		if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
+		    td->td_flags & TDF_SBDRY) {
+			KASSERT(return_instead,
+			    ("TDF_SBDRY set for unsafe thread_suspend_check"));
+			return (0);
+		}
+
+		/*
 		 * If the process is waiting for us to exit,
 		 * this thread should just suicide.
 		 * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.
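
The condition added by the patch above can be tried outside the kernel with
a rough userspace model.  This is only a sketch: every MODEL_* name, both
structs, and the flag values are hypothetical stand-ins (the real definitions
live in sys/proc.h and differ), and assert() stands in for KASSERT().  It
assumes P_SHOULDSTOP() masks p_flag with the set of "stopped" bits, so the
comparison is true only when a stop signal is the sole reason to stop.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in flag values for this userspace model only;
 * they are NOT the kernel's values from sys/proc.h.
 */
#define MODEL_P_STOPPED_SIG	0x01	/* stopped by a stop signal */
#define MODEL_P_STOPPED_SINGLE	0x02	/* single-threading request */
#define MODEL_P_STOPPED_TRACE	0x04	/* stopped for tracing */
#define MODEL_P_STOPPED \
	(MODEL_P_STOPPED_SIG | MODEL_P_STOPPED_SINGLE | MODEL_P_STOPPED_TRACE)
#define MODEL_TDF_SBDRY		0x10	/* stops deferred to the boundary */

struct model_proc   { int p_flag; };
struct model_thread { int td_flags; };

enum model_result { MODEL_IGNORED = 0, MODEL_WOULD_SUSPEND = 1 };

/* Assumed shape of P_SHOULDSTOP(): mask out everything but the stop bits. */
static int
model_shouldstop(const struct model_proc *p)
{
	return (p->p_flag & MODEL_P_STOPPED);
}

/*
 * Models only the new early return: a stop-signal suspend request is
 * ignored when the thread has deferred stops.  Any other combination
 * falls through to what would be the normal suspend path.
 */
static enum model_result
model_suspend_check(const struct model_proc *p,
    const struct model_thread *td, bool return_instead)
{
	if (model_shouldstop(p) == MODEL_P_STOPPED_SIG &&
	    (td->td_flags & MODEL_TDF_SBDRY) != 0) {
		/* Stand-in for the KASSERT() in the patch. */
		assert(return_instead);
		return (MODEL_IGNORED);
	}
	return (MODEL_WOULD_SUSPEND);
}
```

In this model a single-threading request (MODEL_P_STOPPED_SINGLE) is still
honoured even with MODEL_TDF_SBDRY set, which matches the point that only
stop signals need the special case.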

-- 
John Baldwin


