Date: Fri, 15 Feb 2013 08:44:43 -0500 From: John Baldwin <jhb@freebsd.org> To: Rick Macklem <rmacklem@uoguelph.ca> Cc: Marc Fournier <scrappy@hub.org>, Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org Subject: Re: 9-STABLE -> NFS -> NetAPP: Message-ID: <201302150844.43188.jhb@freebsd.org> In-Reply-To: <1964289267.3041689.1360897556427.JavaMail.root@erie.cs.uoguelph.ca> References: <1964289267.3041689.1360897556427.JavaMail.root@erie.cs.uoguelph.ca>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote: > Marc Fournier wrote: > > On 2013-02-13, at 3:54 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote: > >=20 > > >> > > > The pid that is in "T" state for the "ps auxlH". > >=20 > > Different server, last kernel update on Jan 22nd, https process this > > time instead of du last time. > >=20 > > I've attached: > >=20 > > ps auxlH > > ps auxlH of just the processes that are in TJ state (6 httpd servers) > > procstat output for each of the 6 process > >=20 > >=20 > >=20 > >=20 > > They are included as attachments =E2=80=A6 if these don't make it throu= gh, let > > me know, just figured I'd try and keep it compact ... > Well, I've looked at this call path a little closer: > 16693 104135 httpd - mi_switch+0x186=20 thread_suspend_check+0x19f sleepq_catch_signals+0x1c5 > sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763=20 clnt_reconnect_call+0xfb newnfs_request+0xadb > nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56=20 nfs_access+0x306 vn_open_cred+0x5a8 > kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7=20 >=20 > I am probably way off, since I am not familiar with this stuff, but it > seems to me that thread_suspend_check() should just return 0 for the > case where stop_allowed =3D=3D SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag set) > instead of sitting in the loop and doing a mi_switch(). I'm not even > sure if it should call thread_suspend_check() for this case, but there > are cases in thread_suspend_check() that I don't understand. >=20 > Although I don't really understand thread_suspend_check(), I've attached > a simple patch that might be a starting point for fixing this? >=20 > I wouldn't recommend trying the patch until kib and/or jhb weigh in > on whether it makes any sense. I think this is the right idea, but in HEAD with the sigdeferstop() changes= it=20 should just check for TDF_SBDRY instead of adding a new parameter. I think checking for TDF_SBDRY will work even in 9 (and will make the patch smaller= ). =20 Also, I think this is only needed for stop signals. Other suspend requests= =20 will eventually resume the thread, it is only stop signals that can cause t= he=20 thread to get stuck indefinitely (since it depends on the user sending=20 SIGCONT). Marc, are you using SIGSTOP? Index: kern_thread.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =2D-- kern_thread.c (revision 246122) +++ kern_thread.c (working copy) @@ -795,6 +795,17 @@ thread_suspend_check(int return_instead) return (ERESTART); =20 /* + * Ignore suspend requests for stop signals if they + * are deferred. + */ + if (P_SHOULDSTOP(p) =3D=3D P_STOPPED_SIG && + td->td_flags & TDF_SBDRY) { + KASSERT(return_instead, + ("TDF_SBDRY set for unsafe thread_suspend_check")); + return (0); + } + + /* * If the process is waiting for us to exit, * this thread should just suicide. * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE. =2D-=20 John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201302150844.43188.jhb>