Date: Fri, 15 Feb 2013 10:26:41 -0500 From: John Baldwin <jhb@freebsd.org> To: Rick Macklem <rmacklem@uoguelph.ca> Cc: Konstantin Belousov <kostikbel@gmail.com>, Marc Fournier <scrappy@hub.org>, freebsd-stable@freebsd.org Subject: Re: 9-STABLE -> NFS -> NetAPP: Message-ID: <201302151026.41638.jhb@freebsd.org> In-Reply-To: <860054725.3050415.1360941671689.JavaMail.root@erie.cs.uoguelph.ca> References: <860054725.3050415.1360941671689.JavaMail.root@erie.cs.uoguelph.ca>
next in thread | previous in thread | raw e-mail | index | archive | help
On Friday, February 15, 2013 10:21:11 am Rick Macklem wrote: > Konstantin Belousov wrote: > > On Fri, Feb 15, 2013 at 08:44:43AM -0500, John Baldwin wrote: > > > On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote: > > > > Marc Fournier wrote: > > > > > On 2013-02-13, at 3:54 PM, Rick Macklem <rmacklem@uoguelph.ca> > > > > > wrote: > > > > > > > > > > >> > > > > > > The pid that is in "T" state for the "ps auxlH". > > > > > > > > > > Different server, last kernel update on Jan 22nd, https process > > > > > this > > > > > time instead of du last time. > > > > > > > > > > I've attached: > > > > > > > > > > ps auxlH > > > > > ps auxlH of just the processes that are in TJ state (6 httpd > > > > > servers) > > > > > procstat output for each of the 6 process > > > > > > > > > > > > > > > > > > > > > > > > > They are included as attachments ??? if these don't make it > > > > > through, let > > > > > me know, just figured I'd try and keep it compact ... > > > > Well, I've looked at this call path a little closer: > > > > 16693 104135 httpd - mi_switch+0x186 > > > thread_suspend_check+0x19f sleepq_catch_signals+0x1c5 > > > > sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763 > > > clnt_reconnect_call+0xfb newnfs_request+0xadb > > > > nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56 > > > nfs_access+0x306 vn_open_cred+0x5a8 > > > > kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7 > > > > > > > > I am probably way off, since I am not familiar with this stuff, > > > > but it > > > > seems to me that thread_suspend_check() should just return 0 for > > > > the > > > > case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag > > > > set) > > > > instead of sitting in the loop and doing a mi_switch(). I'm not > > > > even > > > > sure if it should call thread_suspend_check() for this case, but > > > > there > > > > are cases in thread_suspend_check() that I don't understand. > > > > > > > > Although I don't really understand thread_suspend_check(), I've > > > > attached > > > > a simple patch that might be a starting point for fixing this? > > > > > > > > I wouldn't recommend trying the patch until kib and/or jhb weigh > > > > in > > > > on whether it makes any sense. > > > > > > I think this is the right idea, but in HEAD with the sigdeferstop() > > > changes it > > > should just check for TDF_SBDRY instead of adding a new parameter. I > > > think > > > checking for TDF_SBDRY will work even in 9 (and will make the patch > > > smaller). > > > Also, I think this is only needed for stop signals. Other suspend > > > requests > > > will eventually resume the thread, it is only stop signals that can > > > cause the > > > thread to get stuck indefinitely (since it depends on the user > > > sending > > > SIGCONT). > > > > > > Marc, are you using SIGSTOP? > > > > > > Index: kern_thread.c > > > =================================================================== > > > --- kern_thread.c (revision 246122) > > > +++ kern_thread.c (working copy) > > > @@ -795,6 +795,17 @@ thread_suspend_check(int return_instead) > > > return (ERESTART); > > > > > > /* > > > + * Ignore suspend requests for stop signals if they > > > + * are deferred. > > > + */ > > > + if (P_SHOULDSTOP(p) == P_STOPPED_SIG && > > > + td->td_flags & TDF_SBDRY) { > > > + KASSERT(return_instead, > > > + ("TDF_SBDRY set for unsafe thread_suspend_check")); > > > + return (0); > > > + } > > > + > > > + /* > > > * If the process is waiting for us to exit, > > > * this thread should just suicide. > > > * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE. > > > > This looks correct. > Righto. Thanks jhb and kib for looking at this. > > Btw John, PBDRY still gets set for sleeps in the sys/rpc code. However, > as far as I can tell, it just sets TDF_SBDRY when it is already set > and seems harmless. (Since this code is supposed to be generic and not > specific to NFS, maybe it should stay that way?) In HEAD PBDRY is now a nop and the existing sigdeferstop() stuff should cover the calls in sys/rpc. > Also, since PBDRY on the sleeps sets TDF_SBDRY, I think the above patch > is ok for stable/9 without your recent head patch. Yep, exactly. > Thanks everyone for your help, rick Thanks for your debugging! -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201302151026.41638.jhb>