Date:      Mon, 09 Apr 2001 11:40:11 -0400
From:      "David E. Cross" <crossd@cs.rpi.edu>
To:        freebsd-hackers@freebsd.org
Subject:   sigh... ypserv bug still very much alive
Message-ID:  <200104091540.LAA52639@cs.rpi.edu>

The ypserv bug (the one where ypserv randomly stops responding or
just seg-faults) is still very much alive.  I had to restart it
about 11 times in the course of 20 minutes this morning.  That's
the bad news; the good news is that I started it each time with
'ktrace -i'.

Going back a bit, Matt Dillon suggested that the problem may have been
in the signal handler for SIGCHLD.  I looked at the signal handler and
it does not appear to be doing anything dangerous at all (just a
child_count--;).  Is it doing something dangerous that I am just not seeing?

Also, in the last 200 lines of kdump output for each and every crash there
is the sequence of calls "select(); gettimeofday();".  That sequence never
appears in the ypserv source code, but it does appear in svc_tcp.c in
librpc.  My question is: ypserv defines its own svc_run, and for TCP
connections specifically handles things itself very carefully, so how is
the svc_tcp.c code getting called at all?  I think the answer to that is
the source of the problem.  (It should also be noted that in the runs where
ypserv hasn't died and I have collected ktrace information -- up to 8 gig
of it -- the "select(); gettimeofday();" sequence _never_ appears.)
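
To illustrate the kind of code that leaves a "select(); gettimeofday();" trail in kdump output, here is a rough sketch of a read with a deadline. It is not copied from svc_tcp.c; the function name timed_read and its return conventions are hypothetical, and it only shows the general pattern of waiting in select() and then calling gettimeofday() to work out how much of the timeout is left.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd, giving up after timeout_ms.
 * Returns bytes read (0 on EOF), -1 on error, -2 on timeout.
 */
ssize_t timed_read(int fd, void *buf, size_t len, long timeout_ms)
{
    struct timeval start, now, left;
    long remaining = timeout_ms;
    fd_set rfds;
    int n;

    assert(buf != NULL);
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }

    gettimeofday(&start, NULL);
    while (remaining > 0) {
        left.tv_sec = remaining / 1000L;
        left.tv_usec = (remaining % 1000L) * 1000L;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        n = select(fd + 1, &rfds, NULL, NULL, &left);
        if (n > 0)
            return read(fd, buf, len);
        if (n < 0 && errno != EINTR)
            return -1;

        /* Timed out or interrupted: recompute what is left. */
        gettimeofday(&now, NULL);
        remaining = timeout_ms
            - ((now.tv_sec - start.tv_sec) * 1000L
               + (now.tv_usec - start.tv_usec) / 1000L);
    }
    return -2;
}
```

Each trip around the loop that does not find data produces exactly one select() followed by one gettimeofday(), which is the shape of the sequence described above. A server whose own svc_run dispatches only on descriptors already known to be readable would not be expected to spend time in a loop like this.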

One of my ktrace-s is _very_ small, only 330K, from fork()/exec() to 
SIG_DFL/SEGV, so I am hoping this will provide easily digestible information.
I did not include context-switch information in the ktrace for the following
reasons:
  1) It didn't appear to be useful, and since I did specify the -i, it is
     obvious where context switches occur (to the only thing that could affect
     anything: the children)
  2) It caused ypserv to act strangely... instead of dying, it just got
     very slow, and didn't respond.

Anyone interested in helping me track this one down?

--
David Cross                               | email: crossd@cs.rpi.edu 
Lab Director                              | Rm: 308 Lally Hall
Rensselaer Polytechnic Institute,         | Ph: 518.276.2860            
Department of Computer Science            | Fax: 518.276.4033
I speak only for myself.                  | WinNT:Linux::Linux:FreeBSD

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message